[jira] [Created] (KAFKA-2014) Chaos Monkey / Failure Inducer for Kafka

2015-03-11 Thread Mayuresh Gharat (JIRA)
Mayuresh Gharat created KAFKA-2014:
--

 Summary: Chaos Monkey / Failure Inducer for Kafka
 Key: KAFKA-2014
 URL: https://issues.apache.org/jira/browse/KAFKA-2014
 Project: Kafka
  Issue Type: Task
Reporter: Mayuresh Gharat
Assignee: Mayuresh Gharat


Implement a Chaos Monkey for Kafka that will help us catch any shortcomings in 
the test environment before going to production. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357167#comment-14357167
 ] 

Jiangjie Qin commented on KAFKA-1716:
-

[~ashwin.jayaprakash] It should work when you call it from another thread. From the 
thread stack trace it looks like the consumer fetcher thread is blocked in 
processPartitionData(). Can you provide a full thread dump so we can 
investigate why this happens?

 hang during shutdown of ZookeeperConsumerConnector
 --

 Key: KAFKA-1716
 URL: https://issues.apache.org/jira/browse/KAFKA-1716
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Sean Fay
Assignee: Neha Narkhede
 Attachments: kafka-shutdown-stuck.log


 It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to 
 wedge in the case that some consumer fetcher threads receive messages during 
 the shutdown process.
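 A minimal, hypothetical Java reproduction of this wedge (illustrative names only, 
 not Kafka's classes): one thread fills a bounded queue and blocks in put(), so it 
 never counts down the latch that the shutdown path awaits.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal reproduction (not Kafka code) of the reported wedge:
// the "fetcher" blocks in put() on a full bounded queue, so it never counts
// down the latch that shutdown() is waiting on.
public class ShutdownWedge {
    // Returns true iff the fetcher finished before the wait timed out.
    static boolean demo() throws InterruptedException {
        BlockingQueue<Integer> chunks = new LinkedBlockingQueue<>(1); // bounded, like the consumer's chunk queue
        CountDownLatch shutdownLatch = new CountDownLatch(1);

        Thread fetcher = new Thread(() -> {
            try {
                chunks.put(1);             // fills the queue
                chunks.put(2);             // blocks: nothing drains the queue during shutdown
                shutdownLatch.countDown(); // never reached
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        // The real shutdown() awaits with no timeout and therefore hangs;
        // a timeout is used here only so the demo terminates.
        boolean finished = shutdownLatch.await(500, TimeUnit.MILLISECONDS);
        fetcher.interrupt();
        fetcher.join();
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fetcher finished cleanly: " + demo());
    }
}
```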
 Shutdown thread:
 {code}-- Parking to wait for: 
 java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
 at jrockit/vm/Locks.park0(J)V(Native Method)
 at jrockit/vm/Locks.park(Locks.java:2230)
 at sun/misc/Unsafe.park(ZJ)V(Native Method)
 at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
 at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
 at 
 kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
 at 
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
 at 
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
 at 
 scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
 at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
 at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
 at 
 scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
 ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
 at 
 kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
 at 
 kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
 at 
 kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167){code}
 ConsumerFetcherThread:
 {code}-- Parking to wait for: 
 java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
 at jrockit/vm/Locks.park0(J)V(Native Method)
 at jrockit/vm/Locks.park(Locks.java:2230)
 at sun/misc/Unsafe.park(ZJ)V(Native Method)
 at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 at 
 java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
 at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
 at 
 kafka/consumer/ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
 at 
 kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:130)
 at 
 kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:111)
 at scala/collection/immutable/HashMap$HashMap1.foreach(HashMap.scala:224)
 at 
 scala/collection/immutable/HashMap$HashTrieMap.foreach(HashMap.scala:403)
 at 
 kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:111)
 at 
 kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
 at 
 kafka/server/AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:111)
 at kafka/utils/Utils$.inLock(Utils.scala:538)
 at ...{code}

[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357509#comment-14357509
 ] 

Jiangjie Qin commented on KAFKA-2011:
-

Yes, 6000 ms sounds too short. Can you try bumping it up to 60000?
200MB/s sounds OK for 5 brokers.

 Rebalance with auto.leader.rebalance.enable=false 
 --

 Key: KAFKA-2011
 URL: https://issues.apache.org/jira/browse/KAFKA-2011
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.0
 Environment: 5 Hosts of below config:
 x86_64 32-bit, 64-bit Little Endian 24 GenuineIntel CPUs Model 44 
 1600.000MHz RAM 189 GB GNU/Linux
Reporter: K Zakee
Priority: Blocker
 Attachments: controller-logs-1.zip, controller-logs-2.zip


 Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
 below:
 auto.leader.rebalance.enable=false
 controlled.shutdown.enable=true
 controlled.shutdown.max.retries=1
 controlled.shutdown.retry.backoff.ms=5000
 default.replication.factor=3
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824
 message.max.bytes=100
 num.io.threads=14
 num.network.threads=14
 num.partitions=10
 queued.max.requests=500
 num.replica.fetchers=4
 replica.fetch.max.bytes=1048576
 replica.fetch.min.bytes=51200
 replica.lag.max.messages=5000
 replica.lag.time.max.ms=3
 replica.fetch.wait.max.ms=1000
 fetch.purgatory.purge.interval.requests=5000
 producer.purgatory.purge.interval.requests=5000
 delete.topic.enable=true
 Logs show rebalance happening well up to 24 hours after the start.
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
 preferred replica election:  (kafka.controller.KafkaController)
 …
 [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
 leader election for partitions  (kafka.controller.KafkaController)
 ...
 [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
 preferred replica election:  (kafka.controller.KafkaController)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---


LGTM overall except one potential issue on consumer metrics colliding.


core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123549

Should we add the index as the suffix to the consumer id in the 
consumerConfig to distinguish different connector instances? Otherwise the 
consumer metrics would collide.



core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123551

Add a comment like "Creating just one stream per connector instance".



core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123553

Leave a TODO comment that after KAFKA-1660 this will be changed accordingly.



core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/31706/#comment123554

Why not just import java.util.List?


- Guozhang Wang


On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31706/
 ---
 
 (Updated March 11, 2015, 1:31 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1997
 https://issues.apache.org/jira/browse/KAFKA-1997
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Guozhang's comments.
 
 
 Changed the exit behavior on send failure because close(0) is not ready yet. 
 Will submit followup patch after KAFKA-1660 is checked in.
 
 
 Expanded imports from _ and * to full class path
 
 
 Incorporated Joel's comments.
 
 
 Addressed Joel's comments.
 
 
 Addressed Guozhang's comments.
 
 
 Diffs
 -
 
   
 clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
 e6ff7683a0df4a7d221e949767e57c34703d5aad 
   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
 5487259751ebe19f137948249aa1fd2637d2deb4 
   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 bafa379ff57bc46458ea8409406f5046dc9c973e 
   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
   
 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
 
 Diff: https://reviews.apache.org/r/31706/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357640#comment-14357640
 ] 

Guozhang Wang commented on KAFKA-1501:
--

Thanks for the patch [~ewencp]. I have started to look at your patch but want 
to get some clarifications: 

TestUtils.choosePorts should return a random port number each time it gets 
called, but since the socket is closed right away, a later call could return 
the same port number before the earlier one is actually used, hence causing 
the conflict?

 transient unit tests failures due to port already in use
 

 Key: KAFKA-1501
 URL: https://issues.apache.org/jira/browse/KAFKA-1501
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jun Rao
Assignee: Guozhang Wang
  Labels: newbie
 Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, 
 KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, 
 KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, 
 test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, 
 test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, 
 test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, 
 test-84.out, test-87.out, test-91.out, test-92.out


 Saw the following transient failures.
 kafka.api.ProducerFailureHandlingTest  testTooLargeRecordWithAckOne FAILED
 kafka.common.KafkaException: Socket server failed to bind to 
 localhost:59909: Address already in use.
 at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
 at kafka.network.Acceptor.init(SocketServer.scala:141)
 at kafka.network.SocketServer.startup(SocketServer.scala:68)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
 at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
 at 
 kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357660#comment-14357660
 ] 

Sriharsha Chintalapani commented on KAFKA-1684:
---

Created reviewboard https://reviews.apache.org/r/31958/diff/
 against branch origin/trunk

 Implement TLS/SSL authentication
 

 Key: KAFKA-1684
 URL: https://issues.apache.org/jira/browse/KAFKA-1684
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Ivan Lyutov
 Attachments: KAFKA-1684.patch, KAFKA-1684.patch


 Add an SSL port to the configuration and advertise this as part of the 
 metadata request.
 If the SSL port is configured the socket server will need to add a second 
 Acceptor thread to listen on it. Connections accepted on this port will need 
 to go through the SSL handshake prior to being registered with a Processor 
 for request processing.
 SSL requests and responses may need to be wrapped or unwrapped using the 
 SSLEngine that was initialized by the acceptor. This wrapping and unwrapping 
 is very similar to what will need to be done for SASL-based authentication 
 schemes. We should have a uniform interface that covers both of these and we 
 will need to store the instance in the session with the request. The socket 
 server will have to use this object when reading and writing requests. We 
 will need to take care with the FetchRequests as the current 
 FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we 
 can only use this optimization for unencrypted sockets that don't require 
 userspace translation (wrapping).
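 A short, illustrative sketch of the wrap flow described above (assumed names; 
 this is not Kafka code): an SSLEngine turns plaintext into TLS records in 
 userspace via wrap(), which is exactly why zero-copy FileChannel.transferTo 
 cannot be applied to encrypted connections.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import java.nio.ByteBuffer;

// Illustrative sketch (not Kafka code): every outbound byte on a TLS
// connection must pass through SSLEngine.wrap() in userspace, so the
// kernel-level transferTo() fast path is unavailable for encrypted sockets.
public class SslEngineSketch {
    // Drives the first wrap() of a client-side handshake and returns its result.
    static SSLEngineResult firstWrap() throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(true);
        engine.beginHandshake();

        ByteBuffer appOut = ByteBuffer.allocate(0); // no application data yet
        ByteBuffer netOut = ByteBuffer.allocate(engine.getSession().getPacketBufferSize());

        // The first wrap() emits the ClientHello handshake record into netOut.
        return engine.wrap(appOut, netOut);
    }

    public static void main(String[] args) throws Exception {
        SSLEngineResult result = firstWrap();
        System.out.println("status=" + result.getStatus()
                + " produced=" + result.bytesProduced() + " handshake bytes");
    }
}
```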



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357529#comment-14357529
 ] 

Guozhang Wang commented on KAFKA-1910:
--

[~junrao] Could you take a look at the patch so that I can check-in the fix if 
it LGTY?

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very huge class file, better re-factoring it to have multiple layers on top 
 of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Blake Smith


 On March 11, 2015, 8:39 p.m., Jiangjie Qin wrote:
  Works for me but still see the following line:
  there were 12 feature warning(s); re-run with -feature for details
  I tried ./gradlew jar -feature, but it seems it does not work at all. If this 
  is the related issue, can we solve it in this patch? Otherwise we can 
  create another ticket to address it.

Thanks for looking at this Jiangjie,

In order for feature warnings to display, you have to add 
scalaCompileOptions.additionalParameters = ["-feature"] below line 131 in 
build.gradle at the root of the project. I posted the verbose output from 
enabling the flag on the ticket: 
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14356280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14356280.

Bottom line is: I don't think we can fix these feature warnings until Kafka 
stops supporting scala 2.9. I get these build errors when trying to build a 2.9 
jar with the language imports brought in:



/Users/blake/src/kafka/core/src/main/scala/kafka/javaapi/Implicits.scala:19: 
object language is not a member of package scala
import scala.language.implicitConversions

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or 
--debug option to get more log output.

BUILD FAILED

Does that make sense?


- Blake


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31925/
 ---
 
 (Updated March 11, 2015, 4:35 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1054
 https://issues.apache.org/jira/browse/KAFKA-1054
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
 up to date with the latest trunk and fixed 2 more lingering compiler warnings.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
 
 Diff: https://reviews.apache.org/r/31925/diff/
 
 
 Testing
 ---
 
 Ran full test suite.
 
 
 Thanks,
 
 Blake Smith
 




Re: Can I be added as a contributor?

2015-03-11 Thread Joe Stein
Grant, I added your perms for Confluence.

Grayson, I couldn't find a confluence account for you so couldn't give you
perms.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke ghe...@cloudera.com wrote:

 Thanks Joe. I added a Confluence account.

 On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein joe.st...@stealth.ly wrote:

  Grant, I added you.
 
  Grayson and Grant, you should both also please setup Confluence accounts
  and we can grant perms to Confluence also too for your username.
 
  ~ Joe Stein
  - - - - - - - - - - - - - - - - -
 
http://www.stealth.ly
  - - - - - - - - - - - - - - - - -
 
  On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke ghe...@cloudera.com
 wrote:
 
   I am also starting to work with the Kafka codebase with plans to
  contribute
   more significantly in the near future. Could I also be added to the
   contributor list so that I can assign myself tickets?
  
   Thank you,
   Grant
  
   On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang wangg...@gmail.com
  wrote:
  
Added grayson.c...@gmail.com to the list.
   
On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao
  gc...@linkedin.com.invalid
   
wrote:
   
 Hello Kafka devs,

 I'm working on the ops side of Kafka at LinkedIn (embedded SRE on
 the
 Kafka team) and would like to start familiarizing myself with the
codebase
 with a view to eventually making substantial contributions. Could
 you
 please add me as a contributor to the Kafka JIRA so that I can
 assign
 myself a newbie ticket?

 Thanks!
 Grayson
 --
 Grayson Chao
 Data Infra Streaming SRE

   
   
   
--
-- Guozhang
   
  
  
  
   --
   Grant Henke
   Solutions Consultant | Cloudera
   ghe...@cloudera.com | 920-980-8979
   twitter.com/ghenke http://twitter.com/gchenke |
   linkedin.com/in/granthenke
  
 



 --
 Grant Henke
 Solutions Consultant | Cloudera
 ghe...@cloudera.com | 920-980-8979
 twitter.com/ghenke http://twitter.com/gchenke |
 linkedin.com/in/granthenke



Re: Can I be added as a contributor?

2015-03-11 Thread Brock Noland
Hi,

Sorry to pile on :) but could I be added as a contributor and to
confluence as well? I am brocknoland on JIRA and brockn at gmail on
confluence.

Cheers!
Brock

On Wed, Mar 11, 2015 at 1:44 PM, Joe Stein joe.st...@stealth.ly wrote:
 Grant, I added your perms for Confluence.

 Grayson, I couldn't find a confluence account for you so couldn't give you
 perms.

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -

 On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke ghe...@cloudera.com wrote:

 Thanks Joe. I added a Confluence account.

 On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein joe.st...@stealth.ly wrote:

  Grant, I added you.
 
  Grayson and Grant, you should both also please setup Confluence accounts
  and we can grant perms to Confluence also too for your username.
 
  ~ Joe Stein
  - - - - - - - - - - - - - - - - -
 
http://www.stealth.ly
  - - - - - - - - - - - - - - - - -
 
  On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke ghe...@cloudera.com
 wrote:
 
   I am also starting to work with the Kafka codebase with plans to
  contribute
   more significantly in the near future. Could I also be added to the
   contributor list so that I can assign myself tickets?
  
   Thank you,
   Grant
  
   On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang wangg...@gmail.com
  wrote:
  
Added grayson.c...@gmail.com to the list.
   
On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao
  gc...@linkedin.com.invalid
   
wrote:
   
 Hello Kafka devs,

 I'm working on the ops side of Kafka at LinkedIn (embedded SRE on
 the
 Kafka team) and would like to start familiarizing myself with the
codebase
 with a view to eventually making substantial contributions. Could
 you
 please add me as a contributor to the Kafka JIRA so that I can
 assign
 myself a newbie ticket?

 Thanks!
 Grayson
 --
 Grayson Chao
 Data Infra Streaming SRE

   
   
   
--
-- Guozhang
   
  
  
  
   --
   Grant Henke
   Solutions Consultant | Cloudera
   ghe...@cloudera.com | 920-980-8979
   twitter.com/ghenke http://twitter.com/gchenke |
   linkedin.com/in/granthenke
  
 



 --
 Grant Henke
 Solutions Consultant | Cloudera
 ghe...@cloudera.com | 920-980-8979
 twitter.com/ghenke http://twitter.com/gchenke |
 linkedin.com/in/granthenke



Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/#review76130
---


Yes, we need to set the backoff for the consumerFetcherThread as well. We can 
just use refreshLeaderBackoffMs.


core/src/main/scala/kafka/server/AbstractFetcherThread.scala
https://reviews.apache.org/r/31927/#comment123577

Instead of adding the new backOffWaitLatch, we can probably just wait on 
the existing partitionMapCond. This way, if there is a change in partitions, 
the thread can wake up immediately.

We probably can improve shutdown() a bit to wake up the thread waiting on 
the partitionMapCond. To do that, we can do the following steps in shutdown().

initiateShutdown()
partitionMapCond.signalAll()
awaitShutdown()
simpleConsumer.close()
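A sketch of that pattern (field names are illustrative, not Kafka's actual 
code): back off by awaiting the existing condition with a timeout, and have 
shutdown() signal the condition so a backing-off thread wakes immediately 
instead of sleeping out the full interval.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the suggested back-off/shutdown interaction
// (names are assumptions, not Kafka's real fields): the fetcher parks on a
// condition with a timeout, and shutdown() signals it for an early wake-up.
public class BackoffFetcher {
    private final ReentrantLock partitionMapLock = new ReentrantLock();
    private final Condition partitionMapCond = partitionMapLock.newCondition();
    private volatile boolean running = true;

    void backoff(long backoffMs) throws InterruptedException {
        partitionMapLock.lock();
        try {
            if (running)
                partitionMapCond.await(backoffMs, TimeUnit.MILLISECONDS);
        } finally {
            partitionMapLock.unlock();
        }
    }

    void shutdown() {
        running = false;                  // initiateShutdown()
        partitionMapLock.lock();
        try {
            partitionMapCond.signalAll(); // wake any thread parked in backoff()
        } finally {
            partitionMapLock.unlock();
        }
        // awaitShutdown() and simpleConsumer.close() would follow here
    }

    // Demo: a thread backing off for 10 s is woken early by shutdown().
    static long demoWakeupMillis() throws InterruptedException {
        BackoffFetcher fetcher = new BackoffFetcher();
        Thread t = new Thread(() -> {
            try { fetcher.backoff(10_000); } catch (InterruptedException ignored) { }
        });
        t.start();
        Thread.sleep(100);                // let the thread enter await()
        long start = System.nanoTime();
        fetcher.shutdown();
        t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("woke after " + demoWakeupMillis() + " ms");
    }
}
```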


- Jun Rao


On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31927/
 ---
 
 (Updated March 11, 2015, 5:41 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1461
 https://issues.apache.org/jira/browse/KAFKA-1461
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 48e33626695ad8a28b0018362ac225f11df94973 
   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
 d6d14fbd167fb8b085729cda5158898b1a3ee314 
   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
 c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 
 
 Diff: https://reviews.apache.org/r/31927/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sriharsha Chintalapani
 




Review Request 31958: Patch for KAFKA-1684

2015-03-11 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31958/
---

Review request for kafka.


Bugs: KAFKA-1684
https://issues.apache.org/jira/browse/KAFKA-1684


Repository: kafka


Description
---

KAFKA-1684. Implement TLS/SSL authentication.


Diffs
-

  core/src/main/scala/kafka/network/Channel.scala PRE-CREATION 
  core/src/main/scala/kafka/network/SocketServer.scala 
76ce41aed6e04ac5ba88395c4d5008aca17f9a73 
  core/src/main/scala/kafka/network/ssl/SSLChannel.scala PRE-CREATION 
  core/src/main/scala/kafka/network/ssl/SSLConnectionConfig.scala PRE-CREATION 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/KafkaServer.scala 
dddef938fabae157ed8644536eb1a2f329fb42b7 
  core/src/main/scala/kafka/utils/SSLAuthUtils.scala PRE-CREATION 
  core/src/test/scala/unit/kafka/network/SocketServerTest.scala 
0af23abf146d99e3d6cf31e5d6b95a9e63318ddb 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/utils/TestSSLUtils.scala PRE-CREATION 

Diff: https://reviews.apache.org/r/31958/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin


 On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
  core/src/main/scala/kafka/tools/MirrorMaker.scala, line 195
  https://reviews.apache.org/r/31706/diff/6-7/?file=890015#file890015line195
 
   Should we add the index as the suffix to the consumer id in the 
   consumerConfig to distinguish different connector instances? Otherwise the 
   consumer metrics would collide.

Good point!


 On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
  core/src/main/scala/kafka/tools/MirrorMaker.scala, lines 477-481
  https://reviews.apache.org/r/31706/diff/6-7/?file=890015#file890015line477
 
  Why not just import java.util.List?

Because List is a built-in type in Scala; even after importing java.util.List, 
we would still need to write util.List to avoid the collision with Scala's List.


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---


On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31706/
 ---
 
 (Updated March 11, 2015, 1:31 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1997
 https://issues.apache.org/jira/browse/KAFKA-1997
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Guozhang's comments.
 
 
 Changed the exit behavior on send failure because close(0) is not ready yet. 
 Will submit followup patch after KAFKA-1660 is checked in.
 
 
 Expanded imports from _ and * to full class path
 
 
 Incorporated Joel's comments.
 
 
 Addressed Joel's comments.
 
 
 Addressed Guozhang's comments.
 
 
 Diffs
 -
 
   
 clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
 e6ff7683a0df4a7d221e949767e57c34703d5aad 
   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
 5487259751ebe19f137948249aa1fd2637d2deb4 
   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 bafa379ff57bc46458ea8409406f5046dc9c973e 
   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
   
 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
 
 Diff: https://reviews.apache.org/r/31706/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---


Works for me but still see the following line:
there were 12 feature warning(s); re-run with -feature for details
I tried ./gradlew jar -feature, but it seems it does not work at all. If this is 
the related issue, can we solve it in this patch? Otherwise we can create 
another ticket to address it.

- Jiangjie Qin


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31925/
 ---
 
 (Updated March 11, 2015, 4:35 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1054
 https://issues.apache.org/jira/browse/KAFKA-1054
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
 up to date with the latest trunk and fixed 2 more lingering compiler warnings.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
 
 Diff: https://reviews.apache.org/r/31925/diff/
 
 
 Testing
 ---
 
 Ran full test suite.
 
 
 Thanks,
 
 Blake Smith
 




[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2015-03-11 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357673#comment-14357673
 ] 

Ewen Cheslack-Postava commented on KAFKA-1501:
--

[~guozhang] Yes, that's correct, so the only way to completely avoid the 
problem is to allow the kernel to assign the port automatically. I haven't 
checked, but I also wouldn't be surprised if the kernel actually saves recently 
freed ports to use for a fast path, which could explain why this happens more 
frequently than you might think it would given the fairly large range used for 
ephemeral ports.
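The two approaches can be sketched as follows (illustrative code, not Kafka's 
TestUtils): picking a port by binding, reading the number, and closing the 
socket leaves a window in which another test can grab the same port, whereas 
binding to port 0 and handing the still-open socket to the caller lets the 
kernel assign a port with no such window.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative sketch (not Kafka's TestUtils) of the race discussed above.
public class EphemeralPortDemo {
    // Racy: the port is free *now*, but may be reused before the caller binds it.
    static int choosePortRacy() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }

    // Safe: bind first, then ask which port the kernel assigned; the caller
    // keeps using this already-bound socket, so there is no reuse window.
    static ServerSocket bindEphemeral() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        int racy = choosePortRacy();                // number only; socket already closed
        try (ServerSocket safe = bindEphemeral()) { // socket still holds the port
            System.out.println("racy pick: " + racy + ", safe pick: " + safe.getLocalPort());
        }
    }
}
```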

 transient unit tests failures due to port already in use
 

 Key: KAFKA-1501
 URL: https://issues.apache.org/jira/browse/KAFKA-1501
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jun Rao
Assignee: Guozhang Wang
  Labels: newbie
 Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, 
 KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, 
 KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, 
 test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, 
 test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, 
 test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, 
 test-84.out, test-87.out, test-91.out, test-92.out


 Saw the following transient failures.
 kafka.api.ProducerFailureHandlingTest  testTooLargeRecordWithAckOne FAILED
 kafka.common.KafkaException: Socket server failed to bind to 
 localhost:59909: Address already in use.
 at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
 at kafka.network.Acceptor.init(SocketServer.scala:141)
 at kafka.network.SocketServer.startup(SocketServer.scala:68)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
 at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
 at 
 kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1684:
--
Attachment: KAFKA-1684.patch

 Implement TLS/SSL authentication
 

 Key: KAFKA-1684
 URL: https://issues.apache.org/jira/browse/KAFKA-1684
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Ivan Lyutov
 Attachments: KAFKA-1684.patch, KAFKA-1684.patch


 Add an SSL port to the configuration and advertise this as part of the 
 metadata request.
 If the SSL port is configured the socket server will need to add a second 
 Acceptor thread to listen on it. Connections accepted on this port will need 
 to go through the SSL handshake prior to being registered with a Processor 
 for request processing.
 SSL requests and responses may need to be wrapped or unwrapped using the 
 SSLEngine that was initialized by the acceptor. This wrapping and unwrapping 
 is very similar to what will need to be done for SASL-based authentication 
 schemes. We should have a uniform interface that covers both of these and we 
 will need to store the instance in the session with the request. The socket 
 server will have to use this object when reading and writing requests. We 
 will need to take care with the FetchRequests as the current 
 FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we 
 can only use this optimization for unencrypted sockets that don't require 
 userspace translation (wrapping).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357668#comment-14357668
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha] and [~guozhang], thinking about this a bit more. There are really 
two types of states that we need to manage in AbstractFetcherThread. The first 
one is the connection state, i.e., if a connection breaks, we want to backoff 
the reconnection. The second one is the partition state, i.e., if the partition 
hits an exception, we want to backoff that particular partition a bit.

The first one is what [~sriharsha]'s current RB is addressing. How about we 
complete this first, since it affects the performance of the unit tests? Once 
that's committed, we can address the second one, which is in [~sriharsha]'s 
initial patch.


 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2006:
-
Priority: Major  (was: Blocker)

 switch the broker server over to the new java protocol definitions
 --

 Key: KAFKA-2006
 URL: https://issues.apache.org/jira/browse/KAFKA-2006
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Andrii Biletskyi
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread K Zakee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357643#comment-14357643
 ] 

K Zakee commented on KAFKA-2011:


Did you mean 600 secs (10 mins)?

 Rebalance with auto.leader.rebalance.enable=false 
 --

 Key: KAFKA-2011
 URL: https://issues.apache.org/jira/browse/KAFKA-2011
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.0
 Environment: 5 Hosts of below config:
 x86_64 32-bit, 64-bit Little Endian 24 GenuineIntel CPUs Model 44 
 1600.000MHz RAM 189 GB GNU/Linux
Reporter: K Zakee
Priority: Blocker
 Attachments: controller-logs-1.zip, controller-logs-2.zip


 Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
 below:
 auto.leader.rebalance.enable=false
 controlled.shutdown.enable=true
 controlled.shutdown.max.retries=1
 controlled.shutdown.retry.backoff.ms=5000
 default.replication.factor=3
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824
 message.max.bytes=100
 num.io.threads=14
 num.network.threads=14
 num.partitions=10
 queued.max.requests=500
 num.replica.fetchers=4
 replica.fetch.max.bytes=1048576
 replica.fetch.min.bytes=51200
 replica.lag.max.messages=5000
 replica.lag.time.max.ms=3
 replica.fetch.wait.max.ms=1000
 fetch.purgatory.purge.interval.requests=5000
 producer.purgatory.purge.interval.requests=5000
 delete.topic.enable=true
 Logs show rebalance happening well up to 24 hours after the start.
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
 preferred replica election:  (kafka.controller.KafkaController)
 …
 [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
 leader election for partitions  (kafka.controller.KafkaController)
 ...
 [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
 preferred replica election:  (kafka.controller.KafkaController)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2003) Add upgrade tests

2015-03-11 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357325#comment-14357325
 ] 

Abhishek Nigam commented on KAFKA-2003:
---

Hi Gwen,
I did not realize my patch got uploaded to RB but the link was not attached 
to the jira.
I just added it in the comments section of 1888. 
https://reviews.apache.org/r/30809/



 Add upgrade tests
 -

 Key: KAFKA-2003
 URL: https://issues.apache.org/jira/browse/KAFKA-2003
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Ashish K Singh

 To test protocol changes, compatibility and upgrade process, we need a good 
 way to test different versions of the product together and to test end-to-end 
 upgrade process.
 For example, for 0.8.2 to 0.8.3 test we want to check:
 * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers?
 * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time?
 * Can 0.8.2 clients run against a cluster of 0.8.3 brokers?
 There are probably more questions. But an automated framework that can test 
 those and report results will be a good start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357263#comment-14357263
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Sriharsha Chintalapani


 On March 11, 2015, 5:06 p.m., Jun Rao wrote:
  core/src/main/scala/kafka/server/KafkaConfig.scala, line 555
  https://reviews.apache.org/r/31927/diff/1/?file=891131#file891131line555
 
  The rest of the timeouts are Ints. It may be useful to move everything 
  to Long in the future, but that can be done in a separate jira.

Thanks for the review. Please check the latest patch. Do we want to add another 
config for ConsumerFetcherThread? With this patch it doesn't pass any timeout 
value to AbstractFetcherThread.


- Sriharsha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/#review76075
---


On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31927/
 ---
 
 (Updated March 11, 2015, 5:41 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1461
 https://issues.apache.org/jira/browse/KAFKA-1461
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 48e33626695ad8a28b0018362ac225f11df94973 
   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
 d6d14fbd167fb8b085729cda5158898b1a3ee314 
   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
 c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 
 
 Diff: https://reviews.apache.org/r/31927/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sriharsha Chintalapani
 




[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2006:
-
Assignee: (was: Andrii Biletskyi)

 switch the broker server over to the new java protocol definitions
 --

 Key: KAFKA-2006
 URL: https://issues.apache.org/jira/browse/KAFKA-2006
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread K Zakee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357426#comment-14357426
 ] 

K Zakee commented on KAFKA-2011:


1) The ZK timeout is what I see occurring when the controller migration 
happened. The current ZK timeout setting on the brokers is the default value of 
6000 ms. Is increasing the timeout recommended?

2) As for the data loads, we did have high producer loads of up to 200 MB/s for 
a stretch of hours, reducing gradually to 150 and then 130. But given the 1 Gbps 
NIC on each of the five brokers and 3 zookeepers, do you think these data loads 
would be a problem? 

3) Do you think changing any configuration setting provided above may help ?

 Rebalance with auto.leader.rebalance.enable=false 
 --

 Key: KAFKA-2011
 URL: https://issues.apache.org/jira/browse/KAFKA-2011
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.0
 Environment: 5 Hosts of below config:
 x86_64 32-bit, 64-bit Little Endian 24 GenuineIntel CPUs Model 44 
 1600.000MHz RAM 189 GB GNU/Linux
Reporter: K Zakee
Priority: Blocker
 Attachments: controller-logs-1.zip, controller-logs-2.zip


 Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
 below:
 auto.leader.rebalance.enable=false
 controlled.shutdown.enable=true
 controlled.shutdown.max.retries=1
 controlled.shutdown.retry.backoff.ms=5000
 default.replication.factor=3
 log.cleaner.enable=true
 log.cleaner.threads=5
 log.cleanup.policy=delete
 log.flush.scheduler.interval.ms=3000
 log.retention.minutes=1440
 log.segment.bytes=1073741824
 message.max.bytes=100
 num.io.threads=14
 num.network.threads=14
 num.partitions=10
 queued.max.requests=500
 num.replica.fetchers=4
 replica.fetch.max.bytes=1048576
 replica.fetch.min.bytes=51200
 replica.lag.max.messages=5000
 replica.lag.time.max.ms=3
 replica.fetch.wait.max.ms=1000
 fetch.purgatory.purge.interval.requests=5000
 producer.purgatory.purge.interval.requests=5000
 delete.topic.enable=true
 Logs show rebalance happening well up to 24 hours after the start.
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
 preferred replica election:  (kafka.controller.KafkaController)
 …
 [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
 election for partitions:  (kafka.controller.KafkaController)
 ...
 [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
 leader election for partitions  (kafka.controller.KafkaController)
 ...
 [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
 preferred replica election:  (kafka.controller.KafkaController)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

2015-03-11 Thread Joe Stein
Sorry for not catching up on this thread earlier. I wanted to do this
before the KIP got its updates so we could discuss if need be and not waste
more time re-writing/re-working things that folks have issues with. I
captured all the comments so far here with responses.

 So fair assignment by count (taking into account the current partition
count of each broker) is very good. However, it's worth noting that all
partitions are not created equal. We have actually been performing more
rebalance work based on the partition size on disk, as given equal
retention of all topics, the size on disk is a better indicator of the
amount of traffic a partition gets, both in terms of storage and network
traffic. Overall, this seems to be a better balance.

Agreed, though this is out of scope (imho) for the motivations of the KIP.
The motivations section is blank (that is on me), but honestly that is because
we did all the development, went back and forth with Neha on the testing, and
then had to back it all into the KIP process... It's a time/resource/scheduling
issue and I hope to update this soon on the KIP... All of this is in the JIRA
and code patch, so it's not like it is not there; it's just not in the place
where folks are looking, since we changed where folks should look.

Initial cut at Motivations: the --generate is not used by a lot of folks
because they don't trust it. Issues such as giving different results
sometimes when you run it. Also other feedback from the community that it
does not account for specific uses cases like adding new brokers and
removing brokers (which is where that patch started
https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it
after review into just --rebalance
https://issues.apache.org/jira/browse/KAFKA-1792). The use case for add and
remove brokers is one that happens in AWS and auto scaling. There are
other reasons for this too of course.  The goal originally was to make what
folks are already coding today (with the output of  available in the
project for the community. Based on the discussion in the JIRA with Neha we
all agreed that making it a fair rebalance would fulfill both use
cases.

 In addition to this, I think there is very much a need to have Kafka be
rack-aware. That is, to be able to assure that for a given cluster, you
never assign all replicas for a given partition in the same rack. This
would allow us to guard against maintenances or power failures that affect
a full rack of systems (or a given switch).

Agreed, though I think this is out of scope for this change and something
we can also do in the future. There is more that we have to figure out for
rack awareness, specifically answering how we know what rack a broker is
on. I really really (really) worry that when we keep trying to put too much
into a single change, the discussions go into rabbit holes, and good,
important, community-driven features that could get out there
will get bogged down with different use cases and scope creep. So, I think
rack awareness is its own KIP that has two parts: setting the broker rack and
rebalancing for that. That feature doesn't invalidate the need for
--rebalance but can be built on top of it.

 I think it would make sense to implement the reassignment logic as a
pluggable component. That way it would be easy to select a scheme when
performing a reassignment (count, size, rack aware). Configuring a default
scheme for a cluster would allow for the brokers to create new topics and
partitions in compliance with the requested policy.

I don't agree with this because right now you get back the current state
of the partitions, so you can (today) write whatever logic you want (with
the information that is there). With --rebalance you also get that back.
Moving forward we can maybe expose more information so that
folks can write whatever different logic they want
(like partition number, location (a label string for rack), size, throughput
average, etc.)... but again, that to me is a different
KIP entirely from the motivations of this one. If eventually we want to
make it pluggable then we should have a KIP and discussion around it; I just
don't see how it relates to the original motivations of the change.
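For concreteness, the pluggable scheme being debated might look something like the sketch below. The interface and class names here are entirely hypothetical, invented for this example; nothing like this exists in Kafka. A count-based round-robin policy is shown as one possible implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable reassignment policy; names are illustrative only.
interface ReassignmentPolicy {
    /** Returns partition -> replica broker ids for the given live brokers. */
    Map<String, List<Integer>> assign(List<String> partitions, List<Integer> brokers, int replicationFactor);
}

// Count-based round-robin: balances replica counts, ignoring size and rack.
class CountBalancedPolicy implements ReassignmentPolicy {
    public Map<String, List<Integer>> assign(List<String> partitions, List<Integer> brokers, int rf) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        int start = 0;
        for (String p : partitions) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < rf; r++)
                replicas.add(brokers.get((start + r) % brokers.size()));
            plan.put(p, replicas);
            start++; // rotate the starting broker so leaders spread evenly
        }
        return plan;
    }
}

public class ReassignDemo {
    public static void main(String[] args) {
        ReassignmentPolicy policy = new CountBalancedPolicy();
        System.out.println(policy.assign(
            Arrays.asList("t-0", "t-1", "t-2"), Arrays.asList(1, 2, 3), 2));
    }
}
```

A size-aware or rack-aware policy would implement the same interface with different placement logic, which is exactly the extensibility being proposed (and deferred) in the thread.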

 Is it possible to describe the proposed partition reassignment algorithm
in more detail on the KIP? In fact, it would be really easy to understand
if we had some concrete examples comparing partition assignment with the
old algorithm and the new.

Sure, it is in the JIRA linked to the KIP
(https://issues.apache.org/jira/browse/KAFKA-1792) and documented in comments
in the patch as requested. Let me know if this is what you are looking
for; we can simply update the KIP with this information or give more
detail on specifically what you think might be missing.

 Would we want to
support some kind of automated reassignment of existing partitions
(personally - no. I want to trigger that manually because it is a very disk
and 

Re: Review Request 31830: Patch for KAFKA-2009

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31830/
---

(Updated March 11, 2015, 6:26 p.m.)


Review request for kafka.


Bugs: KAFKA-2009
https://issues.apache.org/jira/browse/KAFKA-2009


Repository: kafka


Description (updated)
---

Because the send callback could be fired in producer.send() as well, the unacked 
offset needs to be added to the unacked offsets list before calling producer.send().


Diffs (updated)
-

  core/src/main/scala/kafka/tools/MirrorMaker.scala 
bafa379ff57bc46458ea8409406f5046dc9c973e 

Diff: https://reviews.apache.org/r/31830/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357351#comment-14357351
 ] 

Jiangjie Qin commented on KAFKA-2009:
-

Just submitted another small fix patch.
I just realized that I still need to add the unacked offset into the unacked 
offsets list before calling producer.send(), because the callback can also be 
fired in producer.send().
The reason I added the unacked offset after calling producer.send() was that I 
was worried that if something went wrong in producer.send(), the offset would 
never be removed. But since we exit on any exception in the producer thread, it 
might not cause further problems, as the entire mirror maker will exit.
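The ordering issue can be demonstrated with a toy producer whose callback fires synchronously inside send(). These are illustrative stand-ins, not the MirrorMaker code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.function.LongConsumer;

public class UnackedOffsetDemo {
    static final Set<Long> unacked = new ConcurrentSkipListSet<>();

    // Toy stand-in for producer.send(record, callback): here the "ack"
    // callback fires synchronously, inside send() itself.
    static void send(long offset, LongConsumer onAck) {
        onAck.accept(offset);
    }

    public static void main(String[] args) {
        long offset = 42L;
        unacked.add(offset);            // register BEFORE send() ...
        send(offset, unacked::remove);  // ... so a synchronous callback finds it
        // Had we added the offset only after send() returned, the callback's
        // remove would have been a no-op and the offset would then be added
        // and never removed from the list.
        System.out.println("remaining unacked: " + unacked.size());
    }
}
```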

 Fix UncheckedOffset.removeOffset synchronization and trace logging issue in 
 mirror maker
 

 Key: KAFKA-2009
 URL: https://issues.apache.org/jira/browse/KAFKA-2009
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch


 This ticket is to fix the mirror maker problem on current trunk which is 
 introduced in KAFKA-1650.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/
---

(Updated March 11, 2015, 5:41 p.m.)


Review request for kafka.


Bugs: KAFKA-1461
https://issues.apache.org/jira/browse/KAFKA-1461


Repository: kafka


Description
---

KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.


Diffs (updated)
-

  core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
d6d14fbd167fb8b085729cda5158898b1a3ee314 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
52c79201af7c60f9b44a0aaa09cdf968d89a7b87 

Diff: https://reviews.apache.org/r/31927/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1461:
--
Attachment: KAFKA-1461_2015-03-11_10:41:26.patch

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357366#comment-14357366
 ] 

Guozhang Wang commented on KAFKA-1910:
--

Got some problems with RB, uploading the patch here for a quick review:

{code}
diff --git 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
index e972efb..436f9b2 100644
--- 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
+++ 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
@@ -129,7 +129,7 @@ public final class Coordinator {
 
 // process the response
 JoinGroupResponse response = new 
JoinGroupResponse(resp.responseBody());
-// TODO: needs to handle disconnects and errors
+// TODO: needs to handle disconnects and errors, should not just throw exceptions
 Errors.forCode(response.errorCode()).maybeThrow();
 this.consumerId = response.consumerId();
 
diff --git 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
index 27c78b8..8b71fba 100644
--- 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
+++ 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
@@ -231,11 +231,12 @@ public class Fetcher<K, V> {
 log.debug("Fetched offset {} for partition {}", offset, topicPartition);
 return offset;
 } else if (errorCode == 
Errors.NOT_LEADER_FOR_PARTITION.code()
-|| errorCode == Errors.LEADER_NOT_AVAILABLE.code()) {
+|| errorCode == 
Errors.UNKNOWN_TOPIC_OR_PARTITION.code()) {
 log.warn("Attempt to fetch offsets for partition {} failed due to obsolete leadership information, retrying.",
 topicPartition);
 awaitMetadataUpdate();
 } else {
+// TODO: we should not just throw exceptions but should handle and log it.
 Errors.forCode(errorCode).maybeThrow();
 }
 }
diff --git 
a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
 
b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
index af704f3..f706086 100644
--- 
a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
+++ 
b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
@@ -45,7 +45,9 @@ public class ListOffsetResponse extends 
AbstractRequestResponse {
 /**
  * Possible error code:
  *
- * TODO
+ *  UNKNOWN_TOPIC_OR_PARTITION (3)
+ *  NOT_LEADER_FOR_PARTITION (6)
+ *  UNKNOWN (-1)
  */
 
 private static final String OFFSETS_KEY_NAME = "offsets";
diff --git a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala 
b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
index fed37e3..8eae1ab 100644
--- a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
+++ b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
@@ -260,8 +260,10 @@ class ConsumerTest extends IntegrationTestHarness with 
Logging {
 var iter: Int = 0
 
 override def doWork(): Unit = {
-  killRandomBroker()
+  info("Killed broker %d".format(killRandomBroker()))
+  Thread.sleep(500)
   restartDeadBrokers()
+  info("Restarted all brokers")
 
   iter += 1
   if (iter == numIters)
{code}

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very huge class file, better re-factoring it to have multiple layers on top 
 of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker

2015-03-11 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-2009:

Attachment: KAFKA-2009_2015-03-11_11:26:57.patch

 Fix UncheckedOffset.removeOffset synchronization and trace logging issue in 
 mirror maker
 

 Key: KAFKA-2009
 URL: https://issues.apache.org/jira/browse/KAFKA-2009
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch


 This ticket is to fix the mirror maker problem on current trunk which is 
 introduced in KAFKA-1650.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31830: Patch for KAFKA-2009

2015-03-11 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31830/#review76145
---

Ship it!


Ship It!

- Onur Karaman


On March 11, 2015, 6:26 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31830/
 ---
 
 (Updated March 11, 2015, 6:26 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2009
 https://issues.apache.org/jira/browse/KAFKA-2009
 
 
 Repository: kafka
 
 
 Description
 ---
 
  Because the send callback could be fired in producer.send() as well, the 
  unacked offset needs to be added to the unacked offsets list before calling 
  producer.send().
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 bafa379ff57bc46458ea8409406f5046dc9c973e 
 
 Diff: https://reviews.apache.org/r/31830/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357725#comment-14357725
 ] 

Jun Rao commented on KAFKA-1910:


Are the changes in ConsumerTest needed? The extra logging and sleep seem to be 
just for debugging. Other than that, +1. I ran the tests locally 20 times and 
they all pass.

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very huge class file, better re-factoring it to have multiple layers on top 
 of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76149
---

Ship it!


Makes sense. Not a committer but looks good to me :)
Unit tests passed and compilation warnings went away.

- Jiangjie Qin


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31925/
 ---
 
 (Updated March 11, 2015, 4:35 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1054
 https://issues.apache.org/jira/browse/KAFKA-1054
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
 up to date with the latest trunk and fixed 2 more lingering compiler warnings.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
 
 Diff: https://reviews.apache.org/r/31925/diff/
 
 
 Testing
 ---
 
 Ran full test suite.
 
 
 Thanks,
 
 Blake Smith
 




Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76157
---


This looks like a very good start. I think the framework is flexible enough to 
allow us to add a variety of upgrade tests. I'm looking forward to it.


I have a few comments, but mostly I'm still confused about how this will be 
used. Perhaps more comments or even a README are in order.

You wrote that we invoke test.sh dir1 dir2; what should each directory 
contain? Just the Kafka jars of different versions, or an entire installation 
(including bin/ and conf/)?
Which of the directories should be the newer one and which the older? Does it 
matter?
Which version of the clients will be used?

Perhaps a more descriptive name for test.sh can help too. I'm guessing we'll 
have a whole collection of those test scripts soon.

Gwen


build.gradle
https://reviews.apache.org/r/30809/#comment123636

This should probably be a test dependency (if needed at all)

Packaging Guava will be a pain, since so many systems use different 
versions of Guava and they are all incompatible.



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment123635

Do we really want to do this? 

We are using joptsimple for a bunch of other tools. It is easier to read and 
maintain, and gives nice error messages, a help screen, etc.



system_test/broker_upgrade/bin/kafka-run-class.sh
https://reviews.apache.org/r/30809/#comment123638

Why did we decide to duplicate this entire file?


- Gwen Shapira


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357794#comment-14357794
 ] 

Guozhang Wang commented on KAFKA-1910:
--

Thanks Jun, incorporated the comments and committed to trunk.

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very huge class file, better re-factoring it to have multiple layers on top 
 of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1461:
--
Attachment: KAFKA-1461_2015-03-11_18:17:51.patch

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.
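The tight retry loop described above can be broken with a bounded back-off. A minimal sketch follows (illustrative only; the actual patch wires a configurable fetch back-off into the fetcher thread rather than this exact shape):

```java
// Illustrative bounded exponential back-off: start at baseMs, double per
// consecutive failure, never exceed maxMs. Zero failures means no wait.
public class FetchBackoff {
    public static long backoffMs(int consecutiveFailures, long baseMs, long maxMs) {
        if (consecutiveFailures <= 0) return 0;
        // cap the shift so the left-shift cannot overflow a long
        long backoff = baseMs << Math.min(consecutiveFailures - 1, 20);
        return Math.min(backoff, maxMs);
    }
}
```

The fetcher would sleep for backoffMs(n, ...) after the n-th consecutive failed fetch and reset the counter on success, so connection-refused storms back off to the cap instead of spinning.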



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357927#comment-14357927
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/
---

(Updated March 12, 2015, 1:17 a.m.)


Review request for kafka.


Bugs: KAFKA-1461
https://issues.apache.org/jira/browse/KAFKA-1461


Repository: kafka


Description
---

KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.


Diffs (updated)
-

  core/src/main/scala/kafka/consumer/ConsumerFetcherThread.scala 
ee6139c901082358382c5ef892281386bf6fc91b 
  core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
d6d14fbd167fb8b085729cda5158898b1a3ee314 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
52c79201af7c60f9b44a0aaa09cdf968d89a7b87 

Diff: https://reviews.apache.org/r/31927/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---

(Updated March 11, 2015, 10:20 p.m.)


Review request for kafka.


Bugs: KAFKA-1997
https://issues.apache.org/jira/browse/KAFKA-1997


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Changed the exit behavior on send failure because close(0) is not ready yet. 
Will submit followup patch after KAFKA-1660 is checked in.


Expanded imports from _ and * to full class path


Incorporated Joel's comments.


Addressed Joel's comments.


Addressed Guozhang's comments.


Incorporated Guozhang's comments.


Diffs (updated)
-

  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
e6ff7683a0df4a7d221e949767e57c34703d5aad 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
5487259751ebe19f137948249aa1fd2637d2deb4 
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
bafa379ff57bc46458ea8409406f5046dc9c973e 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
543070f4fd3e96f3183cae9ee2ccbe843409ee58 
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
19640cc55b5baa0a26a808d708b7f4caf491c9f0 

Diff: https://reviews.apache.org/r/31706/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357714#comment-14357714
 ] 

Jiangjie Qin commented on KAFKA-1997:
-

Updated reviewboard https://reviews.apache.org/r/31706/diff/
 against branch origin/trunk

 Refactor Mirror Maker
 -

 Key: KAFKA-1997
 URL: https://issues.apache.org/jira/browse/KAFKA-1997
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
 KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
 KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
 KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch


 Refactor mirror maker based on KIP-3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Jiangjie Qin


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183
 
  This is essentially a sync approach, can we use callback to do this?
 
 Abhishek Nigam wrote:
 This is intentional. We want to make sure the event has successfully 
 reached the brokers. This enables us to form a reasonable expectation of what 
 the consumer should expect.

The callback should be able to make sure everything goes well; otherwise you 
can choose to stop sending or do whatever you want. One big issue with this 
approach is that you will only send a single message out for each batch, and 
that would be very slow unless you set the linger time to something very 
small, even zero.
Ideally the test case should work with different producer settings, I would 
say at least ack=1 and ack=-1, and also with different batch sizes and linger 
times. Sending a single message out for each batch does not look like a very 
useful test case.


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184
 
  When a send fails, should we at least log the sequence number?
 
 Abhishek Nigam wrote:
 I log the exception and the logger gives me the timestamp in the logs.
 Maybe I am missing something. Can you explain the rationale of why we 
 would want to log the sequence number on the producer side when send fails.

Suppose someone is reading the log because something went wrong; wouldn't it 
be much faster to see how many and which messages are lost if the sequence 
numbers are logged? 
For example, with sequence numbers, you can see the producer saying that 
messages 1, 2, 3 were sent successfully while message 4 failed. And the 
consumer would say: I expected to see 1, 2, 3 but I just got 1, 3; 2 is lost.
In your current log, all you can see is that some message wasn't sent 
successfully and that one message was not consumed as expected. That's more 
complicated to debug, right?
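The per-sequence bookkeeping argued for here can be sketched as follows (illustrative; SeqTracker and AsyncProducer are made-up names, not the actual test code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative async validation producer: every send carries a sequence
// number, and the callback records exactly which sequences failed, so the
// consumer side can tell "lost in transit" apart from "never sent".
public class SeqTracker {
    public final List<Long> acked = new ArrayList<>();
    public final List<Long> failed = new ArrayList<>();

    public interface AsyncProducer { void send(long seq, Consumer<Exception> callback); }

    public void run(AsyncProducer producer, long count) {
        for (long seq = 0; seq < count; seq++) {
            final long s = seq;
            producer.send(s, err -> {
                if (err != null) failed.add(s);  // log WHICH message failed, not just that one did
                else acked.add(s);
            });
        }
    }
}
```

Unlike blocking on each send, this keeps the producer pipeline full while still giving the consumer-side checker an exact list of expected sequence numbers.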


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Build failed in Jenkins: KafkaPreCommit #36

2015-03-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/KafkaPreCommit/36/changes

Changes:

[wangguoz] KAFKA-1910 Follow-up again; fix ListOffsetResponse handling for the 
expected error codes

--
[...truncated 1643 lines...]
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40)

kafka.integration.PrimitiveApiTest  testPipelinedProduceRequests FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:40)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:40)
at 
kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40)

kafka.integration.AutoOffsetResetTest  testResetToEarliestWhenOffsetTooHigh 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest  testResetToEarliestWhenOffsetTooLow 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest  testResetToLatestWhenOffsetTooHigh 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest  testResetToLatestWhenOffsetTooLow FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)

Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183
 
  This is essentially a sync approach, can we use callback to do this?

This is intentional. We want to make sure the event has successfully reached 
the brokers. This enables us to form a reasonable expectation of what the 
consumer should expect.


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184
 
  When a send fails, should we at least log the sequence number?

I log the exception and the logger gives me the timestamp in the logs.
Maybe I am missing something. Can you explain the rationale for why we would 
want to log the sequence number on the producer side when a send fails?


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 321
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line321
 
  Similar to producer, can we log the expected sequence number and the 
  seq we actually saw?

Sure, in the cases where there is a mismatch I could do that.


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 386
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line386
 
  Can we use KafkaThread here?

I will take a look at that.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Jenkins build is back to normal : Kafka-trunk #423

2015-03-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/423/changes



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  This looks like a very good start. I think the framework is flexible enough 
  to allow us to add a variety of upgrade tests. I'm looking forward to it.
  
  
  I have few comments, but mostly I'm still confused on how this will be 
  used. Perhaps more comments or even a readme is in order
  
  You wrote that we invoke test.sh dir1 dir2, what should each 
  directory contain? just the kafka jar of different versions? or an entire 
  installation (including bin/ and conf/)?
  Which one of the directories should be the newer and which is older? does 
  it matter?
  Which version of clients will be used.
  
  Perhaps a more descriptive name for test.sh can help too. I'm guessing 
  we'll have a whole collection of those test scripts soon.
  
  Gwen

Each directory contains the Kafka jars: 
kafka_2.10-0.8.3-SNAPSHOT.jar
kafka-clients-0.8.3-SNAPSHOT.jar
The other jars are shared between both Kafka brokers.


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  build.gradle, line 209
  https://reviews.apache.org/r/30809/diff/3/?file=889854#file889854line209
 
  This should probably be a test dependency (if needed at all)
  
  Packaging Guava will be a pain, since so many systems use different 
  versions of Guava and they are all incompatible.

Guava provides an excellent rate limiter which I am using in the test and have 
used in the past.
As for packaging, we are already pulling in other external libraries, like 
ZooKeeper, with a specific version; applications might be using those 
extensively and might similarly run into conflicts.

If you have a suggestion for a library that provides rate limiting and is less 
likely to conflict than Guava, I can use that instead; otherwise I will move 
this dependency to the test scope for now.
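For reference, the rate limiting the test needs is small enough to sketch with only the JDK, which would avoid the Guava dependency entirely (an illustrative stand-in, not a drop-in replacement for Guava's RateLimiter):

```java
import java.util.concurrent.locks.LockSupport;

// Minimal stand-in for Guava's RateLimiter using only the JDK (illustrative,
// not a drop-in replacement): each acquire() reserves the next free slot and
// parks until that slot's time arrives.
public class SimpleRateLimiter {
    private final long intervalNanos;
    private long nextFreeNanos = System.nanoTime();

    public SimpleRateLimiter(double permitsPerSecond) {
        this.intervalNanos = (long) (1_000_000_000L / permitsPerSecond);
    }

    public synchronized void acquire() {
        long readyAt = Math.max(System.nanoTime(), nextFreeNanos);
        nextFreeNanos = readyAt + intervalNanos;
        while (System.nanoTime() < readyAt) {
            // parkNanos may wake early; loop until the reserved time arrives
            LockSupport.parkNanos(readyAt - System.nanoTime());
        }
    }
}
```

Calling acquire() in the send loop then throttles the producer to roughly permitsPerSecond messages per second.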


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 409-440
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line409
 
  Do we really want to do this? 
  
  We are using joptsimple for a bunch of other tools. It is easier to 
  read, maintain, nice error messages, help screen, etc.

Thanks, I will switch to joptsimple.


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  system_test/broker_upgrade/bin/kafka-run-class.sh, lines 152-156
  https://reviews.apache.org/r/30809/diff/3/?file=889856#file889856line152
 
  Why did we decide to duplicate this entire file?

The only difference is that it takes an additional argument containing the 
directory from which the Kafka jars should be pulled.
Would you recommend adding it to the original script as an optional argument?


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76157
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description
---

Patch for KAFKA-1546


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/log/Log.scala 
06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



Re: Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

(Updated March 12, 2015, 1:39 a.m.)


Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description (updated)
---

Patch for KAFKA-1546
Brief summary of changes:
- Added a lagBegin metric inside Replica to track, in terms of time, how long 
the replica has not read up to the LEO
- Using the lag-begin value in the checks for ISR expansion and shrinkage
- Removed the max lag messages config since it is no longer necessary
- Returning the initialLogEndOffset in LogReadResult, corresponding to the LEO 
before actually reading from the log
- Unit test cases to test ISR shrinkage and expansion
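The time-based lag check summarized above can be sketched as follows (illustrative; field and method names are assumptions, not the patch's actual identifiers):

```java
// Illustrative time-based lag tracking: a replica is "caught up" when its
// fetch reaches the leader's log end offset (LEO); it is considered out of
// sync only after lagging continuously for longer than replicaLagTimeMaxMs.
public class ReplicaLag {
    private long lagBeginMs = -1;  // -1 means currently caught up

    public void onFetch(long fetchOffset, long leaderEndOffset, long nowMs) {
        if (fetchOffset >= leaderEndOffset) lagBeginMs = -1;   // caught up again
        else if (lagBeginMs < 0) lagBeginMs = nowMs;           // lag just began
    }

    public boolean isOutOfSync(long nowMs, long replicaLagTimeMaxMs) {
        return lagBeginMs >= 0 && nowMs - lagBeginMs > replicaLagTimeMaxMs;
    }
}
```

This is why no separate max-lag-messages config is needed: a slow-but-progressing follower on a high-volume topic keeps resetting its lag start by reaching the LEO, while a stuck one eventually exceeds the time bound.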


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/log/Log.scala 
06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



0.8.3 release plan

2015-03-11 Thread Joe Stein
There hasn't been any public discussion about the 0.8.3 release plan.

There seems to be a lot of work in flight: work with patches and reviews that
could/should get committed but is now just pending KIPs, and work without a
KIP that is in trunk already (e.g. the new Consumer) and would be in the
release but is missing a KIP for the release...

What does this mean for the 0.8.3 release? What are we trying to get out,
and when?

Also, looking at
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there
seem to be things we are getting earlier (which is great of course), so are
we going to try to up the version and go with 0.9.0?

0.8.2.0 ended up getting very bloated, and that delayed it much longer than
we had originally communicated to the community; we want to make sure we
take that feedback from the community and try to improve upon it.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Guozhang Wang
+1 (binding)

On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid
wrote:


 https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer




-- 
-- Guozhang
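The semantics being voted on (wait up to a timeout for in-flight records, then force-close) can be sketched as follows; this is illustrative, not the KafkaProducer implementation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of close-with-timeout semantics (illustrative, not the KafkaProducer
// implementation): wait up to timeoutMs for in-flight records to be
// acknowledged, then give up; returns whether everything drained in time.
public class CloseWithTimeout {
    public static boolean close(CountDownLatch inFlight, long timeoutMs) {
        try {
            return inFlight.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
            return false;                        // treat interruption as "not drained"
        }
    }
}
```

A timeout of zero then means "abort immediately without waiting", which is the close(0) behavior the MirrorMaker refactor above is waiting on.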


Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
Yeah, I'd be in favor of a quicker, smaller release, but I think as long as
we have these big things in flight we should probably keep the release
criteria feature-based rather than time-based (e.g. release when X works,
not every other month).

Ideally the next release would have at least a beta version of the new
consumer. Having a new hunk of code like that available but marked as beta is
maybe a good way to go, as it gets it into people's hands for testing. That
way we can declare the API not fully locked down until the final release too,
since users mostly only look at stuff after we release it. Maybe we can try
to construct a schedule around this?

-Jay


On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote:

 There hasn't been any public discussion about the 0.8.3 release plan.

 There seems to be a lot of work in flight, work with patches and review
 that could/should get committed but now just pending KIPS, work without KIP
 but that is in trunk already (e.g. the new Consumer) that would be the the
 release but missing the KIP for the release...

 What does this mean for the 0.8.3 release? What are we trying to get out
 and when?

 Also looking at
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
 there
 seems to be things we are getting earlier (which is great of course) so are
 we going to try to up the version and go with 0.9.0?

 0.8.2.0 ended up getting very bloated and that delayed it much longer than
 we had originally communicated to the community and want to make sure we
 take that feedback from the community and try to improve upon it.

 Thanks!

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -



[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358063#comment-14358063
 ] 

Jay Kreps commented on KAFKA-1546:
--

Well iiuc the wonderfulness of this feature is that it actually doesn't add any 
new configs; it removes an old one that was impossible to set correctly and 
slightly modifies the meaning of an existing one to do what it sounds like it 
does. So I do think that for 99.5% of the world this will be like, wow, Kafka 
replication is much more robust.

I do think it is definitely a bug fix not a feature. But hey, I love me some 
KIPs, so I can't object to a nice write-up if you think it would be good to 
have.

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.
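The translation the description sketches can be illustrated with a small hypothetical helper (the name and the exact formula are assumptions for illustration, not actual Kafka code): derive a per-partition message-count threshold from the partition's observed throughput and a single time-based setting.

```java
// Hypothetical helper (not actual Kafka code): turn a single time-based
// setting such as replica.lag.max.ms into a per-partition message-count
// threshold using the partition's observed throughput.
class LagThreshold {
    /**
     * @param messagesPerSec  observed produce rate for the partition
     * @param replicaLagMaxMs maximum tolerated lag, in milliseconds
     * @return number of messages a replica may trail before it is flagged
     */
    static long maxLagMessages(double messagesPerSec, long replicaLagMaxMs) {
        // a high-volume partition gets a large threshold and a low-volume one
        // a small threshold, so both are detected on the same time scale
        return Math.max(1L, (long) Math.ceil(messagesPerSec * replicaLagMaxMs / 1000.0));
    }
}
```

With this, a partition doing 1000 msg/s and a 10 s lag budget tolerates 10000 messages of lag, while a near-idle partition is flagged after trailing by just one message for that long.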



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1546:
-
Fix Version/s: 0.8.3

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.





[jira] [Updated] (KAFKA-2001) OffsetCommitTest hang during setup

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2001:
-
Fix Version/s: 0.8.3

 OffsetCommitTest hang during setup
 --

 Key: KAFKA-2001
 URL: https://issues.apache.org/jira/browse/KAFKA-2001
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.3
Reporter: Jun Rao
Assignee: Joel Koshy
 Fix For: 0.8.3


 OffsetCommitTest seems to hang in trunk reliably. The following is the 
 stacktrace. It seems to hang during setup.
 "Test worker" prio=5 tid=7fb5ab154800 nid=0x11198e000 waiting on condition 
 [11198c000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
 kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:59)
 at 
 kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:58)
 at scala.collection.immutable.Stream.dropWhile(Stream.scala:825)
 at kafka.server.OffsetCommitTest.setUp(OffsetCommitTest.scala:58)
 at junit.framework.TestCase.runBare(TestCase.java:132)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:91)
 at 
 org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
 at 
 org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
 at 
 org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
 at 
 org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)





[jira] [Updated] (KAFKA-1986) Producer request failure rate should not include InvalidMessageSizeException and OffsetOutOfRangeException

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1986:
-
Fix Version/s: 0.8.3

 Producer request failure rate should not include InvalidMessageSizeException 
 and OffsetOutOfRangeException
 --

 Key: KAFKA-1986
 URL: https://issues.apache.org/jira/browse/KAFKA-1986
 Project: Kafka
  Issue Type: Sub-task
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar
 Fix For: 0.8.3

 Attachments: KAFKA-1986.patch


 InvalidMessageSizeException and OffsetOutOfRangeException should not be 
 counted as failedProduce and failedFetch requests since they are client-side 
 errors. They should be treated similarly to UnknownTopicOrPartitionException 
 or NotLeaderForPartitionException.





[jira] [Updated] (KAFKA-1938) [doc] Quick start example should reference appropriate Kafka version

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1938:
-
Fix Version/s: 0.8.3

 [doc] Quick start example should reference appropriate Kafka version
 

 Key: KAFKA-1938
 URL: https://issues.apache.org/jira/browse/KAFKA-1938
 Project: Kafka
  Issue Type: Improvement
  Components: website
Affects Versions: 0.8.2.0
Reporter: Stevo Slavic
Assignee: Manikumar Reddy
Priority: Trivial
 Fix For: 0.8.3

 Attachments: lz4-compression.patch, remove-081-references-1.patch, 
 remove-081-references.patch


 Kafka 0.8.2.0 documentation, quick start example on 
 https://kafka.apache.org/documentation.html#quickstart in step 1 links and 
 instructs reader to download Kafka 0.8.1.1.





[jira] [Updated] (KAFKA-1925) Coordinator Node Id set to INT_MAX breaks coordinator metadata updates

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1925:
-
Fix Version/s: 0.8.3

 Coordinator Node Id set to INT_MAX breaks coordinator metadata updates
 --

 Key: KAFKA-1925
 URL: https://issues.apache.org/jira/browse/KAFKA-1925
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
Priority: Critical
 Fix For: 0.8.3

 Attachments: KAFKA-1925.v1.patch


 KafkaConsumer used INT_MAX to mimic a new socket for coordinator (details can 
 be found in KAFKA-1760). However, this behavior breaks the coordinator as the 
 underlying NetworkClient only used the node id to determine when to initiate 
 a new connection:
 {code}
 if (connectionStates.canConnect(node.id(), now))
     // if we are interested in sending to a node and we don't have a
     // connection to it, initiate one
     initiateConnect(node, now);
 {code}





[jira] [Updated] (KAFKA-1943) Producer request failure rate should not include MessageSetSizeTooLarge and MessageSizeTooLargeException

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1943:
-
Fix Version/s: 0.8.3

 Producer request failure rate should not include MessageSetSizeTooLarge and 
 MessageSizeTooLargeException
 

 Key: KAFKA-1943
 URL: https://issues.apache.org/jira/browse/KAFKA-1943
 Project: Kafka
  Issue Type: Sub-task
Reporter: Aditya A Auradkar
Assignee: Aditya Auradkar
 Fix For: 0.8.3

 Attachments: KAFKA-1943.patch


 If MessageSetSizeTooLargeException or MessageSizeTooLargeException is thrown 
 from Log, then ReplicaManager counts it as a failed produce request. My 
 understanding is that this metric should only count failures as a result of 
 broker issues and not bad requests sent by the clients.
 If the message or message set is too large, then it is a client side error 
 and should not be reported. (similar to NotLeaderForPartitionException, 
 UnknownTopicOrPartitionException).





[jira] [Updated] (KAFKA-1957) code doc typo

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1957:
-
Fix Version/s: 0.8.3

 code doc typo
 -

 Key: KAFKA-1957
 URL: https://issues.apache.org/jira/browse/KAFKA-1957
 Project: Kafka
  Issue Type: Improvement
  Components: config
Reporter: Yaguo Zhou
Priority: Trivial
 Fix For: 0.8.3

 Attachments: 
 0001-fix-typo-SO_SNDBUFF-SO_SNDBUF-SO_RCVBUFF-SO_RCVBUF.patch


 There are doc typos in kafka.server.KafkaConfig.scala: SO_SNDBUFF should be 
 SO_SNDBUF and SO_RCVBUFF should be SO_RCVBUF.





[jira] [Updated] (KAFKA-1948) kafka.api.consumerTests are hanging

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1948:
-
Fix Version/s: 0.8.3

 kafka.api.consumerTests are hanging
 ---

 Key: KAFKA-1948
 URL: https://issues.apache.org/jira/browse/KAFKA-1948
 Project: Kafka
  Issue Type: Bug
Reporter: Gwen Shapira
Assignee: Guozhang Wang
 Fix For: 0.8.3

 Attachments: KAFKA-1948.patch


 Noticed today that very often when I run the full test suite, it hangs on 
 kafka.api.consumerTest (not always the same test though). It doesn't reproduce 
 100% of the time, but often enough to be very annoying.
 I also saw it happening on trunk after KAFKA-1333:
 https://builds.apache.org/view/All/job/Kafka-trunk/389/





[jira] [Updated] (KAFKA-1959) Class CommitThread overwrite group of Thread class causing compile errors

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1959:
-
Fix Version/s: 0.8.3

 Class CommitThread overwrite group of Thread class causing compile errors
 -

 Key: KAFKA-1959
 URL: https://issues.apache.org/jira/browse/KAFKA-1959
 Project: Kafka
  Issue Type: Bug
  Components: core
 Environment: scala 2.10.4
Reporter: Tong Li
Assignee: Tong Li
  Labels: newbie
 Fix For: 0.8.3

 Attachments: KAFKA-1959.patch, compileError.png


 class CommitThread(id: Int, partitionCount: Int, commitIntervalMs: Long, 
 zkClient: ZkClient)
 extends ShutdownableThread("commit-thread")
 with KafkaMetricsGroup {
 private val group = "group-" + id
 The val group overrides the group member of class Thread, causing the following 
 compile error:
 overriding variable group in class Thread of type ThreadGroup;  value group 
 has weaker access privileges; it should not be private





[jira] [Updated] (KAFKA-1969) NPE in unit test for new consumer

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1969:
-
Fix Version/s: 0.8.3

 NPE in unit test for new consumer
 -

 Key: KAFKA-1969
 URL: https://issues.apache.org/jira/browse/KAFKA-1969
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
  Labels: newbie
 Fix For: 0.8.3

 Attachments: stack.out


 {code}
 kafka.api.ConsumerTest  testConsumptionWithBrokerFailures FAILED
 java.lang.NullPointerException
 at 
 org.apache.kafka.clients.consumer.KafkaConsumer.ensureCoordinatorReady(KafkaConsumer.java:1238)
 at 
 org.apache.kafka.clients.consumer.KafkaConsumer.initiateCoordinatorRequest(KafkaConsumer.java:1189)
 at 
 org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:777)
 at 
 org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:816)
 at 
 org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:704)
 at 
 kafka.api.ConsumerTest.consumeWithBrokerFailures(ConsumerTest.scala:167)
 at 
 kafka.api.ConsumerTest.testConsumptionWithBrokerFailures(ConsumerTest.scala:152)
 {code}





[jira] [Updated] (KAFKA-1964) testPartitionReassignmentCallback hangs occasionally

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1964:
-
Fix Version/s: 0.8.3

  testPartitionReassignmentCallback hangs occasionally
 -

 Key: KAFKA-1964
 URL: https://issues.apache.org/jira/browse/KAFKA-1964
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 0.8.3
Reporter: Jun Rao
Assignee: Guozhang Wang
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: stack.out








[jira] [Updated] (KAFKA-1960) .gitignore does not exclude test generated files and folders.

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1960:
-
Fix Version/s: 0.8.3

 .gitignore does not exclude test generated files and folders.
 -

 Key: KAFKA-1960
 URL: https://issues.apache.org/jira/browse/KAFKA-1960
 Project: Kafka
  Issue Type: Bug
  Components: build
Reporter: Tong Li
Assignee: Tong Li
Priority: Minor
  Labels: newbie
 Fix For: 0.8.3

 Attachments: KAFKA-1960.patch


 gradle test can create quite a few folders; .gitignore should exclude these 
 files to make git submissions easier.





[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1997:

Attachment: KAFKA-1997_2015-03-11_15:20:18.patch

 Refactor Mirror Maker
 -

 Key: KAFKA-1997
 URL: https://issues.apache.org/jira/browse/KAFKA-1997
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
 KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
 KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
 KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch


 Refactor mirror maker based on KIP-3





Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment123652

This is essentially a sync approach; can we use a callback to do this?
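A callback-based send, as suggested here, might look like this self-contained toy (a stand-in sketch, not the actual ContinuousValidationTest code; `send` fakes an asynchronous broker ack): the sequence check runs in the completion callback instead of blocking the sending thread on each send.

```java
import java.util.concurrent.CompletableFuture;

// Toy stand-in for a callback-based send (not the actual test code):
// validation happens in the completion callback rather than by blocking.
class CallbackSendSketch {
    // pretend broker: acks asynchronously, echoing the sequence number
    static CompletableFuture<Long> send(long sequence) {
        return CompletableFuture.supplyAsync(() -> sequence);
    }

    // true iff the send succeeded and the acked sequence matches; a real
    // validator would log the expected and observed sequence on mismatch
    static CompletableFuture<Boolean> sendAndValidate(long sequence) {
        return send(sequence).handle((acked, err) ->
                err == null && acked == sequence);
    }
}
```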



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment123653

When a send fails, should we at least log the sequence number?



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment123659

Similar to the producer, can we log the expected sequence number and the one 
we actually saw?



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment123655

Can we use KafkaThread here?


- Jiangjie Qin


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357947#comment-14357947
 ] 

Aditya A Auradkar commented on KAFKA-1546:
--

Created reviewboard https://reviews.apache.org/r/31967/diff/
 against branch origin/trunk

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.





[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya A Auradkar updated KAFKA-1546:
-
Attachment: KAFKA-1546.patch

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.





Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin


 On March 12, 2015, 1:22 a.m., Guozhang Wang wrote:
  Hit this unit test failure, is this relevant?
  
  --
  
  kafka.consumer.ZookeeperConsumerConnectorTest  
  testConsumerRebalanceListener FAILED
  junit.framework.AssertionFailedError: 
  expected:List((0,group1_consumer1-0), (1,group1_consumer2-0)) but 
  was:ArrayBuffer((1,group1_consumer2-0))
  at junit.framework.Assert.fail(Assert.java:47)
  at junit.framework.Assert.failNotEquals(Assert.java:277)
  at junit.framework.Assert.assertEquals(Assert.java:64)
  at junit.framework.Assert.assertEquals(Assert.java:71)
  at 
  kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393)

It is a bug but unrelated to this patch, I believe. The reason is that consumer 
2 finishes rebalancing before consumer 1 does, so zookeeper did not have the 
complete ownership info yet when we did the assertion. I fixed this by consuming 
one message from consumer 1 before checking the ownership info from zookeeper.


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76189
---


On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31706/
 ---
 
 (Updated March 11, 2015, 10:20 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1997
 https://issues.apache.org/jira/browse/KAFKA-1997
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Guozhang's comments.
 
 
 Changed the exit behavior on send failure because close(0) is not ready yet. 
 Will submit followup patch after KAFKA-1660 is checked in.
 
 
 Expanded imports from _ and * to full class path
 
 
 Incorporated Joel's comments.
 
 
 Addressed Joel's comments.
 
 
 Addressed Guozhang's comments.
 
 
 Incorporated Guozhang's comments.
 
 
 Diffs
 -
 
   
 clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
 e6ff7683a0df4a7d221e949767e57c34703d5aad 
   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
 5487259751ebe19f137948249aa1fd2637d2deb4 
   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 bafa379ff57bc46458ea8409406f5046dc9c973e 
   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
   
 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
 
 Diff: https://reviews.apache.org/r/31706/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1997:

Attachment: KAFKA-1997_2015-03-11_19:10:53.patch

 Refactor Mirror Maker
 -

 Key: KAFKA-1997
 URL: https://issues.apache.org/jira/browse/KAFKA-1997
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
 KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
 KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
 KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, 
 KAFKA-1997_2015-03-11_19:10:53.patch


 Refactor mirror maker based on KIP-3





[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357993#comment-14357993
 ] 

Jiangjie Qin commented on KAFKA-1997:
-

Updated reviewboard https://reviews.apache.org/r/31706/diff/
 against branch origin/trunk

 Refactor Mirror Maker
 -

 Key: KAFKA-1997
 URL: https://issues.apache.org/jira/browse/KAFKA-1997
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
 KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
 KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
 KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, 
 KAFKA-1997_2015-03-11_19:10:53.patch


 Refactor mirror maker based on KIP-3





[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358039#comment-14358039
 ] 

Jay Kreps commented on KAFKA-1546:
--

Personally I don't think it really needs a KIP, it subtly changes the meaning 
of one config, but it actually changes it to mean what everyone thinks it 
currently means. What do you think? I think this one is less about user 
expectations or our opinions and more about does it actually work. Speaking 
of which...

[~auradkar] What is the test plan for this? It is trivially easy to reproduce 
the problems with the old approach: start a server with default settings and 
1-2 replicas and use the perf test to generate a ton of load with itty bitty 
messages, and just watch the replicas drop in and out of sync. We should concoct 
the most brutal case of this and validate that, unless the follower actually 
falls behind, it is never falsely detected as out of the ISR. But we also need to check 
the reverse condition: that both a soft death and a lag are still detected. You 
can cause a soft death by setting the zk session timeout to something massive 
and just using unix signals to pause the process. You can cause lag by just 
running some commands on one of the followers to eat up all the cpu or I/O 
while a load test is running until the follower falls behind. Both cases should 
get caught.

Anyhow, awesome job getting this done. I think this is one of the biggest 
stability issues in Kafka right now. The patch lgtm, but it would be good for 
[~junrao] and [~nehanarkhede] to take a look.



 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.





[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358040#comment-14358040
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~charmalloc] since there aren't any interface changes I am not sure if a KIP 
is necessary. Of course, we did add a new config, replica.fetch.backoff.ms. If 
this warrants a KIP then I can write one up.

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.
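The back-off this ticket asks for can be sketched as follows (assumed names, not the real ReplicaFetcherThread code): on a fetch error, sleep for a configurable back-off such as replica.fetch.backoff.ms instead of retrying immediately in a tight loop.

```java
// Sketch of a fetch loop with error back-off (assumed names; not the real
// ReplicaFetcherThread code).
class FetcherBackoffSketch {
    interface Fetcher { void fetch() throws Exception; }

    /** Runs the fetch loop; returns how many fetch attempts failed. */
    static int runWithBackoff(Fetcher fetcher, int attempts, long backoffMs) {
        int errors = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                fetcher.fetch();
            } catch (Exception e) {
                errors++;
                try {
                    // back off instead of spinning on e.g. connection refused
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;  // shut the loop down promptly on interrupt
                }
            }
        }
        return errors;
    }
}
```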





[VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Jiangjie Qin
https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer



Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Joe Stein
Could the KIP confluence page please have an updated discussion thread link?
Thanks. Could you also remove the template boilerplate at the top (*This
page is meant as a template ..*) so we can capture it for the release
cleanly.

Also, I don't really/fully understand how this is different from
flush(time); close(), and why close has its own timeout as well.

Lastly, what is the forceClose flag? It isn't documented in the public
interface, so it isn't clear how to use the feature completely just by
reading the KIP.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang wangg...@gmail.com wrote:

 +1 (binding)

 On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid
 wrote:

 
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
 
 


 --
 -- Guozhang



Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Jay Kreps
Yeah guys, I'd like to second that. I'd really really love to get the
quality of these to the point where we could broadly solicit user input and
use them as a permanent document of the alternatives and rationale.

I know it is a little painful to have process, but I think we all saw what
happened to the previous clients as public interfaces so I really really
really want us to just be incredibly thoughtful and disciplined as we make
changes. I think we all want to avoid another client rewrite.

To second Joe's question in a more specific way: an alternative I don't see
considered for giving close() a bounded time is just to enforce the request
timeout on the client side, which would cause all requests to be failed
after the request timeout expires. This was the same behavior as for flush.
In the case where the user just wants to ensure close doesn't block forever,
I think that may be sufficient?

So one alternative might be to just do that request timeout feature and add
a new producer config that is something like
  abort.on.failure=false
which causes the producer to hard exit if it can't send a request. Which I
think is closer to what you want, with this just being a way to implement
that behavior.

I'm not sure if this is better or worse, but we should be sure before we
make the change.

I also have a concern about
  producer.close(0, TimeUnit.MILLISECONDS)
not meaning close with a timeout of 0 ms.

I realize this exists in other Java APIs, but it is so confusing that it even
confused us into that recent producer bug, because of course all the
other numbers mean wait that long.

I'd propose
  close()--block until all completed
  close(0, TimeUnit.MILLISECONDS)--block for 0 ms
  close(5, TimeUnit.MILLISECONDS)--block for 5 ms
  close(-1, TimeUnit.MILLISECONDS)--error because blocking for negative ms
would mean completing in the past :-)
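
As a rough illustration of these proposed semantics (a hypothetical sketch
class, not the actual KafkaProducer implementation):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed close() semantics (not the actual
// KafkaProducer): close() blocks until everything is sent, close(0, ms)
// blocks for 0 ms, and a negative timeout is an error.
class SketchProducer {
    private int pending = 3;  // pretend three batches are still unsent
    int drained = 0;          // how many batches were actually sent

    /** Block until all pending sends complete. */
    void close() {
        close(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    /** Wait at most `timeout` for pending sends, then force-close the rest. */
    void close(long timeout, TimeUnit unit) {
        if (timeout < 0)
            throw new IllegalArgumentException(
                "timeout must be >= 0; blocking for negative ms would mean completing in the past");
        long remainingMs = unit.toMillis(timeout);
        while (pending > 0 && remainingMs > 0) {
            pending--;       // stand-in for draining one batch...
            drained++;
            remainingMs--;   // ...and for the time that took
        }
        pending = 0;         // force-close (drop) whatever is left
    }

    int pending() { return pending; }
}
```

Under this reading there is no special case: every numeric value, including 0,
means "block at most that long."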

-Jay

On Wed, Mar 11, 2015 at 8:31 PM, Joe Stein joe.st...@stealth.ly wrote:

 Could the KIP confluence please have updated the discussion thread link,
 thanks... could you also remove the template boilerplate at the top *This
 page is meant as a template ..* so we can capture it for the release
 cleanly.

 Also I don't really/fully understand how this is different than
 flush(time); close() and why close has its own timeout also?

 Lastly, what is the forceClose flag? This isn't documented in the public
 interface so it isn't clear how to completely use the feature just by
 reading the KIP.

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -

 On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang wangg...@gmail.com
 wrote:

  +1 (binding)
 
  On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin j...@linkedin.com.invalid
 
  wrote:
 
  
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
  
  
 
 
  --
  -- Guozhang
 



[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358058#comment-14358058
 ] 

Joe Stein commented on KAFKA-1546:
--

we could also mark the JIRA as a bug instead of an improvement

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high- and low-volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.
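
The translation described above could be sketched like this (illustrative
names only; not the actual broker internals from KAFKA-1546):

```java
// Illustrative sketch of turning a time-based lag bound into a dynamic
// per-partition message threshold. Class and method names here are
// assumptions, not the actual Kafka broker code.
class ReplicaLagEstimator {
    private final long replicaLagMaxMs;  // e.g. the proposed replica.lag.max.ms

    ReplicaLagEstimator(long replicaLagMaxMs) {
        this.replicaLagMaxMs = replicaLagMaxMs;
    }

    /** Max tolerated lag in messages, given the partition's observed msgs/sec. */
    long maxLagMessages(double messagesPerSec) {
        // Keep at least one message of slack so idle partitions aren't flagged.
        return Math.max(1, (long) (messagesPerSec * replicaLagMaxMs / 1000.0));
    }

    /** A replica is out of sync when its message lag exceeds the dynamic bound. */
    boolean isOutOfSync(long lagMessages, double messagesPerSec) {
        return lagMessages > maxLagMessages(messagesPerSec);
    }
}
```

For example, with replica.lag.max.ms = 10000, a partition doing 100 msgs/sec
would tolerate up to 1000 messages of lag, while a near-idle partition would
only be flagged after the time bound worth of messages.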



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358074#comment-14358074
 ] 

Aditya Auradkar commented on KAFKA-1546:


I'll write a short KIP on this and circulate it tomorrow. In the meantime, I 
guess Jun/Neha can also review it since the actual fix has been discussed in 
enough detail on this jira.

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high- and low-volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
With regard to mm, I was kind of assuming just based on the amount of work
that that would go in for sure, but yeah I agree it is important.

-Jay

On Wed, Mar 11, 2015 at 9:39 PM, Jay Kreps jay.kr...@gmail.com wrote:

 What I was trying to say was let's do a real release whenever either
 consumer or authn is done whichever happens first (or both if they can
 happen close together)--not sure which is more likely to slip.

 WRT the beta thing I think the question for people is whether the beta
 period was helpful or not in getting a more stable release? We could either
 do a beta release again or we could just do a normal release and call the
 consumer feature experimental or whatever...basically something to get it
 in people's hands before it is supposed to work perfectly and never change
 again.

 -Jay


 On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira gshap...@cloudera.com
 wrote:

 So basically you are suggesting - let's do a beta release whenever we
 feel the new consumer is done?

 This can definitely work.

 I'd prefer holding for MM improvements too. IMO, it's not just more
 improvements like flush() and compression optimization.
 The current MirrorMaker can lose data, which makes it pretty useless for
 its job. We hear lots of requests for a robust MM from our customers, so
 I can imagine it's pretty important to the Kafka community (unless I
 have a completely skewed sample).

 Gwen



 On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps jay.kr...@gmail.com wrote:
  Yeah the real question is always what will we block on?
 
  I don't think we should try to hold back smaller changes. In this
 bucket I
  would include most things you described: mm improvements, replica
  assignment tool improvements, flush, purgatory improvements, compression
  optimization, etc. Likely these will all get done in time as well as
 many
  things that kind of pop up from users but probably aren't worth doing a
  release for on their own. If one of them slips that fine. I also don't
  think we should try to hold back work that is done if it isn't on a
 list.
 
  I would consider either SSL+SASL or the consumer worthy of a release on
 its
  own. If they finish close to the same time that is great. We can maybe
 just
  assess as these evolve where the other one is at and make a call
 whether it
  will be one or both?
 
  -Jay
 
  On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
  If we are going in terms of features, I can see the following features
  getting in in the next month or two:
 
  * New consumer
  * Improved Mirror Maker (I've seen tons of interest)
  * Centralized admin requests (aka KIP-4)
  * Nicer replica-reassignment tool
  * SSL (and perhaps also SASL)?
 
  I think this collection will make a nice release. Perhaps we can cap
  it there and focus (as a community) on getting these in, we can have a
  release without too much scope creep in the not-very-distant-future?
  Even just 3 out of these 5 will still make a nice incremental
  improvement.
 
  Gwen
 
 
  On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com
 wrote:
   Yeah I'd be in favor of a quicker, smaller release but I think as
 long as
   we have these big things in flight we should probably keep the
 release
   criteria feature-based rather than time-based, though (e.g. when X
  works
   not every other month).
  
   Ideally the next release would have at least a beta version of the
 new
   consumer. I think having a new hunk of code like that available but
  marked
   as beta is maybe a good way to go, as it gets it into peoples
 hands for
   testing. This way we can declare the API not fully locked down until
 the
   final release too, since mostly users only look at stuff after we
 release
   it. Maybe we can try to construct a schedule around this?
  
   -Jay
  
  
   On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly
 wrote:
  
   There hasn't been any public discussion about the 0.8.3 release plan.
  
   There seems to be a lot of work in flight: work with patches and review
   that could/should get committed but is now just pending KIPs, and work
   without a KIP that is in trunk already (e.g. the new Consumer) that would
   be in the release but is missing the KIP for the release...
  
   What does this mean for the 0.8.3 release? What are we trying to
 get out
   and when?
  
   Also looking at
  
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
   there
   seems to be things we are getting earlier (which is great of
 course) so
  are
   we going to try to up the version and go with 0.9.0?
  
    0.8.2.0 ended up getting very bloated, and that delayed it much longer
    than we had originally communicated to the community; we want to make
    sure we take that feedback from the community and try to improve upon it.
  
   Thanks!
  
   ~ Joe Stein
   - - - - - - - - - - - - - - - - -
  
 http://www.stealth.ly
   - - - - - - - - - - - - - - - - -
  
 





Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
What I was trying to say was let's do a real release whenever either
consumer or authn is done whichever happens first (or both if they can
happen close together)--not sure which is more likely to slip.

WRT the beta thing I think the question for people is whether the beta
period was helpful or not in getting a more stable release? We could either
do a beta release again or we could just do a normal release and call the
consumer feature experimental or whatever...basically something to get it
in people's hands before it is supposed to work perfectly and never change
again.

-Jay


On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira gshap...@cloudera.com wrote:

 So basically you are suggesting - let's do a beta release whenever we
 feel the new consumer is done?

 This can definitely work.

 I'd prefer holding for MM improvements too. IMO, it's not just more
 improvements like flush() and compression optimization.
 The current MirrorMaker can lose data, which makes it pretty useless for
 its job. We hear lots of requests for a robust MM from our customers, so
 I can imagine it's pretty important to the Kafka community (unless I
 have a completely skewed sample).

 Gwen



 On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps jay.kr...@gmail.com wrote:
  Yeah the real question is always what will we block on?
 
  I don't think we should try to hold back smaller changes. In this bucket
 I
  would include most things you described: mm improvements, replica
  assignment tool improvements, flush, purgatory improvements, compression
  optimization, etc. Likely these will all get done in time as well as many
  things that kind of pop up from users but probably aren't worth doing a
  release for on their own. If one of them slips that fine. I also don't
  think we should try to hold back work that is done if it isn't on a list.
 
  I would consider either SSL+SASL or the consumer worthy of a release on
 its
  own. If they finish close to the same time that is great. We can maybe
 just
  assess as these evolve where the other one is at and make a call whether
 it
  will be one or both?
 
  -Jay
 
  On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
  If we are going in terms of features, I can see the following features
  getting in in the next month or two:
 
  * New consumer
  * Improved Mirror Maker (I've seen tons of interest)
  * Centralized admin requests (aka KIP-4)
  * Nicer replica-reassignment tool
  * SSL (and perhaps also SASL)?
 
  I think this collection will make a nice release. Perhaps we can cap
  it there and focus (as a community) on getting these in, we can have a
  release without too much scope creep in the not-very-distant-future?
  Even just 3 out of these 5 will still make a nice incremental
  improvement.
 
  Gwen
 
 
  On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com wrote:
   Yeah I'd be in favor of a quicker, smaller release but I think as
 long as
   we have these big things in flight we should probably keep the release
   criteria feature-based rather than time-based, though (e.g. when X
  works
   not every other month).
  
   Ideally the next release would have at least a beta version of the
 new
   consumer. I think having a new hunk of code like that available but
  marked
   as beta is maybe a good way to go, as it gets it into peoples hands
 for
   testing. This way we can declare the API not fully locked down until
 the
   final release too, since mostly users only look at stuff after we
 release
   it. Maybe we can try to construct a schedule around this?
  
   -Jay
  
  
   On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly
 wrote:
  
   There hasn't been any public discussion about the 0.8.3 release plan.
  
   There seems to be a lot of work in flight: work with patches and review
   that could/should get committed but is now just pending KIPs, and work
   without a KIP that is in trunk already (e.g. the new Consumer) that would
   be in the release but is missing the KIP for the release...
  
   What does this mean for the 0.8.3 release? What are we trying to get
 out
   and when?
  
   Also looking at
  
 https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
   there
   seems to be things we are getting earlier (which is great of course)
 so
  are
   we going to try to up the version and go with 0.9.0?
  
    0.8.2.0 ended up getting very bloated, and that delayed it much longer
    than we had originally communicated to the community; we want to make
    sure we take that feedback from the community and try to improve upon it.
  
   Thanks!
  
   ~ Joe Stein
   - - - - - - - - - - - - - - - - -
  
 http://www.stealth.ly
   - - - - - - - - - - - - - - - - -
  
 



[jira] [Comment Edited] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Honghai Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358096#comment-14358096
 ] 

Honghai Chen edited comment on KAFKA-1646 at 3/12/15 4:48 AM:
--

Hey, [~jkreps], would you like to help check the review at 
https://reviews.apache.org/r/29091/diff/7/ ? Really appreciate it, thanks.


was (Author: waldenchen):
Het, [~jkreps] Would you like help check the review at 
https://reviews.apache.org/r/29091/diff/7/  , really appreciate, thanks.

 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
 KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch


 This patch is for the Windows platform only. On Windows, if there is 
 more than one replica writing to disk, the segment log files will not be 
 consistent on disk and consumer read performance will drop 
 greatly. This fix allocates more disk space when rolling a new segment, 
 which will improve consumer read performance on the NTFS file system.
 This patch doesn't affect file allocation on other filesystems, since it only 
 adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.
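
A minimal sketch of the preallocation idea, using RandomAccessFile.setLength
as a stand-in (this is an illustration, not the actual KAFKA-1646 patch):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch of the preallocation idea: when rolling a new log
// segment, reserve the full segment size up front so interleaved writers
// don't fragment the file on NTFS. Illustration only, not the real patch.
class SegmentAllocator {
    static RandomAccessFile rollSegment(File file, long segmentSizeBytes,
                                        boolean preallocate) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        if (preallocate)
            // Reserve the space now; the file would be trimmed back to the
            // real data size on close or broker recovery.
            raf.setLength(segmentSizeBytes);
        return raf;
    }
}
```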



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Honghai Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358096#comment-14358096
 ] 

Honghai Chen commented on KAFKA-1646:
-

Hey, [~jkreps], would you like to help check the review at 
https://reviews.apache.org/r/29091/diff/7/ ? Really appreciate it, thanks.

 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
 KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch


 This patch is for the Windows platform only. On Windows, if there is 
 more than one replica writing to disk, the segment log files will not be 
 consistent on disk and consumer read performance will drop 
 greatly. This fix allocates more disk space when rolling a new segment, 
 which will improve consumer read performance on the NTFS file system.
 This patch doesn't affect file allocation on other filesystems, since it only 
 adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [KIP-DISCUSSION] KIP-13 Quotas

2015-03-11 Thread Jay Kreps
1. Cool

2. Yeah I just wanted to flag the dependency/interaction.

3. Cool, I think we are in agreement then that a pluggable system could
possibly be nice but we can get to know it operationally before deciding to
expose such a thing.

4. Yeah, I agree, let's do it as a separate discussion. We actually had a
full discussion and vote back when we started down the path with metrics,
but I think there were some concerns so let's talk about it a bit more and
see.

5. Yeah I think my concern was just the resulting api. Basically because
the logic for each quota is different--at the very least a different metric
to check and different requests type to compute the value from, it seems
that the seemingly generic api just masks the fact that we handle each case
separately. I.e. the implementation of the method internally would be

  def check(request: T) {
    if (request.isInstanceOf[ProduceRequest])
      [check produce request]
    if (request.isInstanceOf[FetchRequest])
      [check fetch request]
    ...
  }

So basically we have logic specific to each request, but rather than
putting that logic into the method for handling that request we kind of put
it into a big case statement. So it seems like this doesn't really abstract
things since any time you add a new thing to quota you have to jump instead
the big case statement and add a new case, right? I think I may be
misunderstanding though...in any case not arguing that we want to just
shove this into the existing methods I just want to make sure if we
introduce an abstraction its a good one.

6. Yes, I think it is preferable not to have the seesaw effect in the delay
time. So if you need to impose 20 seconds of delay it is better to delay
all 200 requests 100 ms each rather than 199 requests 0 ms and one request
20 seconds. Several reasons for this:
a. gives predictable latency to the producer.
b. avoids hitting the request timeout on the one slow request
c. there is a trade-off between window size and delay time. If the window
is too small the estimate will be inaccurate and you will accidentally
penalize an okay client (e.g. imagine a 100 ms window, one big request
could overflow it). If the window is too large you will allow the system to
be brought to its knees for a long period of time prior to the throttling.

The other important question here is the details of the windowing policy.
If the window resets every 30 seconds, the client exhausts it in 10
seconds, then is throttled for 20, then it resets and the client starts
blitzing again. The result is basically 10 second outages every 30 seconds
as the throttling expires and the client goes full tilt, crushing the
server. So the quotas don't really do their job very well.
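
The even-spreading idea from point 6 can be put in numbers with a small
hypothetical helper (not the KIP-13 implementation):

```java
// Hypothetical helper illustrating the even-delay idea: spread the total
// throttle time across all requests in the window instead of stalling one
// request for the whole amount. Not the actual KIP-13 implementation.
class ThrottleMath {
    /** Delay per request when spreading totalDelayMs over requestCount requests. */
    static long perRequestDelayMs(long totalDelayMs, int requestCount) {
        if (requestCount <= 0)
            return 0;
        return totalDelayMs / requestCount;
    }

    /** Extra time needed for the observed rate to average back down to the quota. */
    static long delayMs(double observedRate, double quotaRate, long windowMs) {
        if (observedRate <= quotaRate)
            return 0;
        // The window's traffic would have taken windowMs * observed/quota at
        // the quota rate; the difference is the delay to impose.
        return (long) (windowMs * (observedRate / quotaRate - 1.0));
    }
}
```

So 20 seconds of total delay over 200 requests comes out to 100 ms each,
which gives the predictable latency and avoids the request-timeout cliff.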

-Jay


On Mon, Mar 9, 2015 at 6:22 PM, Aditya Auradkar 
aaurad...@linkedin.com.invalid wrote:

 Thanks for the comments Jay and Todd. Firstly, I'd like to address some of
 Jay's points.

 1. You are right that we don't need to disable quotas on a per-client
 basis.

 2. Configuration management: I must admit, I haven't read Joe's proposal
 in detail. My thinking was that we can keep configuration management
 separate from this discussion since this is already quite a meaty topic.
 Let me spend some time reading that KIP and then I can add more detail to
 the quota KIP.

 3. Custom Quota implementations: I don't think it is necessarily a bad
 idea to have a interface called the QuotaManager(RequestThrottler). This
 doesn't necessarily mean exposing the interface as a public API. It is a
 mechanism to limit code changes to 1-2 specific classes. It prevents quota
 logic from bleeding into multiples places in the code as happens in any big
 piece of code. I fully agree that we should not expose this as a public API
 unless there is a very strong reason to. This seems to be more of an
 implementation detail.

 4. Metrics Package: I'll add a section on the wiki about using things from
 the metrics package. Currently, the quota stuff is located in
 clients/common/metrics. This means that we will have to migrate all that
 functionality into core. Do this also mean that we will need to replace the
 existing metrics code in core with the newly imported package as a part
 of this project? If so, that's a relatively large undertaking and it needs
 to be discussed separately IMO.

 5. Request Throttler vs QuotaManager -
 I wanted my quota manager to do something similar to what you proposed.
 Inside KafkaApis, I could do:

 if(quotaManager.check())
   // process request
 else
   return

 Internally, QuotaManager.check() could do exactly what you suggested:
 try {
   quotaMetric.record(newVal)
 } catch (QuotaException e) {
   // logic to calculate delay
   requestThrottler.add(new DelayedResponse(...), ...)
   return
 }

 This approach gives us the flexibility of deciding what metric we want to
 record inside QuotaManager. This brings us back to the same discussion of
 pluggable quota policies. It's a bit hard to articulate, but for 

[jira] [Updated] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-1646:
-
Reviewer:   (was: Jay Kreps)

 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
 KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch


 This patch is for the Windows platform only. On Windows, if there is 
 more than one replica writing to disk, the segment log files will not be 
 consistent on disk and consumer read performance will drop 
 greatly. This fix allocates more disk space when rolling a new segment, 
 which will improve consumer read performance on the NTFS file system.
 This patch doesn't affect file allocation on other filesystems, since it only 
 adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358125#comment-14358125
 ] 

Jay Kreps commented on KAFKA-1646:
--

Hey [~waldenchen], this patch is adding a TON of Windows-specific if/else 
statements. I don't think that is sustainable. If we are going to do 
this, we need to try to make it the same strategy across OSes, just for 
maintainability.

That said, are you sure NTFS can't just be tuned to accomplish the same thing?

 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
 KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch


 This patch is for the Windows platform only. On Windows, if there is 
 more than one replica writing to disk, the segment log files will not be 
 consistent on disk and consumer read performance will drop 
 greatly. This fix allocates more disk space when rolling a new segment, 
 which will improve consumer read performance on the NTFS file system.
 This patch doesn't affect file allocation on other filesystems, since it only 
 adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
Yeah the real question is always what will we block on?

I don't think we should try to hold back smaller changes. In this bucket I
would include most things you described: mm improvements, replica
assignment tool improvements, flush, purgatory improvements, compression
optimization, etc. Likely these will all get done in time as well as many
things that kind of pop up from users but probably aren't worth doing a
release for on their own. If one of them slips that fine. I also don't
think we should try to hold back work that is done if it isn't on a list.

I would consider either SSL+SASL or the consumer worthy of a release on its
own. If they finish close to the same time that is great. We can maybe just
assess as these evolve where the other one is at and make a call whether it
will be one or both?

-Jay

On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira gshap...@cloudera.com wrote:

 If we are going in terms of features, I can see the following features
 getting in in the next month or two:

 * New consumer
 * Improved Mirror Maker (I've seen tons of interest)
 * Centralized admin requests (aka KIP-4)
 * Nicer replica-reassignment tool
 * SSL (and perhaps also SASL)?

 I think this collection will make a nice release. Perhaps we can cap
 it there and focus (as a community) on getting these in, we can have a
 release without too much scope creep in the not-very-distant-future?
 Even just 3 out of these 5 will still make a nice incremental
 improvement.

 Gwen


 On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps jay.kr...@gmail.com wrote:
  Yeah I'd be in favor of a quicker, smaller release but I think as long as
  we have these big things in flight we should probably keep the release
  criteria feature-based rather than time-based, though (e.g. when X
 works
  not every other month).
 
  Ideally the next release would have at least a beta version of the new
  consumer. I think having a new hunk of code like that available but
 marked
  as beta is maybe a good way to go, as it gets it into people's hands for
  testing. This way we can declare the API not fully locked down until the
  final release too, since mostly users only look at stuff after we release
  it. Maybe we can try to construct a schedule around this?
 
  -Jay
 
 
  On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein joe.st...@stealth.ly wrote:
 
  There hasn't been any public discussion about the 0.8.3 release plan.
 
  There seems to be a lot of work in flight: work with patches and review
  that could/should get committed but is now just pending KIPs, and work
  without a KIP that is in trunk already (e.g. the new Consumer) that would
  be in the release but is missing the KIP for the release...
 
  What does this mean for the 0.8.3 release? What are we trying to get out
  and when?
 
  Also looking at
  https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
  there
  seems to be things we are getting earlier (which is great of course) so
 are
  we going to try to up the version and go with 0.9.0?
 
  0.8.2.0 ended up getting very bloated, and that delayed it much longer
  than we had originally communicated to the community; we want to make sure
  we take that feedback from the community and try to improve upon it.
 
  Thanks!
 
  ~ Joe Stein
  - - - - - - - - - - - - - - - - -
 
http://www.stealth.ly
  - - - - - - - - - - - - - - - - -
 



[jira] [Updated] (KAFKA-1930) Move server over to new metrics library

2015-03-11 Thread Aditya Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Auradkar updated KAFKA-1930:
---
Assignee: Aditya Auradkar

 Move server over to new metrics library
 ---

 Key: KAFKA-1930
 URL: https://issues.apache.org/jira/browse/KAFKA-1930
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps
Assignee: Aditya Auradkar

 We are using org.apache.kafka.common.metrics on the clients, but using Coda 
 Hale metrics on the server. We should move the server over to the new metrics 
 package as well. This will help to make all our metrics self-documenting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1930) Move server over to new metrics library

2015-03-11 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358077#comment-14358077
 ] 

Aditya Auradkar commented on KAFKA-1930:


I plan to work on this ticket since this has been called out as a pre-requisite 
for implementing quotas (KIP 13) in Kafka. I shall circulate a KIP once I 
understand the scope of the change well enough.

 Move server over to new metrics library
 ---

 Key: KAFKA-1930
 URL: https://issues.apache.org/jira/browse/KAFKA-1930
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps

 We are using org.apache.kafka.common.metrics on the clients, but using Coda 
 Hale metrics on the server. We should move the server over to the new metrics 
 package as well. This will help to make all our metrics self-documenting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [KIP-DISCUSSION] KIP-13 Quotas

2015-03-11 Thread Jay Kreps
Hey Todd,

Yeah it is kind of weird to do the quota check after taking a request, but
since the penalty is applied during that request and it just delays you to
the right rate, I think it isn't exactly wrong. I admit it is weird, though.

What you say about closing the connection makes sense. The issue is that
our current model for connections is totally transient. The clients are
supposed to handle any kind of transient connection loss and just
re-establish. So basically all existing clients would likely just retry all
the same whether you closed the connection or not, so at the moment there
would be no way to know a retried request is actually a retry.

Your point about the REST proxy is a good one, I don't think we had
considered that. Currently the java producer just has a single client.id
for all requests so the rest proxy would be a single client. But actually
what you want is the original sender to be the client. This is technically
very hard to do because the client will actually be batching records from
all senders together into one request so the only way to get the client id
right would be to make a new producer for each rest proxy client and this
would mean a lot of memory and connections. This needs thought, not sure
what solution there is.

I am not 100% convinced we need to obey the request timeout. The
configuration issue actually isn't a problem because the request timeout is
sent with the request so the broker actually knows it now even without a
handshake. However the question is, if someone sets a pathologically low
request timeout do we need to obey it? and if so won't that mean we can't
quota them? I claim the answer is no! I think we should redefine request
timeout to mean replication timeout, which is actually what it is today.
Even today if you interact with a slow server it may take longer than that
timeout (say because the fs write queues up for a long-ass time). I think
we need a separate client timeout which should be fairly long and unlikely
to be hit (default to 30 secs or something).

-Jay
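The "delays you to the right rate" idea above can be sketched as a small calculation: if a client's observed traffic over the measurement window exceeds its quota, hold the response just long enough that the average rate over (window + delay) falls back to the quota. This is an illustrative sketch of the arithmetic only, not the broker's actual quota code:

```java
// Sketch of quota-as-delay: given observedBytes seen over windowMs, compute
// how long to hold a response so the client's average rate over
// (windowMs + delay) drops back to quotaBytesPerSec. Illustrative only.
class QuotaDelay {
    static long throttleTimeMs(long observedBytes, double quotaBytesPerSec, long windowMs) {
        // Window length (ms) that would make the observed bytes exactly on-quota.
        double neededWindowMs = observedBytes * 1000.0 / quotaBytesPerSec;
        // Delay is the shortfall; never negative for clients under quota.
        return Math.max(0, (long) Math.ceil(neededWindowMs - windowMs));
    }
}
```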

On Tue, Mar 10, 2015 at 10:12 AM, Todd Palino tpal...@gmail.com wrote:

 Thanks, Jay. On the interface, I agree with Aditya (and you, I believe)
 that we don't need to expose the public API contract at this time, but
 structuring the internal logic to allow for it later with low cost is a
 good idea.

 Glad you explained the thoughts on where to hold requests. While my gut
 reaction is to not like processing a produce request that is over quota, it
 makes sense to do it that way if you are going to have your quota action be
 a delay.

 On the delay, I see your point on the bootstrap cases. However, one of the
 places I differ, and part of the reason that I prefer the error, is that I
 would never allow a producer who is over quota to resend a produce request.
 A producer should identify itself at the start of its connection, and at
 that point if it is over quota, the broker would return an error and close
 the connection. The same goes for a consumer. I'm a fan, in general, of
 pushing all error cases and handling down to the client and doing as little
 special work to accommodate those cases on the broker side as possible.

 A case to consider here is what does this mean for REST endpoints to Kafka?
 Are you going to hold the HTTP connection open as well? Is the endpoint
 going to queue and hold requests?

 I think the point that we can only delay as long as the producer's timeout
 is a valid one, especially given that we do not have any means for the
 broker and client to negotiate settings, whether that is timeouts or
 message sizes or anything else. There are a lot of things that you have to
 know when setting up a Kafka client about what your settings should be,
 when much of that should be provided for in the protocol handshake. It's
 not as critical in an environment like ours, where we have central
 configuration for most clients, but we still see issues with it. I think
 being able to have the client and broker negotiate a minimum timeout
 allowed would make the delay more palatable.

 I'm still not sure it's the right solution, and that we're not just going
 with what's fast and cheap as opposed to what is good (or right). But given
 the details of where to hold the request, I have less of a concern with the
 burden on the broker.

 -Todd


 On Mon, Mar 9, 2015 at 5:01 PM, Jay Kreps jay.kr...@gmail.com wrote:

  Hey Todd,
 
  Nice points, let me try to respond:
 
  Plugins
 
  Yeah let me explain what I mean about plugins. The contracts that matter
  for us are public contracts, i.e. the protocol, the binary format, stuff
 in
  zk, and the various plug-in apis we expose. Adding an internal interface
  later will not be hard--the quota check is going to be done in 2-6 places
  in the code which would need to be updated, all internal to the broker.
 
  The challenge with making things pluggable up front is that the policy is
  usually fairly trivial to plug in but each policy 

[jira] [Updated] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2009:
-
Fix Version/s: 0.8.3

 Fix UncheckedOffset.removeOffset synchronization and trace logging issue in 
 mirror maker
 

 Key: KAFKA-2009
 URL: https://issues.apache.org/jira/browse/KAFKA-2009
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Fix For: 0.8.3

 Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch


 This ticket is to fix the mirror maker problem on current trunk which is 
 introduced in KAFKA-1650.





[jira] [Updated] (KAFKA-1914) Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1914:
-
Fix Version/s: 0.8.3

 Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics
 -

 Key: KAFKA-1914
 URL: https://issues.apache.org/jira/browse/KAFKA-1914
 Project: Kafka
  Issue Type: Sub-task
  Components: core
Reporter: Aditya A Auradkar
Assignee: Aditya Auradkar
 Fix For: 0.8.3

 Attachments: KAFKA-1914.patch, KAFKA-1914.patch, 
 KAFKA-1914_2015-02-17_15:46:27.patch


 Currently the BrokerTopicMetrics only counts the failedProduceRequestRate and 
 the failedFetchRequestRate. We should add 2 metrics to count the overall 
 produce/fetch request rates.





[jira] [Updated] (KAFKA-1865) Add a flush() call to the new producer API

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1865:
-
Fix Version/s: 0.8.3

 Add a flush() call to the new producer API
 --

 Key: KAFKA-1865
 URL: https://issues.apache.org/jira/browse/KAFKA-1865
 Project: Kafka
  Issue Type: Bug
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8.3

 Attachments: KAFKA-1865.patch, KAFKA-1865_2015-02-21_15:36:54.patch, 
 KAFKA-1865_2015-02-22_16:26:46.patch, KAFKA-1865_2015-02-23_18:29:16.patch, 
 KAFKA-1865_2015-02-25_17:15:26.patch, KAFKA-1865_2015-02-26_10:37:16.patch


 The postconditions of this would be that any record enqueued prior to flush() 
 would have completed being sent (either successfully or not).
 An open question is whether you can continue sending new records while this 
 call is executing (on other threads).
 We should only do this if it doesn't add inefficiencies for people who don't 
 use it.
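The postcondition above (everything enqueued before flush() has completed, successfully or not, by the time flush() returns) can be modeled with a toy producer. This is a hypothetical illustration of the contract only, not the KafkaProducer implementation:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the flush() contract: flush() blocks until every record
// enqueued before the call has completed (successfully or not).
class ToyProducer {
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    private final ConcurrentLinkedQueue<Future<?>> inFlight = new ConcurrentLinkedQueue<>();
    final AtomicInteger completed = new AtomicInteger();

    Future<?> send(String record) {
        Future<?> f = sender.submit(() -> completed.incrementAndGet());
        inFlight.add(f);
        return f;
    }

    // Wait for everything in flight at the time the snapshot is taken; records
    // sent concurrently from other threads (the open question above) may or
    // may not be covered by this call.
    void flush() throws Exception {
        for (Future<?> f : inFlight.toArray(new Future<?>[0])) f.get();
    }

    void close() { sender.shutdown(); }
}
```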





[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1852:
-
Fix Version/s: 0.8.3

 OffsetCommitRequest can commit offset on unknown topic
 --

 Key: KAFKA-1852
 URL: https://issues.apache.org/jira/browse/KAFKA-1852
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.3
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.3

 Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, 
 KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, 
 KAFKA-1852_2015-02-18_13:13:17.patch, KAFKA-1852_2015-02-27_13:50:34.patch


 Currently, we allow an offset to be committed to Kafka, even when the 
 topic/partition for the offset doesn't exist. We probably should disallow 
 that and send an error back in that case.





[jira] [Updated] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1910:
-
Fix Version/s: 0.8.3

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.3

 Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all of the consumer-side logic, making it a very 
 large class file; it would be better to refactor it into multiple layers on 
 top of KafkaClient.





[jira] [Updated] (KAFKA-1831) Producer does not provide any information about which host the data was sent to

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1831:
-
Fix Version/s: 0.8.2.0

 Producer does not provide any information about which host the data was sent 
 to
 ---

 Key: KAFKA-1831
 URL: https://issues.apache.org/jira/browse/KAFKA-1831
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 0.8.1.1
Reporter: Mark Payne
Assignee: Jun Rao
 Fix For: 0.8.2.0


 For traceability purposes and for troubleshooting, when sending data to 
 Kafka, the Producer should provide information about which host the data was 
 sent to. This works well already in the SimpleConsumer, which provides host() 
 and port() methods.





Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76189
---


Hit this unit test failure, is this relevant?

--

kafka.consumer.ZookeeperConsumerConnectorTest  testConsumerRebalanceListener 
FAILED
junit.framework.AssertionFailedError: 
expected:&lt;List((0,group1_consumer1-0), (1,group1_consumer2-0))&gt; but 
was:&lt;ArrayBuffer((1,group1_consumer2-0))&gt;
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:71)
at 
kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393)

- Guozhang Wang


On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31706/
 ---
 
 (Updated March 11, 2015, 10:20 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1997
 https://issues.apache.org/jira/browse/KAFKA-1997
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Guozhang's comments.
 
 
 Changed the exit behavior on send failure because close(0) is not ready yet. 
 Will submit followup patch after KAFKA-1660 is checked in.
 
 
 Expanded imports from _ and * to full class path
 
 
 Incorporated Joel's comments.
 
 
 Addressed Joel's comments.
 
 
 Addressed Guozhang's comments.
 
 
 Incorporated Guozhang's comments.
 
 
 Diffs
 -
 
   
 clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
 e6ff7683a0df4a7d221e949767e57c34703d5aad 
   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
 5487259751ebe19f137948249aa1fd2637d2deb4 
   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 bafa379ff57bc46458ea8409406f5046dc9c973e 
   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
   
 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
 
 Diff: https://reviews.apache.org/r/31706/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




Re: Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

(Updated March 12, 2015, 1:48 a.m.)


Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description (updated)
---

PATCH for KAFKA-1546


Diffs (updated)
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357956#comment-14357956
 ] 

Aditya A Auradkar commented on KAFKA-1546:
--

Updated reviewboard https://reviews.apache.org/r/31967/diff/
 against branch origin/trunk

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.
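The translation suggested in the description can be sketched as a simple conversion from a time bound to a message-count bound using the partition's observed throughput. The helper below is hypothetical, not the actual implementation:

```java
// Sketch: translate a time-based lag bound (replica.lag.max.ms in the
// description) into a message-count bound using observed partition
// throughput, so high- and low-volume topics both get sensible limits from
// one config. Illustrative only.
class ReplicaLag {
    static long maxLagMessages(double observedMsgsPerSec, long replicaLagMaxMs) {
        // A replica replicaLagMaxMs "behind in time" is roughly this many
        // messages behind at the current throughput; floor at 1 so very
        // low-volume partitions still tolerate a single in-flight message.
        return Math.max(1, (long) (observedMsgsPerSec * replicaLagMaxMs / 1000.0));
    }
}
```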





[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya A Auradkar updated KAFKA-1546:
-
Attachment: KAFKA-1546_2015-03-11_18:48:09.patch

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.





Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---

(Updated March 12, 2015, 2:10 a.m.)


Review request for kafka.


Bugs: KAFKA-1997
https://issues.apache.org/jira/browse/KAFKA-1997


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Changed the exit behavior on send failure because close(0) is not ready yet. 
Will submit followup patch after KAFKA-1660 is checked in.


Expanded imports from _ and * to full class path


Incorporated Joel's comments.


Addressed Joel's comments.


Addressed Guozhang's comments.


Incorporated Guozhang's comments.


Fix a transient bug in ZookeeperConsumerConnectTest


Diffs (updated)
-

  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
e6ff7683a0df4a7d221e949767e57c34703d5aad 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
5487259751ebe19f137948249aa1fd2637d2deb4 
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
bafa379ff57bc46458ea8409406f5046dc9c973e 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
543070f4fd3e96f3183cae9ee2ccbe843409ee58 
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
19640cc55b5baa0a26a808d708b7f4caf491c9f0 

Diff: https://reviews.apache.org/r/31706/diff/


Testing
---


Thanks,

Jiangjie Qin


