Re: Kafka TLP website move
You need to tell us HOW to create it, either as a CMS site or using svnpubsub!

From: Jay Kreps jay.kr...@gmail.com
To: infrastruct...@apache.org
Cc: dev@kafka.apache.org
Sent: Monday, December 10, 2012 12:30 PM
Subject: Re: Kafka TLP website move

Ooops, wrong ticket: https://issues.apache.org/jira/browse/INFRA-5586 :-) -Jay

On Mon, Dec 10, 2012 at 9:29 AM, Jay Kreps jay.kr...@gmail.com wrote:
Hey guys, it's been a few weeks and we are still waiting on getting a top-level website URL for Kafka. I tried just making it myself, but that didn't work:

jkreps@minotaur:/www$ mkdir kafka.apache.org
mkdir: kafka.apache.org: Permission denied

Are we confused? Can anyone help? Here is the ticket: https://issues.apache.org/jira/browse/KAFKA-654 Thanks! -Jay
[jira] [Commented] (KAFKA-604) Add missing metrics in 0.8
[ https://issues.apache.org/jira/browse/KAFKA-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528116#comment-13528116 ]

Yang Ye commented on KAFKA-604:
---
Sure, I'll do that soon. Best, Yang Ye

Add missing metrics in 0.8
--
Key: KAFKA-604 URL: https://issues.apache.org/jira/browse/KAFKA-604
Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8 Reporter: Jun Rao
Attachments: kafka_604_v1.patch, kafka_604_v2.patch
Original Estimate: 24h Remaining Estimate: 24h

It would be good if we add the following metrics:
Producer: droppedMessageRate per topic
ReplicaManager: partition count on the broker
FileMessageSet: logFlushTimer per log (i.e., partition). Also, logFlushTime should probably be moved to LogSegment since the flush now includes index flush time.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (KAFKA-654) Irrecoverable error while trying to roll a segment that already exists
[ https://issues.apache.org/jira/browse/KAFKA-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede resolved KAFKA-654.
-
Resolution: Fixed Assignee: Neha Narkhede

Thanks for the review, committed patch v1 to 0.8 branch.

Irrecoverable error while trying to roll a segment that already exists
--
Key: KAFKA-654 URL: https://issues.apache.org/jira/browse/KAFKA-654
Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Neha Narkhede Priority: Blocker
Attachments: kafka-654-v1.patch

I tried setting up a 5-broker 0.8 cluster and sending messages to 100s of topics on it. For a couple of topic partitions, the produce requests never succeed since they fail on the leader with the following error -

[2012-12-05 22:54:05,711] WARN [Kafka Log on Broker 2], Newly rolled segment file 000 0.log already exists; deleting it first (kafka.log.Log)
[2012-12-05 22:54:05,711] WARN [Kafka Log on Broker 2], Newly rolled segment file 000 0.index already exists; deleting it first (kafka.log.Log)
[2012-12-05 22:54:05,715] ERROR [ReplicaFetcherThread-1-0-on-broker-2], Error due to (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaException: Trying to roll a new log segment for topic partition NusWriteEvent-4 with start offset 0 while it already exsits
at kafka.log.Log.rollToOffset(Log.scala:456)
at kafka.log.Log.roll(Log.scala:434)
at kafka.log.Log.maybeRoll(Log.scala:423)
at kafka.log.Log.append(Log.scala:257)
at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:51)
at kafka.server.AbstractFetcherThread$$anonfun$doWork$5.apply(AbstractFetcherThread.scala:125)
at kafka.server.AbstractFetcherThread$$anonfun$doWork$5.apply(AbstractFetcherThread.scala:108)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:125)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:344)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:344)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:108)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
Re: Kafka TLP website move
Cool, makes sense. Let's go with SVN and svnpubsub then. The site subdirectory that we would publish is https://svn.apache.org/repos/asf/kafka/site

In the future, if we switch to git, we will just leave the site in svn and continue to use that for site updates. -Jay

On Mon, Dec 10, 2012 at 10:15 AM, Ted Dunning ted.dunn...@gmail.com wrote:
We have an SVN repo for the web site and a git repo for code.

On Mon, Dec 10, 2012 at 10:13 AM, Jay Kreps jay.kr...@gmail.com wrote:
I am confused. The CMS documentation says this:

*Instead of developing versioning support and a notification scheme into a database driven CMS, Apache's subversion infrastructure (http://svn.apache.org/) was chosen as the central data store for everything. The fact that the web interface to the CMS interacts with the subversion repository in a LAN environment, combined with the lightning-fast SSDs that serve as l2arc cache for the underlying FreeBSD ZFS filesystem, eliminates virtually all subversion network/disk latency. Subversion continues to scale past 1M commits to deliver high performance to Apache developers, as well as to our internal programs that rely on it.*

How do the ASF's git projects maintain their websites if both website options require SVN? Should we just have a separate repository for the site that is in SVN (currently they are in the same repository)? Basically the project voted to move to git, so I don't want to make any choices that block that. -Jay

On Mon, Dec 10, 2012 at 9:53 AM, Joe Schaefer joe_schae...@yahoo.com wrote:
No, you may not stick with a manual process. If the CMS doesn't suit you (there is no requirement to use markdown - other cms sites use html), you must use svnpubsub. There is no gitpubsub and there are no plans to write one.

From: Jay Kreps jay.kr...@gmail.com
To: Joe Schaefer joe_schae...@yahoo.com
Sent: Monday, December 10, 2012 12:50 PM
Subject: Re: Kafka TLP website move

The CMS sounds like it requires some kind of markdown format. Our site is in HTML, so that won't work. svnpubsub sounds like it requires svn. We are trying to move to git, so that probably isn't good either. Is it possible to stick with the manual update process we had for the incubator site? Thanks! -Jay

On Mon, Dec 10, 2012 at 9:32 AM, Joe Schaefer joe_schae...@yahoo.com wrote:
MS site or using svnpubsub
Async segment delete patch
Hello fellow log maintainers, I have a patch and would love your feedback: https://issues.apache.org/jira/browse/KAFKA-636 There will be a few more in this series as I finish off the log compaction work. This patch is against trunk, though we may end up needing to backport it if we hit delete-related issues in 0.8. Cheers, -Jay
Re: Kafka TLP website move
Our experience with CMS has been good, btw. Consider it as an option. You get a very simple mark-down based web site with browser editing if you like it. Works very well.

On Mon, Dec 10, 2012 at 10:26 AM, Jay Kreps jay.kr...@gmail.com wrote:
Cool, makes sense. Let's go with SVN and svnpubsub then. The site subdirectory that we would publish is https://svn.apache.org/repos/asf/kafka/site In the future if we switch to git we will just leave the site in svn and continue to use that for site updates. -Jay
[jira] Subscription: outstanding kafka patches
Issue Subscription
Filter: outstanding kafka patches (57 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-664 Kafka server threads die due to OOME during long running test https://issues.apache.org/jira/browse/KAFKA-664
KAFKA-651 Create testcases on auto create topics https://issues.apache.org/jira/browse/KAFKA-651
KAFKA-646 Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level https://issues.apache.org/jira/browse/KAFKA-646
KAFKA-645 Create a shell script to run System Test with DEBUG details and tee console output to a file https://issues.apache.org/jira/browse/KAFKA-645
KAFKA-637 Separate log4j environment variable from KAFKA_OPTS in kafka-run-class.sh https://issues.apache.org/jira/browse/KAFKA-637
KAFKA-636 Make log segment delete asynchronous https://issues.apache.org/jira/browse/KAFKA-636
KAFKA-628 System Test Failure Case 5005 (Mirror Maker bouncing) - Data Loss in ConsoleConsumer https://issues.apache.org/jira/browse/KAFKA-628
KAFKA-621 System Test 9051 : ConsoleConsumer doesn't receives any data for 20 topics but works for 10 https://issues.apache.org/jira/browse/KAFKA-621
KAFKA-607 System Test Transient Failure (case 4011 Log Retention) - ConsoleConsumer receives less data https://issues.apache.org/jira/browse/KAFKA-607
KAFKA-606 System Test Transient Failure (case 0302 GC Pause) - Log segments mismatched across replicas https://issues.apache.org/jira/browse/KAFKA-606
KAFKA-604 Add missing metrics in 0.8 https://issues.apache.org/jira/browse/KAFKA-604
KAFKA-598 decouple fetch size from max message size https://issues.apache.org/jira/browse/KAFKA-598
KAFKA-597 Refactor KafkaScheduler https://issues.apache.org/jira/browse/KAFKA-597
KAFKA-583 SimpleConsumerShell may receive less data inconsistently https://issues.apache.org/jira/browse/KAFKA-583
KAFKA-552 No error messages logged for those failing-to-send messages from Producer https://issues.apache.org/jira/browse/KAFKA-552
KAFKA-547 The ConsumerStats MBean name should include the groupid https://issues.apache.org/jira/browse/KAFKA-547
KAFKA-530 kafka.server.KafkaApis: kafka.common.OffsetOutOfRangeException https://issues.apache.org/jira/browse/KAFKA-530
KAFKA-493 High CPU usage on inactive server https://issues.apache.org/jira/browse/KAFKA-493
KAFKA-479 ZK EPoll taking 100% CPU usage with Kafka Client https://issues.apache.org/jira/browse/KAFKA-479
KAFKA-465 Performance test scripts - refactoring leftovers from tools to perf package https://issues.apache.org/jira/browse/KAFKA-465
KAFKA-438 Code cleanup in MessageTest https://issues.apache.org/jira/browse/KAFKA-438
KAFKA-419 Updated PHP client library to support kafka 0.7+ https://issues.apache.org/jira/browse/KAFKA-419
KAFKA-414 Evaluate mmap-based writes for Log implementation https://issues.apache.org/jira/browse/KAFKA-414
KAFKA-411 Message Error in high cocurrent environment https://issues.apache.org/jira/browse/KAFKA-411
KAFKA-404 When using chroot path, create chroot on startup if it doesn't exist https://issues.apache.org/jira/browse/KAFKA-404
KAFKA-399 0.7.1 seems to show less performance than 0.7.0 https://issues.apache.org/jira/browse/KAFKA-399
KAFKA-398 Enhance SocketServer to Enable Sending Requests https://issues.apache.org/jira/browse/KAFKA-398
KAFKA-397 kafka.common.InvalidMessageSizeException: null https://issues.apache.org/jira/browse/KAFKA-397
KAFKA-388 Add a highly available consumer co-ordinator to a Kafka cluster https://issues.apache.org/jira/browse/KAFKA-388
KAFKA-374 Move to java CRC32 implementation https://issues.apache.org/jira/browse/KAFKA-374
KAFKA-346 Don't call commitOffsets() during rebalance https://issues.apache.org/jira/browse/KAFKA-346
KAFKA-345 Add a listener to ZookeeperConsumerConnector to get notified on rebalance events https://issues.apache.org/jira/browse/KAFKA-345
KAFKA-319 compression support added to php client does not pass unit tests https://issues.apache.org/jira/browse/KAFKA-319
KAFKA-318 update zookeeper dependency to 3.3.5 https://issues.apache.org/jira/browse/KAFKA-318
KAFKA-314 Go Client Multi-produce https://issues.apache.org/jira/browse/KAFKA-314
KAFKA-313 Add JSON output and looping options to ConsumerOffsetChecker https://issues.apache.org/jira/browse/KAFKA-313
KAFKA-312 Add 'reset' operation for AsyncProducerDroppedEvents https://issues.apache.org/jira/browse/KAFKA-312
KAFKA-298 Go Client support max message size
[jira] [Commented] (KAFKA-646) Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[ https://issues.apache.org/jira/browse/KAFKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528183#comment-13528183 ]

Neha Narkhede commented on KAFKA-646:
-
Patch v2 looks good to me. A few minor questions -
1. Producer: You probably don't validate the client id anymore in the secondary constructor. Shouldn't we do that?
2. ZookeeperConsumerConnector: consumerTopicStats is unused.
3. Do the singleton validate() APIs need to be synchronized?

Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
---
Key: KAFKA-646 URL: https://issues.apache.org/jira/browse/KAFKA-646
Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Swapnil Ghike Assignee: Swapnil Ghike Priority: Blocker Labels: bugs Fix For: 0.8
Attachments: kafka-646-patch-num1-v1.patch, kafka-646-patch-num1-v2.patch

With KAFKA-622, we measure ProducerRequestStats and FetchRequestAndResponseStats at the SyncProducer and SimpleConsumer level respectively. We could also aggregate them at the high level Producer and ZookeeperConsumerConnector level to provide an overall sense of request/response rate/size at the client level. Currently, I am not completely clear about the math that might be necessary for such aggregation, or whether metrics already provides an API for aggregating stats of the same type. We should also address the comments by Jun at KAFKA-622; I am copy-pasting them here:
60. What happens if we have 2 instances of Consumers with the same clientid in the same jvm? Does one of them fail because it fails to register metrics? Ditto for Producers.
61. ConsumerTopicStats: What if a topic is named AllTopics? We used to handle this by adding a - in topic specific stats.
62. ZookeeperConsumerConnector: Do we need to validate groupid?
63. ClientId: Does the clientid length need to be different from topic length?
64. AbstractFetcherThread: When building a fetch request, do we need to pass in brokerInfo as part of the client id? BrokerInfo contains the source broker info and the fetch requests are always made to the source broker.
[jira] [Updated] (KAFKA-646) Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[ https://issues.apache.org/jira/browse/KAFKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-646:
Attachment: kafka-646-patch-num1-v3.patch

Thanks for reviewing. Patch v3:
1. Oh, that's because clientId is validated at the end of the ProducerConfig constructor.
2. Removed it.
3. Currently the validate() APIs only check for illegal chars; they don't yet check whether the incoming clientId has already been taken. (I am planning to do that in a separate patch in the same jira, after this patch has been checked in.)

Key: KAFKA-646 URL: https://issues.apache.org/jira/browse/KAFKA-646
Attachments: kafka-646-patch-num1-v1.patch, kafka-646-patch-num1-v2.patch, kafka-646-patch-num1-v3.patch
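The "check for illegal chars" style of clientId/groupId validation discussed above can be sketched as follows. This is an illustrative guess at the shape of such a check, not the actual Kafka ClientId code; the allowed character set, the function name, and the use of ValueError are all assumptions.

```python
import re

# Hypothetical sketch of an "illegal character" check for client ids,
# in the spirit of the validate() APIs discussed in this thread.
# The allowed set (ASCII alphanumerics, '.', '_', '-') is an assumption.
LEGAL_CHARS_PATTERN = re.compile(r"^[a-zA-Z0-9._-]*$")

def validate(client_id: str) -> None:
    """Raise ValueError if client_id contains a disallowed character."""
    if not LEGAL_CHARS_PATTERN.match(client_id):
        raise ValueError(
            f"{client_id!r} is illegal: contains a character other than "
            "ASCII alphanumerics, '.', '_' and '-'"
        )

validate("console-consumer.group_1")  # passes silently
try:
    validate("bad client id!")        # space and '!' are rejected
except ValueError as e:
    print(e)
```

Because the same check would serve both GroupId and ClientId, a single shared utility like this is one way to address the duplication Jun raises in comment 40 below.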
[jira] [Updated] (KAFKA-647) Provide a property in System Test for no. of topics and topics string will be generated automatically
[ https://issues.apache.org/jira/browse/KAFKA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-647:
Attachment: kafka-647-v1.patch

Uploaded kafka-647-v1.patch with the following changes:
1. To let the System Test generate the topics string with a specified no. of topics, add the testcase argument num_topics_for_auto_generated_string to system_test/_testsuite/testcase_/testcase__properties.json such as:
testcase_args: { broker_type: leader, bounce_broker: false, . . . num_topics_for_auto_generated_string: 20, . . . },
2. The topics string generated would be in the format: topic_0001,topic_0002,topic_0003, . . .,topic_.
3. The topic prefix is hard-coded to topic_. As long as the topic indexes are incremented by 1, the prefix itself doesn't matter much.
4. The existing topics specification in other test cases is still supported. The multi-topics string will only be generated if the testcase argument in #1 is specified.

Provide a property in System Test for no. of topics and topics string will be generated automatically
-
Key: KAFKA-647 URL: https://issues.apache.org/jira/browse/KAFKA-647
Project: Kafka Issue Type: Task Reporter: John Fung Assignee: John Fung Labels: replication-testing
Attachments: kafka-647-v1.patch

Currently the topics string is specified in the testcase__properties.json file such as:
testcase_9051_properties.json: topic: t001,t002,t003,t004,t005,t006,t007,t008,t009,t010,t011,t012,t013,t014,t015,t016,t017,t018,t019,t020,
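Kafka's system tests are written in Python, so the auto-generation described in points 2-3 reduces to a one-liner. This is a minimal sketch of the idea; the helper name and the 4-digit zero padding are illustrative guesses, not the actual code in kafka-647-v1.patch.

```python
def auto_generate_topics_string(num_topics, prefix="topic_"):
    """Build a comma-separated topics string like
    'topic_0001,topic_0002,...' for the given number of topics.

    The function name and 4-digit zero padding are assumptions made
    for illustration; the real patch may differ in detail."""
    return ",".join("%s%04d" % (prefix, i) for i in range(1, num_topics + 1))

print(auto_generate_topics_string(3))
# topic_0001,topic_0002,topic_0003
```

With num_topics_for_auto_generated_string set to 20, this would produce the same kind of 20-topic string that was previously hand-written in testcase_9051_properties.json.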
[jira] [Updated] (KAFKA-647) Provide a property in System Test for no. of topics and topics string will be generated automatically
[ https://issues.apache.org/jira/browse/KAFKA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-647:
Status: Patch Available (was: Open)
[jira] [Updated] (KAFKA-646) Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[ https://issues.apache.org/jira/browse/KAFKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-646:
Attachment: kafka-646-patch-num1-v4.patch

Fixed a typo in FetchRequestAndResponseStats mbean creation.
[jira] [Created] (KAFKA-667) Rename .highwatermark file
Jay Kreps created KAFKA-667:
---
Summary: Rename .highwatermark file
Key: KAFKA-667 URL: https://issues.apache.org/jira/browse/KAFKA-667
Project: Kafka Issue Type: Improvement Affects Versions: 0.8 Reporter: Jay Kreps Assignee: Jay Kreps Priority: Minor

The 0.8 branch currently has a file in each log directory called .highwatermark. Soon we hope to add two more files in the same format. One will hold the cleaner position for log deduplication, and the other will hold the flusher position for log flush. Each of these is sort of a high-water mark. It would be good to rename .highwatermark to something a little more intuitive when we add these other files. I propose:
replication-offset-checkpoint
flusher-offset-checkpoint
cleaner-offset-checkpoint
replication-offset-checkpoint would replace the .highwatermark file. I am not making them dot files since they represent an important part of the persistent state, so the user should see them. Also, a shell * doesn't match hidden files, so if you did something like cp my_log/* my_backup_log/ you would not get the corresponding .highwatermark file. I am filing this bug now because it might be nice to just make this trivial change now and avoid having to handle backwards compatibility later.
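The hidden-file pitfall behind the rename is easy to demonstrate. Python's glob module follows the same shell convention of skipping dot-prefixed names, so the sketch below (illustrative only, not Kafka code) shows exactly what a naive `cp my_log/* my_backup_log/` would lose.

```python
import glob
import os
import tempfile

# Demonstrate the pitfall from the ticket: a shell-style '*' glob skips
# hidden (dot-prefixed) files, so copying my_log/* silently drops a
# checkpoint file named .highwatermark. File names are made up.
log_dir = tempfile.mkdtemp()
for name in (".highwatermark", "00000000.log", "00000000.index"):
    open(os.path.join(log_dir, name), "w").close()

visible = sorted(os.path.basename(p) for p in glob.glob(os.path.join(log_dir, "*")))
print(visible)  # ['00000000.index', '00000000.log'] -- no .highwatermark

# A non-hidden name like the proposed replication-offset-checkpoint
# is matched by the very same glob:
open(os.path.join(log_dir, "replication-offset-checkpoint"), "w").close()
matched = sorted(os.path.basename(p) for p in glob.glob(os.path.join(log_dir, "*")))
print(matched)  # now includes 'replication-offset-checkpoint'
```

This is the whole argument for dropping the leading dot: the checkpoint files travel with the rest of the log directory under ordinary copy and backup commands.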
Objections to doing this in 0.8?
https://issues.apache.org/jira/browse/KAFKA-667 Goal would just be forward compatibility with a more sane naming scheme... -Jay
Re: Objections to doing this in 0.8?
Prefer doing this now rather than later. Thanks, Neha

On Mon, Dec 10, 2012 at 3:05 PM, Jay Kreps jay.kr...@gmail.com wrote:
https://issues.apache.org/jira/browse/KAFKA-667 Goal would just be forward compatibility with a more sane naming scheme... -Jay
[jira] [Commented] (KAFKA-513) Add state change log to Kafka brokers
[ https://issues.apache.org/jira/browse/KAFKA-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528432#comment-13528432 ]

Jay Kreps commented on KAFKA-513:
-
It would be nice if we updated our log4j.properties as part of this ticket so that this log went to a different log file (and not to console), since it is meant for debugging and will confuse everyone except Neha :-). It would probably make the state transitions easier to read too...

Add state change log to Kafka brokers
-
Key: KAFKA-513 URL: https://issues.apache.org/jira/browse/KAFKA-513
Project: Kafka Issue Type: Sub-task Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Swapnil Ghike Priority: Blocker Labels: replication, tools Fix For: 0.8
Original Estimate: 96h Remaining Estimate: 96h

Once KAFKA-499 is checked in, every controller-to-broker communication can be modelled as a state change for one or more partitions. Every state change request will carry the controller epoch. If there is a problem with the state of some partitions, it will be good to have a tool that can create a timeline of requested and completed state changes. This will require each broker to output a state change log with entries like:
[2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
[2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1
On the controller, this will look like -
[2012-09-10 10:06:17,198] controller 2, epoch 1, initiated state change request LeaderAndIsr() for partition [foo, 0]
We need a tool that can collect the state change log from all brokers and create a per-partition timeline of state changes -
[foo, 0]
[2012-09-10 10:06:17,198] controller 2, epoch 1 initiated state change request LeaderAndIsr()
[2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() from controller 2, epoch 1
[2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() from controller 2, epoch 1
This JIRA involves adding the state change log to each broker and adding the tool to create the timeline.
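The merge tool the ticket asks for is essentially "group lines by partition, sort by timestamp." A rough Python sketch under the log format shown in the ticket's examples (this is not the tool that eventually shipped with Kafka; the regex and function name are assumptions):

```python
import re
from collections import defaultdict

# Illustrative sketch: merge state-change log lines from several brokers
# into a per-partition, time-ordered view. The line format is taken from
# the examples in the ticket; everything else here is an assumption.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<event>.*?) for partition (?P<partition>\[[^\]]+\])(?P<rest>.*)$"
)

def build_timeline(log_lines):
    timelines = defaultdict(list)
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            timelines[m.group("partition")].append(
                (m.group("ts"), m.group("event") + m.group("rest")))
    # Timestamps like '2012-09-10 10:06:17,280' sort correctly as strings.
    return {p: sorted(events) for p, events in timelines.items()}

logs = [
    "[2012-09-10 10:06:17,350] broker 1 completed request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1",
    "[2012-09-10 10:06:17,198] controller 2, epoch 1, initiated state change request LeaderAndIsr() for partition [foo, 0]",
    "[2012-09-10 10:06:17,280] broker 1 received request LeaderAndIsr() for partition [foo, 0] from controller 2, epoch 1",
]
for ts, event in build_timeline(logs)["[foo, 0]"]:
    print(f"[{ts}] {event}")
```

Run over the example lines above, this prints the initiated/received/completed sequence for [foo, 0] in chronological order, which is the per-partition timeline the ticket describes.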
[jira] [Commented] (KAFKA-646) Provide aggregate stats at the high level Producer and ZookeeperConsumerConnector level
[ https://issues.apache.org/jira/browse/KAFKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528559#comment-13528559 ]

Jun Rao commented on KAFKA-646:
---
Thanks for patch v4. Looks good overall. A few minor comments:
40. GroupId, ClientId: The validation code is identical. Could we combine it into one utility? We can throw a generic InvalidConfigurationException with the right text.
41. The patch does not apply because of changes in system_test/testcase_to_run.json. Do you actually intend to change this file?

Key: KAFKA-646 URL: https://issues.apache.org/jira/browse/KAFKA-646
Attachments: kafka-646-patch-num1-v1.patch, kafka-646-patch-num1-v2.patch, kafka-646-patch-num1-v3.patch, kafka-646-patch-num1-v4.patch
Re: Objections to doing this in 0.8?
That sounds good to me. Thanks, Jun

On Mon, Dec 10, 2012 at 3:05 PM, Jay Kreps jay.kr...@gmail.com wrote:
https://issues.apache.org/jira/browse/KAFKA-667 Goal would just be forward compatibility with a more sane naming scheme... -Jay
[jira] [Commented] (KAFKA-581) provides windows batch script for starting Kafka/Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528688#comment-13528688 ]

Jun Rao commented on KAFKA-581:
---
Thanks for the patch. Committed the stop scripts to 0.8.

provides windows batch script for starting Kafka/Zookeeper
--
Key: KAFKA-581 URL: https://issues.apache.org/jira/browse/KAFKA-581
Project: Kafka Issue Type: Improvement Components: config Affects Versions: 0.8 Environment: Windows Reporter: antoine vianey Priority: Trivial Labels: features, run, windows Fix For: 0.8
Attachments: kafka-console-consumer.bat, kafka-console-producer.bat, kafka-run-class.bat, kafka-server-start.bat, kafka-server-stop.bat, sbt.bat, zookeeper-server-start.bat, zookeeper-server-stop.bat
Original Estimate: 24h Remaining Estimate: 24h

Provide a port for quickstarting Kafka dev on Windows:
- kafka-run-class.bat
- kafka-server-start.bat
- zookeeper-server-start.bat
This will help the Kafka community grow.
[jira] [Commented] (KAFKA-374) Move to java CRC32 implementation
[ https://issues.apache.org/jira/browse/KAFKA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528690#comment-13528690 ] Jun Rao commented on KAFKA-374: --- Should this be a post-0.8 item? Move to java CRC32 implementation - Key: KAFKA-374 URL: https://issues.apache.org/jira/browse/KAFKA-374 Project: Kafka Issue Type: New Feature Components: core Affects Versions: 0.8 Reporter: Jay Kreps Priority: Minor Labels: newbie Attachments: KAFKA-374-draft.patch, KAFKA-374.patch We keep a per-record crc32. This is a fairly cheap algorithm, but the Java implementation uses JNI and it seems to be a bit expensive for small records. I have seen this before in Kafka profiles, and I noticed it on another application I was working on. Basically, with small records the native implementation can only checksum 100MB/sec. Hadoop has done some analysis of this and replaced it with a Java implementation that is 2x faster for large values and 5-10x faster for small values. Details are in HADOOP-6148. We should do a quick read/write benchmark on log and message set iteration and see if this improves things.
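The pure-Java alternative discussed in KAFKA-374 is the classic table-driven CRC-32, the same approach Hadoop took in HADOOP-6148. A minimal sketch, assuming only that it must compute the identical checksum to java.util.zip.CRC32 while avoiding a JNI call per record (the speedup figures above are not reproduced here):

```java
import java.util.zip.CRC32;

public class Crc32Compare {
    // Table-driven CRC-32 (reflected polynomial 0xEDB88320), the same
    // checksum computed by java.util.zip.CRC32, but in pure Java.
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
            TABLE[n] = c;
        }
    }

    static long crc32Java(byte[] buf) {
        int c = 0xFFFFFFFF;                          // standard initial value
        for (byte b : buf)
            c = TABLE[(c ^ b) & 0xFF] ^ (c >>> 8);   // one table lookup per byte
        return (~c) & 0xFFFFFFFFL;                   // final complement, as unsigned
    }

    public static void main(String[] args) {
        byte[] record = "small-kafka-record".getBytes();
        CRC32 jdk = new CRC32();
        jdk.update(record);
        // Both implementations agree on the checksum.
        System.out.println(jdk.getValue() == crc32Java(record));
    }
}
```

For small records the win comes from skipping the fixed JNI-crossing cost, which dominates when the payload is only a few bytes; a real benchmark over log and message-set iteration, as Jay suggests, would confirm the effect.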
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528707#comment-13528707 ] Jun Rao commented on KAFKA-664: --- If the problem is due to an expired request not being removed from the request LinkedList in the watcher, then there should be at most 1 such outstanding request per topic/partition. So, if the number of topic/partition is fixed, the memory space taken by those outstanding requests should be bounded too, right? Not sure why this causes memory usage to keep going up. Kafka server threads die due to OOME during long running test - Key: KAFKA-664 URL: https://issues.apache.org/jira/browse/KAFKA-664 Project: Kafka Issue Type: Bug Affects Versions: 0.8 Reporter: Neha Narkhede Assignee: Jay Kreps Priority: Blocker Labels: bugs Fix For: 0.8 Attachments: kafka-664-draft-2.patch, kafka-664-draft.patch, Screen Shot 2012-12-09 at 11.22.50 AM.png, Screen Shot 2012-12-09 at 11.23.09 AM.png, Screen Shot 2012-12-09 at 11.31.29 AM.png, thread-dump.log, watchersForKey.png I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long running producer process that sends data to 100s of partitions continuously for ~15 hours. 
After ~4 hours of operation, a few server threads (acceptor and processor) exited due to OOME:
[2012-12-07 08:24:44,355] ERROR OOME with size 1700161893 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-acceptor': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:44,356] ERROR Uncaught exception in thread 'kafka-processor-9092-1': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service, session 0x13afd0753870103 has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired) (org.I0Itec.zkclient.ZkClient)
[2012-12-07 08:24:46,344] INFO Initiating client connection, connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69 (org.apache.zookeeper.ZooKeeper)
[2012-12-07 08:24:55,702] ERROR OOME with size 2001040997 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:01,192] ERROR Uncaught exception in thread 'kafka-request-handler-0': (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:08,739] INFO Opening socket connection to server eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:14,221] INFO Socket connection established to eat1-app311.corp/172.20.72.75:12913, initiating session (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from server in 3722ms for sessionid 0x0, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2012-12-07 08:25:19,805] ERROR error in loggedRunnable (kafka.utils.Utils$) java.lang.OutOfMemoryError: Java heap space
[2012-12-07 08:25:23,528] ERROR OOME with size 1853095936 (kafka.network.BoundedByteBufferReceive) java.lang.OutOfMemoryError: Java heap space
It seems like it runs out of memory while trying to read the producer request, but it's unclear so far.
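The ERROR lines show BoundedByteBufferReceive failing on multi-gigabyte declared request sizes (e.g. 1700161893 bytes). A hedged sketch of the kind of bound check a size-prefixed receive relies on to avoid allocating a buffer of whatever size the client declared; the class and method names here are illustrative, not Kafka's actual code:

```java
import java.nio.ByteBuffer;

// Illustrative model of guarding a size-prefixed read: the request's
// declared size must be validated against a configured maximum before
// a buffer of that size is allocated.
public class BoundedReceive {
    static ByteBuffer allocateForRequest(int declaredSize, int maxSize) {
        // A corrupt or hostile size field (e.g. ~1.7 GB as in the log above)
        // would otherwise trigger a huge allocation and an OOME on the spot.
        if (declaredSize <= 0 || declaredSize > maxSize)
            throw new IllegalArgumentException("invalid request size " + declaredSize);
        return ByteBuffer.allocate(declaredSize);
    }
}
```

Note that in this ticket the heap is likely already exhausted by accumulated requests (see the purgatory discussion below the log), so even a well-bounded allocation can be the one that finally fails; the size guard only rules out the corrupt-size case.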
[jira] [Commented] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528723#comment-13528723 ] Jun Rao commented on KAFKA-664: --- Got it. So the issue is that for low-volume topics, the fetch requests made by the followers keep getting timed out. Those timed-out requests won't be removed from the request LinkedList in the watcher until the next produce request for that topic arrives, which could be a long time.
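The accumulation Jun describes can be sketched with a simplified model: expired entries in a watcher's request list are only purged when a new request arrives for the same key, so for an idle (low-volume) topic the dead entries, and the buffers they hold, linger indefinitely. All names here are hypothetical illustrations, not Kafka's actual purgatory API:

```java
import java.util.LinkedList;

// Simplified model of per-key watcher behavior in the request purgatory.
public class Watcher {
    static class DelayedFetch {
        final long expiresAtMs;
        DelayedFetch(long expiresAtMs) { this.expiresAtMs = expiresAtMs; }
        boolean expired(long nowMs) { return nowMs >= expiresAtMs; }
    }

    private final LinkedList<DelayedFetch> pending = new LinkedList<>();

    // Expired entries are removed only when a NEW request arrives for this
    // key. On an idle topic this method may not run for a long time, so
    // expired follower fetches pile up and pin their memory.
    void addAndPurge(DelayedFetch req, long nowMs) {
        pending.removeIf(r -> r.expired(nowMs));
        pending.add(req);
    }

    int size() { return pending.size(); }
}
```

With hundreds of partitions, each idle key keeps its own stale list, which is consistent with the watchersForKey growth captured in the attached heap screenshots; a purgatory that reaps expired entries on a timer rather than on arrival (the "purgatory refactor" mentioned below) removes the dependence on new traffic.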
[jira] [Comment Edited] (KAFKA-664) Kafka server threads die due to OOME during long running test
[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528733#comment-13528733 ] Neha Narkhede edited comment on KAFKA-664 at 12/11/12 6:45 AM: --- That's correct. I'm tempted to check in v2 for now and wait for the purgatory refactor patch. Until then, we can probably keep the JIRA open. Thoughts? was (Author: nehanarkhede): That's correct.