Re: Review Request 26663: Patch for KAFKA-979
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56494 ---

core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/26663/#comment96865

    Although this should work, I'm wondering if it would be better to recompute a random jitter for each new rolled segment. What do you think?

core/src/main/scala/kafka/log/LogConfig.scala
https://reviews.apache.org/r/26663/#comment96864

    We should use Utils.abs before returning, or it could return a large negative result.

- Joel Koshy

On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:

    --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/ ---

    (Updated Oct. 13, 2014, 11:16 p.m.)

    Review request for kafka.

    Bugs: KAFKA-979
        https://issues.apache.org/jira/browse/KAFKA-979

    Repository: kafka

    Description
    ---

    KAFKA-979 Add optional random jitter for time based log rolling.

    Diffs
    ---

    core/src/main/scala/kafka/log/Log.scala a123cdc52f341a802b3e4bfeb29a6154332e5f73
    core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff
    core/src/main/scala/kafka/log/LogConfig.scala d2cc9e3d6b7a4fd24516d164eb3673e6ce052129
    core/src/main/scala/kafka/log/LogSegment.scala 7597d309f37a0b3756381f9500100ef763d466ba
    core/src/main/scala/kafka/server/KafkaConfig.scala 7fcbc16da898623b03659c803e2a20c7d1bd1011
    core/src/main/scala/kafka/server/KafkaServer.scala 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf
    core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 7b97e6a80753a770ac094e101c653193dec67e68
    core/src/test/scala/unit/kafka/log/LogTest.scala a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b

    Diff: https://reviews.apache.org/r/26663/diff/

    Testing
    ---

    Thanks,
    Ewen Cheslack-Postava
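Joel's two comments above (recompute the jitter per rolled segment, and clamp with Utils.abs) can be illustrated with a small sketch. This is not the actual patch; the names `computeJitter` and the parameters are hypothetical, and the sign-bit mask is used in the spirit of Kafka's `Utils.abs`, since `Math.abs(Integer.MIN_VALUE)` is still negative.

```java
import java.util.Random;

public class RollJitter {
    // Clamp to non-negative with a mask, in the spirit of kafka.utils.Utils.abs.
    // Unlike Math.abs, this never returns a negative value.
    static int abs(int n) {
        return n & 0x7fffffff;
    }

    // Returns a jitter in [0, min(maxJitterMs, rollMs)). Calling this again on
    // each roll would give the per-segment randomness Joel asks about.
    static long computeJitter(Random random, long rollMs, long maxJitterMs) {
        long bound = Math.min(maxJitterMs, rollMs);
        if (bound <= 0) return 0L;
        return abs(random.nextInt()) % bound;
    }

    public static void main(String[] args) {
        Random r = new Random();
        // e.g. roll every hour with up to 15 minutes of jitter
        System.out.println("jitter = " + computeJitter(r, 3_600_000L, 900_000L) + " ms");
    }
}
```

Without the mask (or an equivalent check), `abs(random.nextInt()) % bound` could indeed produce the "large negative result" the review warns about.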
Re: [DISCUSSION] Message Metadata
I think we could add schemaId(binary) to the MessageAndMetaData. With the schemaId you can implement different downstream software patterns on the messages reliably. I wrote up more thoughts on this use case here: https://cwiki.apache.org/confluence/display/KAFKA/Schema+based+topics. It should strive to encompass all implementation needs for producer, broker, and consumer hooks. So if the application and tagged fields are important, you can package that into a specific Kafka topic plug-in and assign it to topic(s). The Kafka server should be able to validate your expected formats (like encoders/decoders, but in the broker, by topic, regardless of producer) for the topics that have it enabled. We should have these maintained in the project under contrib.

- Joe Stein

On Mon, Oct 13, 2014 at 11:02 PM, Guozhang Wang wangg...@gmail.com wrote:

    Hi Jay,

    Thanks for the comments. Replied inline.

    Guozhang

    On Mon, Oct 13, 2014 at 11:11 AM, Jay Kreps jay.kr...@gmail.com wrote:

        I need to take more time to think about this. Here are a few off-the-cuff remarks:

        - To date we have tried really, really hard to keep the data model for messages simple, since after all you can always add whatever you like inside the message body.

        - For system tags, why not just make these fields first-class fields in the message? The purpose of a system tag is presumably that... Why have a bunch of key-value pairs versus first-class fields?

    Yes, we could alternatively make system tags first-class fields in the message header to make the format / processing logic simpler. The main reasons I put them in as system tags are: 1) when I think about the possible system tags, some of them apply to all types of messages (e.g. timestamps), but some may be for a specific type of message (compressed, control message), and for those, not all of them are necessarily required all the time; making them compact tags may save us some space when not all of them are present; 2) with tags we do not need to bump the protocol version every time we make a change, which would include keeping the logic to handle all versions on the broker until the old ones are officially discarded; instead, the broker can just ignore a tag if its id is not recognizable because the client is on a newer version, or use a default value / throw an exception if a required tag is missing because the client is on an older version.

        - You don't necessarily need application-level tags explicitly represented in the message format for efficiency. The application can define its own header (e.g. the message could be a size-delimited header followed by a size-delimited body). But actually, if you use Avro, you don't even need this, I don't think. Avro has the ability to deserialize just the header fields in your message. Avro has a notion of reader and writer schemas. The writer schema is whatever the message was written with. If the reader schema is just the header, Avro will skip any fields it doesn't need and deserialize only the fields it does need. This is actually a much more usable and flexible way to define a header, since you get all the types Avro allows instead of just bytes.

    I agree that we can use a reader schema to read out just the header without deserializing the full message, and for compressed messages we could probably add an Avro (or similar) header to the compressed wrapper message as well. But that would force these applications (MirrorMaker, auditors, clients) to be schema-aware, which would usually require the people who manage the data pipeline to also manage the schemas, whereas ideally Kafka itself should just consider bytes-in and bytes-out (and maybe a little bit more, like timestamps).
    The purpose here is to avoid introducing an extra dependency while still allowing applications to do some simple processing based on metadata only, without fully deserializing / decompressing the message.

        - We will need to think carefully about what to do with timestamps if we end up including them. There are actually several timestamps:
          - The time the producer created the message
          - The time the leader received the message
          - The time the current broker received the message
        The producer timestamps won't be at all increasing. The leader timestamp will be mostly increasing, except when the clock changes or leadership moves. This somewhat complicates the use of these timestamps, though. From the point of view of the producer, the only time that matters is the time the message was created. However, since the producer sets this, it can be arbitrarily bad (remember all the NTP issues and 1970 timestamps we would get). Say the heuristic was to use the timestamp of the first message in a file for retention; the problem would be that the timestamps for the segments need not even be sequential, and a single bad producer could send data with a time in the distant past or future, causing data to be
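Jay's suggestion above, that an application can define its own message layout as a size-delimited header followed by a size-delimited body, can be sketched without any schema library. The framing below is purely illustrative (the thread does not fix a wire format); the point is that a consumer can read the header and skip the body entirely.

```java
import java.nio.ByteBuffer;

public class SizeDelimited {
    // Encode as [int headerLen][header bytes][int bodyLen][body bytes].
    static byte[] encode(byte[] header, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(8 + header.length + body.length);
        buf.putInt(header.length).put(header);
        buf.putInt(body.length).put(body);
        return buf.array();
    }

    // Read only the header; the body bytes are never touched, so no
    // full deserialization or decompression of the payload is needed.
    static byte[] readHeader(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte[] header = new byte[buf.getInt()];
        buf.get(header);
        return header;
    }

    public static void main(String[] args) {
        byte[] msg = encode("hdr".getBytes(), "payload".getBytes());
        System.out.println(new String(readHeader(msg)));  // prints: hdr
    }
}
```

Avro's reader/writer schema resolution achieves the same skipping with typed fields instead of raw bytes, which is the trade-off (extra dependency, schema-aware clients) the reply discusses.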
[jira] [Updated] (KAFKA-1691) new java consumer needs ssl support as a client
[ https://issues.apache.org/jira/browse/KAFKA-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1691:
-----------------------------
    Component/s: security

new java consumer needs ssl support as a client
    Key: KAFKA-1691
    URL: https://issues.apache.org/jira/browse/KAFKA-1691
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Reporter: Joe Stein
    Assignee: Ivan Lyutov
    Fix For: 0.8.3

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1682) Security for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1682:
-----------------------------
    Component/s: security

Security for Kafka
    Key: KAFKA-1682
    URL: https://issues.apache.org/jira/browse/KAFKA-1682
    Project: Kafka
    Issue Type: New Feature
    Components: security
    Affects Versions: 0.9.0
    Reporter: Jay Kreps

Parent ticket for security. Wiki and discussion is here: https://cwiki.apache.org/confluence/display/KAFKA/Security

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1685) Implement TLS/SSL tests
[ https://issues.apache.org/jira/browse/KAFKA-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1685:
-----------------------------
    Component/s: security

Implement TLS/SSL tests
    Key: KAFKA-1685
    URL: https://issues.apache.org/jira/browse/KAFKA-1685
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Affects Versions: 0.9.0
    Reporter: Jay Kreps
    Fix For: 0.9.0

We need to write a suite of unit tests for TLS authentication. This should be doable with a JUnit integration test. We can use the simple authorization plugin with only a single user whitelisted. The test can start the server, then connect with and without TLS and validate that access is only possible when authenticated.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication
[ https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1684:
-----------------------------
    Component/s: security

Implement TLS/SSL authentication
    Key: KAFKA-1684
    URL: https://issues.apache.org/jira/browse/KAFKA-1684
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Affects Versions: 0.9.0
    Reporter: Jay Kreps

Add an SSL port to the configuration and advertise this as part of the metadata request. If the SSL port is configured, the socket server will need to add a second Acceptor thread to listen on it. Connections accepted on this port will need to go through the SSL handshake prior to being registered with a Processor for request processing.

SSL requests and responses may need to be wrapped or unwrapped using the SSLEngine that was initialized by the acceptor. This wrapping and unwrapping is very similar to what will need to be done for SASL-based authentication schemes. We should have a uniform interface that covers both of these, and we will need to store the instance in the session with the request. The socket server will have to use this object when reading and writing requests.

We will need to take care with FetchRequests, as the current FileChannel.transferTo mechanism will be incompatible with wrap/unwrap, so we can only use this optimization for unencrypted sockets that don't require userspace translation (wrapping).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
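The "uniform interface that covers both" SSLEngine and SaslServer wrap/unwrap can be sketched as below. The interface and class names are hypothetical (no patch is attached to this ticket); the sketch just shows the shape such an abstraction might take, and why plaintext connections keep the zero-copy path.

```java
import java.nio.ByteBuffer;

// Hypothetical uniform abstraction over SSLEngine.wrap/unwrap and
// SaslServer.wrap/unwrap, so the socket server can read and write requests
// through one interface regardless of the security mechanism in use.
interface TransportLayer {
    ByteBuffer wrap(ByteBuffer outgoing);   // encrypt / integrity-protect
    ByteBuffer unwrap(ByteBuffer incoming); // decrypt / verify
    boolean handshakeComplete();
}

// Plaintext connections pass bytes through untouched, which is why the
// FileChannel.transferTo zero-copy optimization remains valid only for them:
// there is no userspace translation step between file and socket.
class PlaintextTransportLayer implements TransportLayer {
    public ByteBuffer wrap(ByteBuffer b) { return b; }
    public ByteBuffer unwrap(ByteBuffer b) { return b; }
    public boolean handshakeComplete() { return true; }
}
```

An SSL-backed implementation would delegate `wrap`/`unwrap` to the `SSLEngine` created by the acceptor, and a SASL-backed one to the negotiated `SaslServer`.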
[jira] [Updated] (KAFKA-1686) Implement SASL/Kerberos
[ https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1686:
-----------------------------
    Component/s: security

Implement SASL/Kerberos
    Key: KAFKA-1686
    URL: https://issues.apache.org/jira/browse/KAFKA-1686
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Affects Versions: 0.9.0
    Reporter: Jay Kreps
    Fix For: 0.9.0

Implement SASL/Kerberos authentication. To do this we will need to introduce a new SASLRequest and SASLResponse pair to the client protocol. This request and response will each have only a single byte[] field and will be used to handle the SASL challenge/response cycle. Doing this will initialize the SaslServer instance and associate it with the session in a manner similar to KAFKA-1684.

When using integrity or encryption mechanisms with SASL, we will need to wrap and unwrap bytes as in KAFKA-1684, so the same interface that covers the SSLEngine will need to also cover the SaslServer instance.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1683) Implement a session concept in the socket server
[ https://issues.apache.org/jira/browse/KAFKA-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1683:
-----------------------------
    Component/s: security

Implement a session concept in the socket server
    Key: KAFKA-1683
    URL: https://issues.apache.org/jira/browse/KAFKA-1683
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Affects Versions: 0.9.0
    Reporter: Jay Kreps

To implement authentication we need a way to keep track of some things between requests. The initial use for this would be remembering the authenticated user/principal info, but likely more uses would come up (for example, we will also need to remember whether and which encryption or integrity measures are in place on the socket, so we can wrap and unwrap writes and reads).

I was thinking we could just add a Session object that might have a user field. The session object would need to get added to RequestChannel.Request so it is passed down to the API layer with each request.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
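The Session idea described in the ticket can be sketched in a few lines. Field and class names here are illustrative, not from a patch: the point is only that per-connection state (authenticated principal, whether wrap/unwrap is needed) travels with each request into the API layer.

```java
// Hypothetical per-connection state, remembered between requests.
class Session {
    final String principal;   // authenticated user, once the handshake is done
    final boolean secure;     // whether reads/writes on this socket need wrap/unwrap
    Session(String principal, boolean secure) {
        this.principal = principal;
        this.secure = secure;
    }
}

// Sketch of RequestChannel.Request gaining a session reference, so the API
// layer (KafkaApis) can consult the authenticated principal per request.
class Request {
    final Session session;
    final byte[] body;
    Request(Session session, byte[] body) {
        this.session = session;
        this.body = body;
    }
}
```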
[jira] [Updated] (KAFKA-1695) Authenticate connection to Zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1695:
-----------------------------
    Component/s: security

Authenticate connection to Zookeeper
    Key: KAFKA-1695
    URL: https://issues.apache.org/jira/browse/KAFKA-1695
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Reporter: Jay Kreps

We need to make it possible to secure the ZooKeeper cluster Kafka is using. This would make use of the normal authentication ZooKeeper provides. ZooKeeper supports a variety of authentication mechanisms, so we will need to figure out what has to be passed in to the ZooKeeper client.

The intention is that when the current round of client work is done it should be possible to run without clients needing access to ZooKeeper, so all we need here is to make it so that only the Kafka cluster is able to read and write to the Kafka znodes (we shouldn't need to set any kind of ACL on a per-znode basis).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1477) add authentication layer and initial JKS x509 implementation for brokers, producers and consumer for network communication
[ https://issues.apache.org/jira/browse/KAFKA-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1477:
-----------------------------
    Component/s: security

add authentication layer and initial JKS x509 implementation for brokers, producers and consumer for network communication
    Key: KAFKA-1477
    URL: https://issues.apache.org/jira/browse/KAFKA-1477
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Reporter: Joe Stein
    Assignee: Ivan Lyutov
    Fix For: 0.8.3
    Attachments: KAFKA-1477-binary.patch, KAFKA-1477.patch, KAFKA-1477_2014-06-02_16:59:40.patch, KAFKA-1477_2014-06-02_17:24:26.patch, KAFKA-1477_2014-06-03_13:46:17.patch, KAFKA-1477_trunk.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens
[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1696:
-----------------------------
    Component/s: security

Kafka should be able to generate Hadoop delegation tokens
    Key: KAFKA-1696
    URL: https://issues.apache.org/jira/browse/KAFKA-1696
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Reporter: Jay Kreps

For access from MapReduce/etc jobs run on behalf of a user.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1688) Add authorization interface and naive implementation
[ https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1688:
-----------------------------
    Component/s: security

Add authorization interface and naive implementation
    Key: KAFKA-1688
    URL: https://issues.apache.org/jira/browse/KAFKA-1688
    Project: Kafka
    Issue Type: Sub-task
    Components: security
    Reporter: Jay Kreps

Add a PermissionManager interface as described here: https://cwiki.apache.org/confluence/display/KAFKA/Security (possibly there is a better name?)

Implement calls to the PermissionManager in KafkaApis for the main requests (FetchRequest, ProduceRequest, etc). We will need to add a new error code and exception to the protocol to indicate permission denied.

Add a server configuration to give the class you want to instantiate that implements that interface. That class can define its own configuration properties from the main config file.

Provide a simple implementation of this interface which just takes a user and IP whitelist and permits those in either of the whitelists to do anything, and denies all others. Rather than writing an integration test for this class, we can probably just use it for the TLS and SASL authentication testing.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
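The naive implementation the ticket asks for is small enough to sketch. The interface shape and method name below are assumptions (the ticket names only `PermissionManager` and the whitelist behavior): permit if either the user or the client IP is whitelisted, deny everyone else.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Interface named by the ticket; the method signature is an assumption.
interface PermissionManager {
    boolean isPermitted(String user, String clientIp);
}

// Naive implementation: membership in either whitelist grants everything.
class WhitelistPermissionManager implements PermissionManager {
    private final Set<String> users;
    private final Set<String> ips;

    WhitelistPermissionManager(Set<String> users, Set<String> ips) {
        this.users = users;
        this.ips = ips;
    }

    public boolean isPermitted(String user, String clientIp) {
        return users.contains(user) || ips.contains(clientIp);
    }

    public static void main(String[] args) {
        PermissionManager pm = new WhitelistPermissionManager(
            new HashSet<>(Arrays.asList("alice")),
            new HashSet<>(Arrays.asList("10.0.0.1")));
        System.out.println(pm.isPermitted("bob", "10.0.0.1"));  // prints: true
    }
}
```

In the real feature, the implementing class name would come from a server configuration entry, and KafkaApis would return the new permission-denied error code when `isPermitted` fails.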
request for edit access to Kafka wiki
Hi, I'm Glen Mazza from the Apache Roller and CXF teams. Would I be able to have write access to the Kafka wiki? (I'm mazzag with Confluence, I already have an ICLA filed with Apache.) I'm starting to go through the program and its tutorials and would like to make updates if I see things missing or obsolete. Thanks, Glen
Re: request for edit access to Kafka wiki
Glen, I just added them, you should be good to go!

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
********************************************/

On Tue, Oct 14, 2014 at 9:03 AM, Glen Mazza glen.ma...@gmail.com wrote:

    Hi, I'm Glen Mazza from the Apache Roller and CXF teams. Would I be able to have write access to the Kafka wiki? (I'm mazzag with Confluence, I already have an ICLA filed with Apache.) I'm starting to go through the program and its tutorials and would like to make updates if I see things missing or obsolete.

    Thanks,
    Glen
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171082#comment-14171082 ]

James Oliver commented on KAFKA-1493:
-------------------------------------
    Sorry not to be more clear. I fixed a few spots related to the removal of the LZ4HC option, but left the I/O streams in Ivan's patch alone. Since I didn't have permissions to update Ivan's Review Board request, I created a new review.

    1. This looks like Ivan's interpretation of the lz4-java block stream format.
    2. We should use neither; the lz4-java impl was used previously (KAFKA-1456), and review by the community produced this issue. We need a real implementation of http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html

Use a well-documented LZ4 compression format and remove redundant LZ4HC option
    Key: KAFKA-1493
    URL: https://issues.apache.org/jira/browse/KAFKA-1493
    Project: Kafka
    Issue Type: Improvement
    Affects Versions: 0.8.2
    Reporter: James Oliver
    Assignee: Ivan Lyutov
    Priority: Blocker
    Fix For: 0.8.2
    Attachments: KAFKA-1493.patch, KAFKA-1493.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26663: Patch for KAFKA-979
On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:

    core/src/main/scala/kafka/log/Log.scala, line 515
    https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515

    Although this should work, I'm wondering if it would be better to recompute a random jitter for each new rolled segment. What do you think?

Only one segment should ever roll at a time in a log, so this should suffice, right?

- Neha

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56494 ---

On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:

    (Updated Oct. 13, 2014, 11:16 p.m.)

    Review request for kafka.

    Bugs: KAFKA-979
        https://issues.apache.org/jira/browse/KAFKA-979

    Repository: kafka

    Description
    ---

    KAFKA-979 Add optional random jitter for time based log rolling.

    Diff: https://reviews.apache.org/r/26663/diff/

    Thanks,
    Ewen Cheslack-Postava
Re: Review Request 26663: Patch for KAFKA-979
On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:

    core/src/main/scala/kafka/log/Log.scala, line 515
    https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515

    Although this should work, I'm wondering if it would be better to recompute a random jitter for each new rolled segment. What do you think?

Neha Narkhede wrote:

    Only one segment should ever roll at a time in a log, so this should suffice, right?

Yes, but there could be one segment each from multiple low-volume logs that roll simultaneously, which is why we want to add the jitter. The current patch should work; it's just that the jitter is set only once (up front) per log when the log is created, and there is no randomness after that. That should be okay, but if you are unlucky and get a roll schedule that isn't spread out very well, that schedule will stick.

- Joel

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56494 ---

On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:

    (Updated Oct. 13, 2014, 11:16 p.m.)

    Review request for kafka.

    Bugs: KAFKA-979
        https://issues.apache.org/jira/browse/KAFKA-979

    Repository: kafka

    Description
    ---

    KAFKA-979 Add optional random jitter for time based log rolling.

    Diff: https://reviews.apache.org/r/26663/diff/

    Thanks,
    Ewen Cheslack-Postava
[jira] [Assigned] (KAFKA-566) Add last modified time to the TopicMetadataRequest
[ https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani reassigned KAFKA-566:
--------------------------------------------
    Assignee: Sriharsha Chintalapani

Add last modified time to the TopicMetadataRequest
    Key: KAFKA-566
    URL: https://issues.apache.org/jira/browse/KAFKA-566
    Project: Kafka
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Jay Kreps
    Assignee: Sriharsha Chintalapani
    Fix For: 0.9.0

To support KAFKA-560 it would be nice to have a last modified time in the TopicMetadataRequest. This would be the timestamp of the last append to the log, as taken from stat on the final log segment. Implementation would involve:
1. Adding a new field to TopicMetadataRequest
2. Adding a method Log.lastModified: Long to get the last modified time from a log

This timestamp would, of course, be subject to error in the event that the file was touched without modification, but I think that is actually okay, since it provides a manual way to avoid gc'ing a topic that you know you will want.

It is debatable whether this should go in 0.8. It would be nice to add the field to the metadata request, at least, as that change should be easy and would avoid needing to bump the version in the future.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
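The proposed `Log.lastModified` is just a stat call on the final segment file. A minimal sketch in Java (the ticket's method would be Scala, and the class and path names here are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogModTime {
    // Sketch of the proposed Log.lastModified: the mtime of the final
    // (active) log segment file, in milliseconds since the epoch.
    static long lastModified(Path finalSegment) throws IOException {
        return Files.getLastModifiedTime(finalSegment).toMillis();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real segment file such as 00000000000000000000.log
        Path seg = Files.createTempFile("00000000000000000000", ".log");
        System.out.println("last modified: " + lastModified(seg) + " ms");
        Files.delete(seg);
    }
}
```

As the ticket notes, a touch without modification would skew this value, which is acceptable (and even useful as a manual override) for the garbage-collection use case.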
[jira] [Assigned] (KAFKA-560) Garbage Collect obsolete topics
[ https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani reassigned KAFKA-560:
--------------------------------------------
    Assignee: Sriharsha Chintalapani

Garbage Collect obsolete topics
    Key: KAFKA-560
    URL: https://issues.apache.org/jira/browse/KAFKA-560
    Project: Kafka
    Issue Type: New Feature
    Reporter: Jay Kreps
    Assignee: Sriharsha Chintalapani
    Labels: project

Old junk topics tend to accumulate over time. Code may migrate to use new topics, leaving the old ones orphaned. Likewise, there are some use cases for temporary transient topics. It would be good to have a tool that could delete any topic that had not been written to in a configurable period of time and had no active consumer groups. Something like:

    ./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect]

This requires API support to get the last update time. I think it may be possible to do this through the OffsetRequest now?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-560) Garbage Collect obsolete topics
[ https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171169#comment-14171169 ]

Sriharsha Chintalapani commented on KAFKA-560:
----------------------------------------------
    [~jkreps] [~nehanarkhede] If no one is actively working on this JIRA, I am interested in taking it. I am assigning it to myself; please change it if necessary.

Garbage Collect obsolete topics
    Key: KAFKA-560
    URL: https://issues.apache.org/jira/browse/KAFKA-560
    Project: Kafka
    Issue Type: New Feature
    Reporter: Jay Kreps
    Labels: project

Old junk topics tend to accumulate over time. Code may migrate to use new topics, leaving the old ones orphaned. Likewise, there are some use cases for temporary transient topics. It would be good to have a tool that could delete any topic that had not been written to in a configurable period of time and had no active consumer groups. Something like:

    ./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect]

This requires API support to get the last update time. I think it may be possible to do this through the OffsetRequest now?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
Hey Bhavesh,

This sounds like a problem. Just to confirm, is this after the fix for KAFKA-1673? https://issues.apache.org/jira/browse/KAFKA-1673

It sounds like you have a reproducible test case?

-Jay

On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote:

    Hi Kafka Dev Team,

    When I run the test that sends messages to a single partition for 3 minutes or so, I encounter a deadlock (please see the attached screenshot) and thread contention in YourKit profiling.

    Use Case:
    1) Aggregating messages into the same partition for metric counting.
    2) Replicating the old producer's behavior of sticking to a partition for 3 minutes.

    Here is the output:

    Frozen threads found (potential deadlock)

    It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung.

    pool-1-thread-128 --- Frozen for at least 2m
    org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
    org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
    org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
    java.lang.Thread.run() Thread.java:744

    pool-1-thread-159 --- Frozen for at least 2m 1 sec
    org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
    org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
    org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
    java.lang.Thread.run() Thread.java:744

    pool-1-thread-55 --- Frozen for at least 2m
    org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
    org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
    org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
    java.lang.Thread.run() Thread.java:744
Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
Hi Jay,

Yes, it is quite easily reproducible. The problem is the synchronized block in RecordAccumulator; you can reproduce it with the Java code attached to my original email. Application threads enqueueing messages into a single partition cause thread contention, and an application thread may be blocked on this for more than 2 minutes, as shown in the original email. Let me know if you need more information.

Last commit I tested with:

    commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99
    Author: Anton Karamanov atara...@gmail.com
    Date: Tue Oct 7 18:22:31 2014 -0700

        kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by Anton Karamanov; reviewed by Jun Rao

Thanks,
Bhavesh

On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps jay.kr...@gmail.com wrote:

    Hey Bhavesh,

    This sounds like a problem. Just to confirm, is this after the fix for KAFKA-1673? https://issues.apache.org/jira/browse/KAFKA-1673

    It sounds like you have a reproducible test case?

    -Jay

    On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote:

        Hi Kafka Dev Team,

        When I run the test that sends messages to a single partition for 3 minutes or so, I encounter a deadlock (please see the attached screenshot) and thread contention in YourKit profiling.

        Use Case:
        1) Aggregating messages into the same partition for metric counting.
        2) Replicating the old producer's behavior of sticking to a partition for 3 minutes.

        Here is the output:

        Frozen threads found (potential deadlock)

        It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung.

        pool-1-thread-128 --- Frozen for at least 2m
        org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
        org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
        org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
        java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
        java.lang.Thread.run() Thread.java:744

        pool-1-thread-159 --- Frozen for at least 2m 1 sec
        org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
        org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
        org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
        java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
        java.lang.Thread.run() Thread.java:744

        pool-1-thread-55 --- Frozen for at least 2m
        org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
        org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
        org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
        java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
        java.lang.Thread.run() Thread.java:744
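The contention pattern described in this thread can be illustrated with a toy accumulator. This is not the real RecordAccumulator, just a sketch of the shape of the problem: when all producer threads append to one partition, they all serialize on that partition's queue monitor, while appends spread across partitions can proceed in parallel.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: per-partition queues, each guarded by its own
// monitor. A single hot partition turns this into one global choke point.
public class Accumulator {
    private final ConcurrentHashMap<Integer, Deque<byte[]>> batches =
        new ConcurrentHashMap<>();

    public void append(int partition, byte[] record) {
        Deque<byte[]> dq = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        synchronized (dq) {  // single-partition workloads all queue up here
            dq.addLast(record);
        }
    }

    public int size(int partition) {
        Deque<byte[]> dq = batches.get(partition);
        if (dq == null) return 0;
        synchronized (dq) { return dq.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Accumulator acc = new Accumulator();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) acc.append(0, new byte[]{1});
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println("records in partition 0: " + acc.size(0));  // prints: records in partition 0: 4000
    }
}
```

Long blocking under this monitor (minutes, as in the YourKit report above) is contention rather than a true lock-cycle deadlock, which matches the profiler's "possibly (but not necessarily!) in a deadlock" wording.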
[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171371#comment-14171371 ] Vladimir Tretyakov commented on KAFKA-1481: --- Hi Jun, thanks for finding time to look at the patch. I've added another one based on your suggestion; now in JMX you will see (hope this is what you need):
{code}
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092,topic=spm_topic,partitionId=0
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=af_servers,groupId=af_servers,topic=spm_topic
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,allBrokers=true
{code}
{quote} Also, your patch doesn't seem to apply to latest trunk. git apply ~/Downloads/KAFKA-1481_2014-10-13_18-23-35.patch error: core/src/main/scala/kafka/common/ClientIdTopic.scala: No such file or directory {quote} Interesting... 1. I worked on the 0.8.2 branch, not trunk. 2. ClientIdTopic.scala - I added this file (it must be in the patch). I've attached two equivalent patches (created in different ways): 1. with 'git diff' 2. with the IDEA IDE. Please try them; maybe one will apply. Thanks again.
Stop using dashes AND underscores as separators in MBean names -- Key: KAFKA-1481 URL: https://issues.apache.org/jira/browse/KAFKA-1481 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Otis Gospodnetic Priority: Critical Labels: patch Fix For: 0.8.2 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, KAFKA-1481_2014-10-13_18-23-35.patch MBeans should not use dashes or underscores as separators because these characters are allowed in hostnames, topics, group and consumer IDs, etc., and these are embedded in MBeans names making it impossible to parse out individual bits from MBeans. Perhaps a pipe character should be used to avoid the conflict. This looks like a major blocker because it means nobody can write Kafka 0.8.x monitoring tools unless they are doing it for themselves AND do not use dashes AND do not use underscores. See: http://search-hadoop.com/m/4TaT4lonIW -- This message was sent by Atlassian JIRA (v6.3.4#6332)
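The fix direction discussed here (comma-separated key=value properties instead of dash/underscore separators) has the advantage that standard JMX tooling parses the names for you. A small sketch using the JDK's javax.management.ObjectName with one of the MBean names from the comment above; values containing underscores are unambiguous because the separator is the comma:

```java
import javax.management.ObjectName;

public class MBeanNameParsing {
    public static void main(String[] args) throws Exception {
        // With key=value properties, JMX itself does the parsing --
        // no ambiguity even if clientId or topic contains dashes/underscores.
        ObjectName name = new ObjectName(
            "kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,"
            + "clientId=af_servers,groupId=af_servers,topic=spm_topic");
        System.out.println(name.getKeyProperty("clientId")); // af_servers
        System.out.println(name.getKeyProperty("topic"));    // spm_topic
    }
}
```

By contrast, a flat name like FetcherStats-af_servers-spm_topic cannot be split reliably, because "af_servers" itself contains the separator characters.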
[jira] [Updated] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Tretyakov updated KAFKA-1481: -- Attachment: KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch KAFKA-1481_2014-10-14_21-53-35.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171371#comment-14171371 ] Vladimir Tretyakov edited comment on KAFKA-1481 at 10/14/14 7:09 PM: - Hi Jun, thanks for finding time to look at the patch. I've added another one based on your suggestion; now in JMX you will see (hope this is what you need):
{code}
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092,topic=spm_topic,partitionId=0
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=af_servers,groupId=af_servers,topic=spm_topic
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,allBrokers=true
{code}
{quote} Also, your patch doesn't seem to apply to latest trunk. git apply ~/Downloads/KAFKA-1481_2014-10-13_18-23-35.patch error: core/src/main/scala/kafka/common/ClientIdTopic.scala: No such file or directory {quote} Interesting... 1. I worked on the 0.8.2 branch, not trunk. 2. ClientIdTopic.scala - I added this file (it must be in the patch). I've attached two equivalent patches (created in different ways): 1. with 'git diff' (KAFKA-1481_2014-10-14_21-53-35.patch) 2. with the IDEA IDE (KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch). Please try them; maybe one will apply. Thanks again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26663: Patch for KAFKA-979
On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote: core/src/main/scala/kafka/log/Log.scala, line 515 https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515 Although this should work I'm wondering if it would be better to recompute a random jitter for each new rolled segment. What do you think? Neha Narkhede wrote: Only one segment should ever roll at one time in a log. So this should suffice, right? Joel Koshy wrote: Yes, but there could be one segment each from multiple low-volume logs that roll simultaneously - which is why we want to add the jitter. The current patch should work - it's just that the jitter is set only once (up front) per log when the log is created. It does not seem there is any randomness after that. That should be okay, but if you are unlucky and get a roll schedule that isn't spread out very well, that schedule will stick. Even for low-volume logs, as long as every log has a random jitter, it seems unlikely that time-based rolling of log segments would align across logs. For that sort of alignment, two logs would have to have the same jitter set. - Neha --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56494 --- On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/ --- (Updated Oct. 13, 2014, 11:16 p.m.) Review request for kafka. Bugs: KAFKA-979 https://issues.apache.org/jira/browse/KAFKA-979 Repository: kafka Description --- KAFKA-979 Add optional random jitter for time based log rolling. 
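Joel's Utils.abs comment (see the review at the top of this thread) points at a real pitfall when deriving jitter from a random long: Math.abs(Long.MIN_VALUE) is still negative, so a naive abs-then-modulo can return a negative jitter. A hedged sketch, with invented method names rather than the actual Kafka code, showing the failure and the sign-bit-masking fix:

```java
import java.util.Random;

public class SegmentJitter {
    // Naive: Math.abs(Long.MIN_VALUE) == Long.MIN_VALUE (two's complement),
    // so the modulo result stays negative for that input.
    static long naiveJitter(long seedValue, long maxJitterMs) {
        return Math.abs(seedValue) % maxJitterMs;
    }

    // Masking off the sign bit (the idea behind Kafka's Utils.abs) is safe:
    // the masked value is always >= 0, so the result is in [0, maxJitterMs).
    static long safeJitter(long seedValue, long maxJitterMs) {
        return (seedValue & 0x7fffffffffffffffL) % maxJitterMs;
    }

    public static void main(String[] args) {
        long max = 60_000L; // e.g. up to one minute of jitter
        System.out.println(naiveJitter(Long.MIN_VALUE, max)); // -55808: negative jitter!
        System.out.println(safeJitter(Long.MIN_VALUE, max));  // 0
        System.out.println(safeJitter(new Random().nextLong(), max) >= 0); // true
    }
}
```

A negative jitter would make the roll timeout longer than configured instead of shorter, which is why the review asked for Utils.abs before returning.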
Diffs - core/src/main/scala/kafka/log/Log.scala a123cdc52f341a802b3e4bfeb29a6154332e5f73 core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff core/src/main/scala/kafka/log/LogConfig.scala d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 core/src/main/scala/kafka/log/LogSegment.scala 7597d309f37a0b3756381f9500100ef763d466ba core/src/main/scala/kafka/server/KafkaConfig.scala 7fcbc16da898623b03659c803e2a20c7d1bd1011 core/src/main/scala/kafka/server/KafkaServer.scala 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 7b97e6a80753a770ac094e101c653193dec67e68 core/src/test/scala/unit/kafka/log/LogTest.scala a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b Diff: https://reviews.apache.org/r/26663/diff/ Testing --- Thanks, Ewen Cheslack-Postava
[jira] [Commented] (KAFKA-566) Add last modified time to the TopicMetadataRequest
[ https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171531#comment-14171531 ] Neha Narkhede commented on KAFKA-566: - [~sriharsha] Thanks for taking this on. Add last modified time to the TopicMetadataRequest -- Key: KAFKA-566 URL: https://issues.apache.org/jira/browse/KAFKA-566 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Fix For: 0.9.0 To support KAFKA-560 it would be nice to have a last modified time in the TopicMetadataRequest. This would be the timestamp of the last append to the log as taken from stat on the final log segment. Implementation would involve 1. Adding a new field to TopicMetadataRequest 2. Adding a method Log.lastModified: Long to get the last modified time from a log This timestamp would, of course, be subject to error in the event that the file was touched without modification, but I think that is actually okay since it provides a manual way to avoid gc'ing a topic that you know you will want. It is debatable whether this should go in 0.8. It would be nice to add the field to the metadata request, at least, as that change should be easy and would avoid needing to bump the version in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
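The proposed Log.lastModified would simply surface the mtime of the newest segment file, as the issue describes ("taken from stat on the final log segment"). A minimal, hypothetical sketch; the directory layout and helper name are assumptions, not the actual Kafka implementation:

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;

public class LogLastModified {
    // Proposed Log.lastModified: the mtime of the newest .log segment file,
    // i.e. the timestamp of the last append (modulo spurious touches,
    // which the issue notes is acceptable).
    static long lastModified(File logDir) {
        File[] segments = logDir.listFiles((d, n) -> n.endsWith(".log"));
        if (segments == null || segments.length == 0) return -1L;
        return Arrays.stream(segments)
                     .max(Comparator.comparingLong(File::lastModified))
                     .get().lastModified();
    }

    public static void main(String[] args) throws IOException {
        File dir = java.nio.file.Files.createTempDirectory("topic-0").toFile();
        new File(dir, "00000000000000000000.log").createNewFile();
        System.out.println(lastModified(dir) > 0); // true: mtime of the new segment
    }
}
```

The value would then be copied into the new TopicMetadataRequest field rather than computed by clients.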
[jira] [Updated] (KAFKA-566) Add last modified time to the TopicMetadataRequest
[ https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-566: Reviewer: Neha Narkhede -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-560) Garbage Collect obsolete topics
[ https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-560: Reviewer: Neha Narkhede Garbage Collect obsolete topics --- Key: KAFKA-560 URL: https://issues.apache.org/jira/browse/KAFKA-560 Project: Kafka Issue Type: New Feature Reporter: Jay Kreps Assignee: Sriharsha Chintalapani Labels: project Old junk topics tend to accumulate over time. Code may migrate to use new topics leaving the old ones orphaned. Likewise there are some use cases for temporary transient topics. It would be good to have a tool that could delete any topic that had not been written to in a configurable period of time and had no active consumer groups. Something like ./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect] This requires API support to get the last update time. I think it may be possible to do this through the OffsetRequest now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
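At its core, the proposed delete-unused-topics tool is a filter over topics by last-write time and active consumer groups. A hypothetical sketch of just that filter; in reality the inputs would come from the metadata/OffsetRequest APIs and ZooKeeper, which is elided here:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class UnusedTopicFilter {
    // A topic is deletable if it has not been written to since the cutoff
    // (--last-write) AND no consumer group is actively consuming it.
    static List<String> deletableTopics(Map<String, Long> lastWriteMs,
                                        Map<String, Integer> activeGroups,
                                        long cutoffMs) {
        return lastWriteMs.entrySet().stream()
            .filter(e -> e.getValue() < cutoffMs)
            .filter(e -> activeGroups.getOrDefault(e.getKey(), 0) == 0)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> writes = Map.of("old-orphan", 100L, "busy", 5000L, "old-consumed", 100L);
        Map<String, Integer> groups = Map.of("old-consumed", 2); // still has consumers
        System.out.println(deletableTopics(writes, groups, 1000L)); // [old-orphan]
    }
}
```

Note that "old-consumed" survives despite being stale, because the active-consumer check guards against deleting topics someone still reads.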
Re: Review Request 26663: Patch for KAFKA-979
On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote: core/src/main/scala/kafka/log/Log.scala, line 515 https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515 Although this should work I'm wondering if it would be better to recompute a random jitter for each new rolled segment. What do you think? Neha Narkhede wrote: Only one segment should ever roll at one time in a log. So this should suffice, right? Joel Koshy wrote: Yes, but there could be one segment each from multiple low-volume logs that roll simultaneously - which is why we want to add the jitter. The current patch should work - it's just that the jitter is set only once (up front) per log when the log is created. It does not seem there is any randomness after that. That should be okay, but if you are unlucky and get a roll schedule that isn't spread out very well, that schedule will stick. Neha Narkhede wrote: Even for low-volume logs, as long as every log has a random jitter, it seems unlikely that time-based rolling of log segments would align across logs. For that sort of alignment, two logs would have to have the same jitter set. Actually, ignore what I said. The config.randomSegmentJitter() call actually recomputes a random jitter time for every log segment, so no two segments should reasonably roll at the same time. - Neha --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56494 --- On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/ --- (Updated Oct. 13, 2014, 11:16 p.m.) Review request for kafka. Bugs: KAFKA-979 https://issues.apache.org/jira/browse/KAFKA-979 Repository: kafka Description --- KAFKA-979 Add optional random jitter for time based log rolling. 
Diff: https://reviews.apache.org/r/26663/diff/ Testing --- Thanks, Ewen Cheslack-Postava
Review Request 26710: Patch for KAFKA-1637
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26710/ --- Review request for kafka. Bugs: KAFKA-1637 https://issues.apache.org/jira/browse/KAFKA-1637 Repository: kafka Description --- KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for unknown topics/partitions vs no associated consumer. Diffs - core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 1586243d20d6a181a1bd9f07e1c9493596005b32 core/src/main/scala/kafka/server/KafkaApis.scala 67f2833804cb15976680e42b9dc49e275c89d266 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 2d9325045ac1ac2d7531161b32c98c847125cbf0 Diff: https://reviews.apache.org/r/26710/diff/ Testing --- Thanks, Ewen Cheslack-Postava
[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1637: - Assignee: Ewen Cheslack-Postava Status: Patch Available (was: Open) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group Key: KAFKA-1637 URL: https://issues.apache.org/jira/browse/KAFKA-1637 Project: Kafka Issue Type: Bug Components: consumer, core Affects Versions: 0.8.1, 0.8.1.1 Environment: Linux Reporter: Amir Malekpour Assignee: Ewen Cheslack-Postava Labels: newbie Attachments: KAFKA-1637.patch This concerns Kafka's Offset Fetch API: According to Kafka's current documentation, if there is no offset associated with a topic-partition under that consumer group, the broker does not set an error code (since it is not really an error), but returns empty metadata and sets the offset field to -1. (Link below.) However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes it impossible for the client to decide whether there was an error or there is simply no offset associated with that topic-partition under that consumer group. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1637: - Attachment: KAFKA-1637.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171610#comment-14171610 ] Ewen Cheslack-Postava commented on KAFKA-1637: -- Created reviewboard https://reviews.apache.org/r/26710/diff/ against branch origin/trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171621#comment-14171621 ] Ewen Cheslack-Postava commented on KAFKA-1637: -- The error code is for UnknownTopicOrPartition, which may have been correct if the request was for a non-existent topic or partition. Previously the code seemed to do the correct thing, reporting this error and returning an invalid offset when the consumer hadn't started reading from that group, but KAFKA-1012 (a670537aa337) actually changed that behavior. The provided patch tries to cover the different possible scenarios (missing topic, invalid partition, and valid TopicAndPartition but a consumer with no offset for it). One potential caveat is auto topic creation, since it could be reasonable not to return UnknownTopicOrPartition for a missing topic in that case. I'm not sure we really want different behavior in that case, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
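The documented semantics the patch restores can be summarized as: error code 3 (UnknownTopicOrPartition) only for a truly unknown topic/partition, and offset -1 with error code 0 for a known partition that simply has no committed offset. A hypothetical sketch of that decision logic, not the actual KafkaApis code:

```java
import java.util.Map;
import java.util.Set;

public class OffsetFetchSemantics {
    static final short NONE = 0;
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;
    static final long INVALID_OFFSET = -1L;

    // Returns {offset, errorCode}. Error 3 only for a truly unknown
    // topic/partition; a known partition with no committed offset is
    // NOT an error: it returns offset -1 with error code 0.
    static long[] fetchOffset(Set<String> knownPartitions,
                              Map<String, Long> committed, String tp) {
        if (!knownPartitions.contains(tp))
            return new long[]{INVALID_OFFSET, UNKNOWN_TOPIC_OR_PARTITION};
        Long offset = committed.get(tp);
        return new long[]{offset == null ? INVALID_OFFSET : offset, NONE};
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("t-0", "t-1");
        Map<String, Long> committed = Map.of("t-0", 42L);
        System.out.println(java.util.Arrays.toString(fetchOffset(known, committed, "t-0")));    // [42, 0]
        System.out.println(java.util.Arrays.toString(fetchOffset(known, committed, "t-1")));    // [-1, 0]
        System.out.println(java.util.Arrays.toString(fetchOffset(known, committed, "nope-0"))); // [-1, 3]
    }
}
```

The middle case is the one the pre-patch code got wrong: it returned error 3, so clients could not tell "no offset yet" apart from "bad topic/partition".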
Re: Review Request 26663: Patch for KAFKA-979
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/ --- (Updated Oct. 14, 2014, 10:33 p.m.) Review request for kafka. Bugs: KAFKA-979 https://issues.apache.org/jira/browse/KAFKA-979 Repository: kafka Description (updated) --- Add new options, log.roll.jitter.ms and log.roll.jitter.hours, to add random jitter to time-based log rolling so that logs aren't likely to roll at exactly the same time. Jitter always reduces the timeout, so log.roll.ms still provides a soft maximum. Defaults to 0, so no jitter is added by default. Addresses the warning and Utils.abs comments. Diffs (updated) - core/src/main/scala/kafka/log/Log.scala a123cdc52f341a802b3e4bfeb29a6154332e5f73 core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff core/src/main/scala/kafka/log/LogConfig.scala d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 core/src/main/scala/kafka/log/LogSegment.scala 7597d309f37a0b3756381f9500100ef763d466ba core/src/main/scala/kafka/server/KafkaConfig.scala 7fcbc16da898623b03659c803e2a20c7d1bd1011 core/src/main/scala/kafka/server/KafkaServer.scala 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 7b97e6a80753a770ac094e101c653193dec67e68 core/src/test/scala/unit/kafka/log/LogTest.scala a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b Diff: https://reviews.apache.org/r/26663/diff/ Testing --- Thanks, Ewen Cheslack-Postava
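The updated semantics (jitter only ever subtracts from the roll interval, so log.roll.ms stays a maximum and jitter 0 recovers the old behavior) can be sketched as follows. This is an illustration with invented names, not the actual Log.scala logic:

```java
public class RollWithJitter {
    // Jitter only subtracts from the roll interval, so rollMs remains an
    // upper bound on segment age; jitterMs = 0 recovers the old behavior.
    static boolean shouldRoll(long nowMs, long segmentCreatedMs,
                              long rollMs, long jitterMs) {
        return nowMs - segmentCreatedMs > rollMs - jitterMs;
    }

    public static void main(String[] args) {
        long rollMs = 3_600_000L; // log.roll.ms: one hour
        // With 10 minutes of jitter, a segment ~52 minutes old rolls early...
        System.out.println(shouldRoll(3_100_000L, 0L, rollMs, 600_000L)); // true
        // ...but without jitter the full hour has not elapsed yet.
        System.out.println(shouldRoll(3_100_000L, 0L, rollMs, 0L));       // false
    }
}
```

Because each segment draws its own jitter, low-volume logs that were created at the same time no longer all roll in the same instant.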
[jira] [Commented] (KAFKA-979) Add jitter for time based rolling
[ https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171648#comment-14171648 ] Ewen Cheslack-Postava commented on KAFKA-979: - Updated reviewboard https://reviews.apache.org/r/26663/diff/ against branch origin/trunk Add jitter for time based rolling - Key: KAFKA-979 URL: https://issues.apache.org/jira/browse/KAFKA-979 Project: Kafka Issue Type: Bug Reporter: Sriram Subramanian Assignee: Ewen Cheslack-Postava Labels: newbie Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch Currently, for low volume topics time based rolling happens at the same time. This causes a lot of IO on a typical cluster and creates back pressure. We need to add a jitter to prevent them from happening at the same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-979) Add jitter for time based rolling
[ https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-979: Attachment: KAFKA-979_2014-10-14_15:33:31.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
Bhavesh, It seems that all those threads are blocked waiting for the lock on the dq for that partition. There must be another thread holding the dq lock at that point. Could you create a jira and attach the full thread dump there? Also, could you attach the YourKit result that shows the breakdown of the time? Thanks, Jun On Tue, Oct 14, 2014 at 10:41 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Jay, Yes, it is reproducible quite easily. The problem is the synchronized block in RecordAccumulator; you can reproduce it easily. I have attached the Java code in my original email. Because all application threads enqueue messages into a single partition, there is thread contention, and an application thread may be blocked for more than 2 minutes, as shown in my original email. Let me know if you need more information. Last commit I tested with: commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99 Author: Anton Karamanov atara...@gmail.com Date: Tue Oct 7 18:22:31 2014 -0700 kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by Anton Karamanov; reviewed by Jun Rao Thanks, Bhavesh On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps jay.kr...@gmail.com wrote: Hey Bhavesh, This sounds like a problem. Just to confirm, is this after the fix for KAFKA-1673? https://issues.apache.org/jira/browse/KAFKA-1673 It sounds like you have a reproducible test case? -Jay On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com wrote: Hi Kafka Dev Team, When I run the test sending messages to a single partition for 3 minutes or so, I encounter a deadlock (please see the attached screenshot) and thread contention in YourKit profiling. Use Case: 1) Aggregating messages into the same partition for metric counting. 2) Replicating the old producer's behavior of sticking to a partition for 3 minutes. Here is the output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. 
These threads are possibly (but not necessarily!) in a deadlock or hung.

pool-1-thread-128 --- Frozen for at least 2m
  org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744

pool-1-thread-159 --- Frozen for at least 2m 1 sec
  org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744

pool-1-thread-55 --- Frozen for at least 2m
  org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744
[jira] [Created] (KAFKA-1705) Add MR layer to Kafka
Gwen Shapira created KAFKA-1705: --- Summary: Add MR layer to Kafka Key: KAFKA-1705 URL: https://issues.apache.org/jira/browse/KAFKA-1705 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Gwen Shapira Many NoSQL-type storage systems (HBase, Mongo, Cassandra) and file formats (Avro, Parquet) provide a MapReduce integration layer - usually an InputFormat, OutputFormat and a utility class. Sometimes there's also an abstract Job and Mapper that do more setup, which can make things even more convenient. This is different from the existing Hadoop contrib project or Camus in that an MR layer provides components for use in MR jobs, not an entire job that ingests data from Kafka to HDFS. The benefits I see for a MapReduce layer are: * Developers can create their own jobs, processing the data as it is ingested - rather than having to process it in two steps. * There are reusable components for developers looking to integrate with Kafka, rather than having everyone implement their own solution. * Hadoop developers expect projects to have this layer. * Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark integration for free. * There's a layer to plug the delegation token code into and make it invisible to MapReduce developers. Without this, everyone who writes MR jobs will need to think about how to implement authentication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
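To make the InputFormat idea concrete, here is a hypothetical sketch of the split-planning step such a layer would need: one MR split per (topic, partition, offset range). All names here are illustrative assumptions, not an actual Hadoop or Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical split planner: maps each Kafka partition's available offset
// range [startOffset, endOffset) to one MapReduce input split.
class SplitPlanner {
    static final class KafkaSplit {
        final String topic;
        final int partition;
        final long startOffset;
        final long endOffset;

        KafkaSplit(String topic, int partition, long startOffset, long endOffset) {
            this.topic = topic;
            this.partition = partition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // offsetRanges: partition -> {startOffset, endOffset}, e.g. obtained from
    // an offset request. Returns splits sorted by partition id.
    static List<KafkaSplit> planSplits(String topic, Map<Integer, long[]> offsetRanges) {
        List<KafkaSplit> splits = new ArrayList<>();
        for (Map.Entry<Integer, long[]> e : new TreeMap<>(offsetRanges).entrySet()) {
            splits.add(new KafkaSplit(topic, e.getKey(), e.getValue()[0], e.getValue()[1]));
        }
        return splits;
    }
}
```

In a real InputFormat, getSplits() would do this planning and a RecordReader would consume each split's offset range; the sketch only shows the partition-to-split mapping.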
[jira] [Commented] (KAFKA-1705) Add MR layer to Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171720#comment-14171720 ] Gwen Shapira commented on KAFKA-1705: - Looking for community comments on this. Do others see this as a good thing to add? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171732#comment-14171732 ] Jun Rao commented on KAFKA-1493: James, Thanks, got it now. Not sure how long it will take to get a real implementation of http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html. Should we just take out LZ4 in CompressionType and CompressionCodec in 0.8.2 so that people don't use it until it's fixed? Use a well-documented LZ4 compression format and remove redundant LZ4HC option -- Key: KAFKA-1493 URL: https://issues.apache.org/jira/browse/KAFKA-1493 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2 Reporter: James Oliver Assignee: Ivan Lyutov Priority: Blocker Fix For: 0.8.2 Attachments: KAFKA-1493.patch, KAFKA-1493.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171748#comment-14171748 ] James Oliver commented on KAFKA-1493: - I implemented the OutputStream today. If I can't get the InputStream done and tested before I leave for vacation Thursday, IMO we should take it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Bad link on the document
Hi, Usage information on the hadoop consumer can be found here: https://github.com/linkedin/camus/tree/camus-kafka-0.8/. The link is broken; could someone kindly fix it? Thanks
Re: Review Request 26663: Patch for KAFKA-979
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/#review56652 --- Looks good overall - I have one comment on interpreting the jitter value though. core/src/main/scala/kafka/log/Log.scala https://reviews.apache.org/r/26663/#comment97045 Thinking about this a bit more: do you think it would be safer to interpret jitter as an additive value to segmentMs? i.e., the actual age for rolling will be config.segmentMs + segment.rollJitterMs; (and limit segment.rollJitterMs to an interval of [0, config.segmentMs] which you are already doing.) Otherwise if a user happens to set a high jitter time then nearly empty segments roll often (with high probability). Another way to interpret it is as a jitter window. i.e., the actual age for rolling will be config.segmentMs + segment.rollJitterMs; and limit segment.rollJitterMs to an interval of [-config.segmentMs / 2, config.segmentMs / 2] Thoughts? - Joel Koshy On Oct. 14, 2014, 10:33 p.m., Ewen Cheslack-Postava wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26663/ --- (Updated Oct. 14, 2014, 10:33 p.m.) Review request for kafka. Bugs: KAFKA-979 https://issues.apache.org/jira/browse/KAFKA-979 Repository: kafka Description --- Add a new options log.roll.jitter.ms and log.roll.jitter.hours to add random jitter to time-based log rolling so logs aren't likely to roll at exactly the same time. Jitter always reduces the timeout so log.roll.ms still provides a soft maximum. Defaults to 0 so no jitter is added by default. Addressing warning and Util.abs comments. 
Diffs - core/src/main/scala/kafka/log/Log.scala a123cdc52f341a802b3e4bfeb29a6154332e5f73 core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff core/src/main/scala/kafka/log/LogConfig.scala d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 core/src/main/scala/kafka/log/LogSegment.scala 7597d309f37a0b3756381f9500100ef763d466ba core/src/main/scala/kafka/server/KafkaConfig.scala 7fcbc16da898623b03659c803e2a20c7d1bd1011 core/src/main/scala/kafka/server/KafkaServer.scala 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 7b97e6a80753a770ac094e101c653193dec67e68 core/src/test/scala/unit/kafka/log/LogTest.scala a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b Diff: https://reviews.apache.org/r/26663/diff/ Testing --- Thanks, Ewen Cheslack-Postava
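The two interpretations of the jitter config debated above can be sketched as follows. This is an illustrative sketch with assumed names, not the committed patch; it also shows the Utils.abs concern from the earlier review round, since Math.abs(Long.MIN_VALUE) is still negative.

```java
import java.util.Random;

// Sketch of subtractive vs. additive interpretation of segment.rollJitterMs.
class RollJitter {
    // Non-negative version of v: mask the sign bit rather than use Math.abs,
    // because Math.abs(Long.MIN_VALUE) returns a negative number.
    static long nonNegative(long v) {
        return v & Long.MAX_VALUE;
    }

    // Draw a jitter in [0, min(maxJitterMs, segmentMs)), recomputed per segment.
    static long drawJitter(long segmentMs, long maxJitterMs, Random rand) {
        long bound = Math.min(maxJitterMs, segmentMs);
        return bound <= 0 ? 0L : nonNegative(rand.nextLong()) % bound;
    }

    // Subtractive interpretation (the patch as described): jitter shortens the
    // timeout, so log.roll.ms stays an upper bound on segment age -- but a high
    // jitter setting rolls nearly empty segments very often.
    static long subtractiveRollAge(long segmentMs, long jitterMs) {
        return segmentMs - jitterMs;
    }

    // Additive interpretation (suggested in the review): jitter extends the
    // timeout, so the roll age is always at least segmentMs.
    static long additiveRollAge(long segmentMs, long jitterMs) {
        return segmentMs + jitterMs;
    }
}
```

Under the additive reading, clamping jitter to [0, segmentMs] bounds the actual roll age to [segmentMs, 2 * segmentMs], which avoids the frequent-roll problem at the cost of log.roll.ms no longer being a maximum.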
[jira] [Updated] (KAFKA-979) Add jitter for time based rolling
[ https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-979: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. Pushed to trunk and 0.8.2 Add jitter for time based rolling - Key: KAFKA-979 URL: https://issues.apache.org/jira/browse/KAFKA-979 Project: Kafka Issue Type: Bug Reporter: Sriram Subramanian Assignee: Ewen Cheslack-Postava Labels: newbie Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch Currently, for low volume topics time based rolling happens at the same time. This causes a lot of IO on a typical cluster and creates back pressure. We need to add a jitter to prevent them from happening at the same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-979) Add jitter for time based rolling
[ https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171962#comment-14171962 ] Joel Koshy commented on KAFKA-979: -- Just had one more comment in the RB on how to use the jitter config. Can you take a quick look at that? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26710: Patch for KAFKA-1637
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26710/#review56655 --- Ship it! Ship It! - Neha Narkhede On Oct. 14, 2014, 10:04 p.m., Ewen Cheslack-Postava wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26710/ --- (Updated Oct. 14, 2014, 10:04 p.m.) Review request for kafka. Bugs: KAFKA-1637 https://issues.apache.org/jira/browse/KAFKA-1637 Repository: kafka Description --- KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for unknown topics/partitions vs no associated consumer. Diffs - core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 1586243d20d6a181a1bd9f07e1c9493596005b32 core/src/main/scala/kafka/server/KafkaApis.scala 67f2833804cb15976680e42b9dc49e275c89d266 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 2d9325045ac1ac2d7531161b32c98c847125cbf0 Diff: https://reviews.apache.org/r/26710/diff/ Testing --- Thanks, Ewen Cheslack-Postava
[jira] [Updated] (KAFKA-1698) Validator.ensureValid() only validates default config value
[ https://issues.apache.org/jira/browse/KAFKA-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1698: - Reviewer: Jun Rao Validator.ensureValid() only validates default config value --- Key: KAFKA-1698 URL: https://issues.apache.org/jira/browse/KAFKA-1698 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Ewen Cheslack-Postava Labels: newbie++ Fix For: 0.8.3 Attachments: KAFKA-1698.patch We should use it to validate the actual configured value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1637: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. Pushed to trunk and 0.8.2 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group Key: KAFKA-1637 URL: https://issues.apache.org/jira/browse/KAFKA-1637 Project: Kafka Issue Type: Bug Components: consumer, core Affects Versions: 0.8.1, 0.8.1.1 Environment: Linux Reporter: Amir Malekpour Assignee: Ewen Cheslack-Postava Labels: newbie Attachments: KAFKA-1637.patch This concerns Kafka's Offset Fetch API: According to Kafka's current documentation, if there is no offset associated with a topic-partition under that consumer group the broker does not set an error code (since it is not really an error), but returns empty metadata and sets the offset field to -1. (Link below) However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes it impossible for the client to decide if there was an error, or if there is no offset associated with a topic-partition under that consumer group. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171998#comment-14171998 ] Guozhang Wang commented on KAFKA-1583: -- Bump, [~junrao] could you take a look now? Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26710: Patch for KAFKA-1637
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26710/#review56662 --- core/src/main/scala/kafka/server/KafkaApis.scala https://reviews.apache.org/r/26710/#comment97067 There is an issue here. The replica manager only contains information about partitions that are assigned to this broker. However, some consumer group's offset manager may also be on this broker and that group may consume various partitions that are not assigned to this broker. The offset manager though will still contain offsets for those partitions. - Joel Koshy On Oct. 14, 2014, 10:04 p.m., Ewen Cheslack-Postava wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26710/ --- (Updated Oct. 14, 2014, 10:04 p.m.) Review request for kafka. Bugs: KAFKA-1637 https://issues.apache.org/jira/browse/KAFKA-1637 Repository: kafka Description --- KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for unknown topics/partitions vs no associated consumer. Diffs - core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 1586243d20d6a181a1bd9f07e1c9493596005b32 core/src/main/scala/kafka/server/KafkaApis.scala 67f2833804cb15976680e42b9dc49e275c89d266 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 2d9325045ac1ac2d7531161b32c98c847125cbf0 Diff: https://reviews.apache.org/r/26710/diff/ Testing --- Thanks, Ewen Cheslack-Postava
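The distinction Joel is pointing at can be sketched as follows. This is an illustrative model, not KafkaApis code: the "does this partition exist" check must consult full cluster metadata, because this broker's offset manager may hold offsets for partitions the broker does not itself replicate. Error codes mirror the protocol: 0 = NoError, 3 = UnknownTopicOrPartition; topic-partitions are modeled as plain strings for brevity.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the OffsetFetch decision logic discussed in the review.
class OffsetFetchSketch {
    static final short NO_ERROR = 0;
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;

    static final class OffsetResult {
        final long offset;
        final short errorCode;

        OffsetResult(long offset, short errorCode) {
            this.offset = offset;
            this.errorCode = errorCode;
        }
    }

    // committed: offsets held by this broker's offset manager for the group.
    // clusterPartitions: all topic-partitions known cluster-wide -- NOT just
    // the partitions this broker replicates, which is the bug described above.
    static OffsetResult fetchOffset(Map<String, Long> committed,
                                    Set<String> clusterPartitions,
                                    String topicPartition) {
        Long off = committed.get(topicPartition);
        if (off != null) {
            return new OffsetResult(off, NO_ERROR);
        }
        if (clusterPartitions.contains(topicPartition)) {
            // Known partition, but no offset committed yet: not an error,
            // just return -1 per the protocol documentation.
            return new OffsetResult(-1L, NO_ERROR);
        }
        return new OffsetResult(-1L, UNKNOWN_TOPIC_OR_PARTITION);
    }
}
```

If the existence check were done against only locally replicated partitions, the third case would wrongly fire for valid partitions whose offsets happen to be managed by this broker, which is exactly why the patch was reverted.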
[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172020#comment-14172020 ] Joel Koshy commented on KAFKA-1637: --- -1 on this patch per the comment in the RB. Reverted it for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)