Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56494
---



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/26663/#comment96865

Although this should work I'm wondering if it would be better to recompute 
a random jitter for each new rolled segment. What do you think?



core/src/main/scala/kafka/log/LogConfig.scala
https://reviews.apache.org/r/26663/#comment96864

We should use Utils.abs before returning or it could return a large 
negative result.
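
For illustration, a minimal, self-contained Scala sketch of the point (absOf and 
randomJitter are illustrative helpers, not the actual LogConfig code): a random 
Long modulo a bound can be negative, so an abs step is needed to keep the jitter 
in [0, max).

{code}
import scala.util.Random

object JitterSketch {
  // Mimics what an abs helper would do; note math.abs(Long.MinValue) is still negative,
  // so that case is mapped to 0 explicitly.
  private def absOf(n: Long): Long = if (n == Long.MinValue) 0L else math.abs(n)

  // Picks a jitter in [0, maxJitterMs); without absOf the modulo could be a large negative value.
  def randomJitter(maxJitterMs: Long): Long =
    if (maxJitterMs <= 0) 0L else absOf(Random.nextLong()) % maxJitterMs

  def main(args: Array[String]): Unit = {
    val samples = (1 to 5).map(_ => randomJitter(15 * 60 * 1000L))
    println(samples.mkString(", "))   // all samples are >= 0 and below 15 minutes
  }
}
{code}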


- Joel Koshy


On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 13, 2014, 11:16 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-979 Add optional random jitter for time based log rolling.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




Re: [DISCUSSION] Message Metadata

2014-10-14 Thread Joe Stein
I think we could add schemaId(binary) to the MessageAndMetaData

With the schemaId you can implement different downstream software patterns
on the messages reliably. I wrote up more thoughts on this use case at
https://cwiki.apache.org/confluence/display/KAFKA/Schema+based+topics; it
should strive to encompass all implementation needs for producer, broker,
and consumer hooks.

So if the application and tagged fields are important, you can package that
into a specific Kafka topic plug-in and assign it to topic(s).  The Kafka
server should be able to validate your expected formats (like
encoders/decoders, but in the broker, by topic, regardless of producer) for the
topics that have it enabled. We should have these maintained in the project
under contrib.

=- Joestein

On Mon, Oct 13, 2014 at 11:02 PM, Guozhang Wang wangg...@gmail.com wrote:

 Hi Jay,

 Thanks for the comments. Replied inline.

 Guozhang

 On Mon, Oct 13, 2014 at 11:11 AM, Jay Kreps jay.kr...@gmail.com wrote:

  I need to take more time to think about this. Here are a few off-the-cuff
  remarks:
 
  - To date we have tried really, really hard to keep the data model for
  messages simple since, after all, you can always add whatever you like
  inside the message body.
 
  - For system tags, why not just make these fields first-class fields in
  the message? Why have a bunch of key-value pairs versus first-class
  fields?
 

 Yes, we can alternatively make the system tags first-class fields in the
 message header to make the format / processing logic simpler.

 The main reasons I put them as system tags are: 1) when I think about these
 possible system tags, some of them apply to all types of messages (e.g.
 timestamps), but some of them may apply only to a specific type of message
 (compressed, control message), and of those not all are necessarily
 required all the time, hence making them compact tags may save us some
 space when not all of them are present; 2) with tags we do not need to
 bump up the protocol version every time we make a change to it, which
 includes keeping the logic to handle all versions on the broker until the
 old ones are officially discarded; instead, the broker can just ignore a
 tag if its id is not recognizable because the client is on a newer version,
 or use some default value / throw an exception if a required tag is missing
 because the client is on an older version.
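
 As a rough illustration of the compact, skippable tag layout being described (the
 ids, names, and byte layout below are invented for the sketch, not a proposed wire
 format), a decoder can simply skip tag ids it does not recognize:

{code}
import java.nio.ByteBuffer

object TaggedHeaderSketch {
  // Hypothetical tag ids, for the sketch only.
  val TimestampTag: Short = 1
  val ControlFlagTag: Short = 2

  // Encode tags as [count][id][length][bytes]...; only tags that are present take any space.
  def encode(tags: Map[Short, Array[Byte]]): ByteBuffer = {
    val size = 4 + tags.valuesIterator.map(v => 2 + 4 + v.length).sum
    val buf = ByteBuffer.allocate(size)
    buf.putInt(tags.size)
    tags.foreach { case (id, value) => buf.putShort(id); buf.putInt(value.length); buf.put(value) }
    buf.flip()
    buf
  }

  // Decode only the tags we recognize; unknown ids are skipped, so no protocol version bump is needed.
  def decode(buf: ByteBuffer, known: Set[Short]): Map[Short, Array[Byte]] = {
    val count = buf.getInt
    (0 until count).flatMap { _ =>
      val id = buf.getShort
      val value = new Array[Byte](buf.getInt)
      buf.get(value)
      if (known.contains(id)) Some(id -> value) else None
    }.toMap
  }
}
{code}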


 
  - You don't necessarily need application-level tags explicitly
 represented
  in the message format for efficiency. The application can define their
 own
  header (e.g. their message could be a size delimited header followed by a
  size delimited body). But actually if you use Avro you don't even need
 this
  I don't think. Avro has the ability to just deserialize the header
 fields
  in your message. Avro has a notion of reader and writer schemas. The
 writer
  schema is whatever the message was written with. If the reader schema is
  just the header, avro will skip any fields it doesn't need and just
  deserialize the fields it does need. This is actually a much more usable
  and flexible way to define a header since you get all the types avro
 allows
  instead of just bytes.
 

 I agree that we can use a reader schema to just read out the header without
 de-serializing the full message, and probably for compressed messages we can
 add an Avro / etc. header for the compressed wrapper message as well, but that
 would force these applications (MM, auditor, clients) to be schema-aware,
 which would usually require the people who manage this data pipeline to also
 manage the schemas, whereas ideally Kafka itself should just consider
 bytes-in and bytes-out (and maybe a little bit more, like timestamps). The
 purpose here is to not introduce an extra dependency while at the same time
 allowing applications to avoid fully de-serializing / de-compressing the message
 in order to do some simple processing based on metadata only.
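
 A minimal Scala sketch of the reader/writer-schema point above, using the standard
 Avro generic API (the schemas and field names are made up for the example):

{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object AvroHeaderSketch {
  // Writer schema: a small header field plus a potentially large body.
  val writerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Msg","fields":[
      |  {"name":"auditTag","type":"string"},
      |  {"name":"body","type":"bytes"}]}""".stripMargin)
  // Reader schema: only the header field; Avro's schema resolution skips the rest on read.
  val readerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Msg","fields":[
      |  {"name":"auditTag","type":"string"}]}""".stripMargin)

  def main(args: Array[String]): Unit = {
    // Serialize a full record with the writer schema.
    val record = new GenericData.Record(writerSchema)
    record.put("auditTag", "pipeline-42")
    record.put("body", ByteBuffer.wrap(Array.fill[Byte](1024)(0)))
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
    encoder.flush()

    // Deserialize with the header-only reader schema: the body field is skipped, not decoded.
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    val header = new GenericDatumReader[GenericRecord](writerSchema, readerSchema).read(null, decoder)
    println(header.get("auditTag"))   // pipeline-42
  }
}
{code}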


 
  - We will need to think carefully about what to do with timestamps if we
  end up including them. There are actually several timestamps
- The time the producer created the message
- The time the leader received the message
- The time the current broker received the message
  The producer timestamps won't necessarily be increasing at all. The leader timestamp
  will be mostly increasing, except when the clock changes or leadership
  moves. This somewhat complicates the use of these timestamps, though.
 From
  the point of view of the producer the only time that matters is the time
  the message was created. However since the producer sets this it can be
  arbitrarily bad (remember all the ntp issues and 1970 timestamps we would
  get). Say that the heuristic was to use the timestamp of the first
 message
  in a file for retention, the problem would be that the timestamps for the
  segments need not even be sequential and a single bad producer could send
  data with time in the distant past or future causing data to be 

[jira] [Updated] (KAFKA-1691) new java consumer needs ssl support as a client

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1691:
-
Component/s: security

 new java consumer needs ssl support as a client
 ---

 Key: KAFKA-1691
 URL: https://issues.apache.org/jira/browse/KAFKA-1691
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Joe Stein
Assignee: Ivan Lyutov
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1682) Security for Kafka

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1682:
-
Component/s: security

 Security for Kafka
 --

 Key: KAFKA-1682
 URL: https://issues.apache.org/jira/browse/KAFKA-1682
 Project: Kafka
  Issue Type: New Feature
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps

 Parent ticket for security. Wiki and discussion is here:
 https://cwiki.apache.org/confluence/display/KAFKA/Security



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1685) Implement TLS/SSL tests

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1685:
-
Component/s: security

 Implement TLS/SSL tests
 ---

 Key: KAFKA-1685
 URL: https://issues.apache.org/jira/browse/KAFKA-1685
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
 Fix For: 0.9.0


 We need to write a suite of unit tests for TLS authentication. This should be 
 doable with a JUnit integration test. We can use the simple authorization 
 plugin with only a single user whitelisted. The test can start the server and 
 then connect with and without TLS and validate that access is only possible 
 when authenticated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1684:
-
Component/s: security

 Implement TLS/SSL authentication
 

 Key: KAFKA-1684
 URL: https://issues.apache.org/jira/browse/KAFKA-1684
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps

 Add an SSL port to the configuration and advertise this as part of the 
 metadata request.
 If the SSL port is configured the socket server will need to add a second 
 Acceptor thread to listen on it. Connections accepted on this port will need 
 to go through the SSL handshake prior to being registered with a Processor 
 for request processing.
 SSL requests and responses may need to be wrapped or unwrapped using the 
 SSLEngine that was initialized by the acceptor. This wrapping and unwrapping 
 is very similar to what will need to be done for SASL-based authentication 
 schemes. We should have a uniform interface that covers both of these and we 
 will need to store the instance in the session with the request. The socket 
 server will have to use this object when reading and writing requests. We 
 will need to take care with the FetchRequests as the current 
 FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we 
 can only use this optimization for unencrypted sockets that don't require 
 userspace translation (wrapping).
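
 A rough Scala sketch of the uniform wrap/unwrap interface described here (the trait
 and class names are invented, and handshake, buffer sizing, and SSLEngineResult
 status handling are omitted):

{code}
import java.nio.ByteBuffer
import javax.net.ssl.SSLEngine
import javax.security.sasl.SaslServer

// One interface the socket server could use for both SSL and SASL wrapping.
trait TransportWrapper {
  def wrap(plaintext: ByteBuffer, network: ByteBuffer): Unit   // outgoing bytes
  def unwrap(network: ByteBuffer, plaintext: ByteBuffer): Unit // incoming bytes
}

class SslWrapper(engine: SSLEngine) extends TransportWrapper {
  // Real code must inspect the returned SSLEngineResult and drive the handshake; omitted here.
  def wrap(plaintext: ByteBuffer, network: ByteBuffer): Unit = engine.wrap(plaintext, network)
  def unwrap(network: ByteBuffer, plaintext: ByteBuffer): Unit = engine.unwrap(network, plaintext)
}

class SaslWrapper(server: SaslServer) extends TransportWrapper {
  def wrap(plaintext: ByteBuffer, network: ByteBuffer): Unit = {
    val bytes = new Array[Byte](plaintext.remaining()); plaintext.get(bytes)
    network.put(server.wrap(bytes, 0, bytes.length))
  }
  def unwrap(network: ByteBuffer, plaintext: ByteBuffer): Unit = {
    val bytes = new Array[Byte](network.remaining()); network.get(bytes)
    plaintext.put(server.unwrap(bytes, 0, bytes.length))
  }
}
{code}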



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1686) Implement SASL/Kerberos

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1686:
-
Component/s: security

 Implement SASL/Kerberos
 ---

 Key: KAFKA-1686
 URL: https://issues.apache.org/jira/browse/KAFKA-1686
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
 Fix For: 0.9.0


 Implement SASL/Kerberos authentication.
 To do this we will need to introduce a new SASLRequest and SASLResponse pair 
 to the client protocol. This request and response will each have only a 
 single byte[] field and will be used to handle the SASL challenge/response 
 cycle. Doing this will initialize the SaslServer instance and associate it 
 with the session in a manner similar to KAFKA-1684.
 When using integrity or encryption mechanisms with SASL we will need to wrap 
 and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
 SSLEngine will need to also cover the SaslServer instance.
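
 A minimal Scala sketch of the server-side challenge/response cycle described here;
 the single-byte[]-per-round-trip carrier (receive/send below) stands in for the
 proposed SASLRequest/SASLResponse, and only the SaslServer calls are standard API:

{code}
import javax.security.sasl.SaslServer

object SaslHandshakeSketch {
  def handshake(server: SaslServer, receive: () => Array[Byte], send: Array[Byte] => Unit): Unit = {
    while (!server.isComplete) {
      val clientToken = receive()                        // the byte[] carried in a SASLRequest
      val challenge = server.evaluateResponse(clientToken)
      if (challenge != null) send(challenge)             // the byte[] carried back in a SASLResponse
    }
    // Once complete, the authenticated identity can be attached to the session (see KAFKA-1683).
    println(s"authenticated as ${server.getAuthorizationID}")
  }
}
{code}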



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1683) Implement a session concept in the socket server

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1683:
-
Component/s: security

 Implement a session concept in the socket server
 --

 Key: KAFKA-1683
 URL: https://issues.apache.org/jira/browse/KAFKA-1683
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps

 To implement authentication we need a way to keep track of some things 
 between requests. The initial use for this would be remembering the 
 authenticated user/principal info, but likely more uses would come up (for 
 example we will also need to remember whether and which encryption or 
 integrity measures are in place on the socket so we can wrap and unwrap 
 writes and reads).
 I was thinking we could just add a Session object that might have a user 
 field. The session object would need to get added to RequestChannel.Request 
 so it is passed down to the API layer with each request.
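
 A tiny Scala sketch of that idea (names are illustrative, not the eventual
 kafka.network classes):

{code}
object SessionSketch {
  // Per-connection state remembered between requests.
  case class Session(principal: String, clientAddress: String)

  // The session rides along with each request so the API layer can see who is asking.
  case class Request(requestId: Short, payload: Array[Byte], session: Session)

  def isAuthorized(request: Request, allowedPrincipals: Set[String]): Boolean =
    allowedPrincipals.contains(request.session.principal)
}
{code}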



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1695) Authenticate connection to Zookeeper

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1695:
-
Component/s: security

 Authenticate connection to Zookeeper
 

 Key: KAFKA-1695
 URL: https://issues.apache.org/jira/browse/KAFKA-1695
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps

 We need to make it possible to secure the Zookeeper cluster Kafka is using. 
 This would make use of the normal authentication ZooKeeper provides. 
 ZooKeeper supports a variety of authentication mechanisms so we will need to 
 figure out what has to be passed in to the zookeeper client.
 The intention is that when the current round of client work is done it should 
 be possible to run without clients needing access to Zookeeper, so all we need 
 here is to make it so that only the Kafka cluster is able to read and write 
 to the Kafka znodes (we shouldn't need to set any kind of ACL on a per-znode 
 basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1477) add authentication layer and initial JKS x509 implementation for brokers, producers and consumer for network communication

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1477:
-
Component/s: security

 add authentication layer and initial JKS x509 implementation for brokers, 
 producers and consumer for network communication
 --

 Key: KAFKA-1477
 URL: https://issues.apache.org/jira/browse/KAFKA-1477
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Joe Stein
Assignee: Ivan Lyutov
 Fix For: 0.8.3

 Attachments: KAFKA-1477-binary.patch, KAFKA-1477.patch, 
 KAFKA-1477_2014-06-02_16:59:40.patch, KAFKA-1477_2014-06-02_17:24:26.patch, 
 KAFKA-1477_2014-06-03_13:46:17.patch, KAFKA-1477_trunk.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1696:
-
Component/s: security

 Kafka should be able to generate Hadoop delegation tokens
 -

 Key: KAFKA-1696
 URL: https://issues.apache.org/jira/browse/KAFKA-1696
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps

 For access from MapReduce/etc jobs run on behalf of a user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1688) Add authorization interface and naive implementation

2014-10-14 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1688:
-
Component/s: security

 Add authorization interface and naive implementation
 

 Key: KAFKA-1688
 URL: https://issues.apache.org/jira/browse/KAFKA-1688
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps

 Add a PermissionManager interface as described here:
 https://cwiki.apache.org/confluence/display/KAFKA/Security
 (possibly there is a better name?)
 Implement calls to the PermissionsManager in KafkaApis for the main requests 
 (FetchRequest, ProduceRequest, etc). We will need to add a new error code and 
 exception to the protocol to indicate permission denied.
 Add a server configuration to give the class you want to instantiate that 
 implements that interface. That class can define its own configuration 
 properties from the main config file.
 Provide a simple implementation of this interface which just takes a user and 
 ip whitelist and permits those in either of the whitelists to do anything, 
 and denies all others.
 Rather than writing an integration test for this class we can probably just 
 use this class for the TLS and SASL authentication testing.
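
 A minimal Scala sketch of the interface plus the naive whitelist implementation
 described above (the trait name, method signature, and property keys are
 illustrative, not the wiki proposal):

{code}
import java.util.Properties

trait PermissionManager {
  def configure(props: Properties): Unit
  def isPermitted(principal: String, clientIp: String, operation: String, topic: String): Boolean
}

// Naive implementation: anyone on the user whitelist OR the IP whitelist may do anything;
// everyone else is denied.
class WhitelistPermissionManager extends PermissionManager {
  private var users = Set.empty[String]
  private var ips = Set.empty[String]

  def configure(props: Properties): Unit = {
    def parse(key: String): Set[String] =
      Option(props.getProperty(key)).map(_.split(",").map(_.trim).toSet).getOrElse(Set.empty)
    users = parse("permission.whitelist.users")   // hypothetical config keys
    ips = parse("permission.whitelist.ips")
  }

  def isPermitted(principal: String, clientIp: String, operation: String, topic: String): Boolean =
    users.contains(principal) || ips.contains(clientIp)
}
{code}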



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


request for edit access to Kafka wiki

2014-10-14 Thread Glen Mazza
Hi, I'm Glen Mazza from the Apache Roller and CXF teams.  Would I be 
able to have write access to the Kafka wiki?  (I'm mazzag with 
Confluence, I already have an ICLA filed with Apache.)  I'm starting to 
go through the program and its tutorials and would like to make updates 
if I see things missing or obsolete.


Thanks,
Glen


Re: request for edit access to Kafka wiki

2014-10-14 Thread Joe Stein
Glen,

I just added them, you should be good to go!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/

On Tue, Oct 14, 2014 at 9:03 AM, Glen Mazza glen.ma...@gmail.com wrote:

 Hi, I'm Glen Mazza from the Apache Roller and CXF teams.  Would I be able
 to have write access to the Kafka wiki?  (I'm mazzag with Confluence, I
 already have an ICLA filed with Apache.)  I'm starting to go through the
 program and its tutorials and would like to make updates if I see things
 missing or obsolete.

 Thanks,
 Glen



[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-10-14 Thread James Oliver (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171082#comment-14171082
 ] 

James Oliver commented on KAFKA-1493:
-

Sorry to not be more clear - I fixed a few spots related to the removal of the 
LZ4HC option, but left the I/O streams in Ivan's patch alone. Since I didn't 
have permissions to update Ivan's reviewboard, I created a new review.

1. This looks like Ivan's interpretation of the lz4-java block stream format.
2. We should use neither - the lz4-java impl was used previously (KAFKA-1456). 
Review by the community produced this issue. We need a real implementation of 
http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1493.patch, KAFKA-1493.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Neha Narkhede


 On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/log/Log.scala, line 515
  https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515
 
  Although this should work I'm wondering if it would be better to 
  recompute a random jitter for each new rolled segment. What do you think?

Only one segment should ever roll at one time in a log. So this should suffice 
right?


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56494
---


On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 13, 2014, 11:16 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-979 Add optional random jitter for time based log rolling.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Joel Koshy


 On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/log/Log.scala, line 515
  https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515
 
  Although this should work I'm wondering if it would be better to 
  recompute a random jitter for each new rolled segment. What do you think?
 
 Neha Narkhede wrote:
 Only one segment should ever roll at one time in a log. So this should 
 suffice right?

Yes, but there could be one segment each from multiple low-volume logs that 
roll simultaneously - which is why we want to add the jitter. The current patch 
should work - it's just that the jitter is set only once (up front) per log 
when the log is created. It does not seem there is any randomness after that. 
That should be okay, but if you are unlucky and get a roll schedule that isn't 
spread out very well that schedule will stick.


- Joel


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56494
---


On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 13, 2014, 11:16 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-979 Add optional random jitter for time based log rolling.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Assigned] (KAFKA-566) Add last modified time to the TopicMetadataRequest

2014-10-14 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-566:


Assignee: Sriharsha Chintalapani

 Add last modified time to the TopicMetadataRequest
 --

 Key: KAFKA-566
 URL: https://issues.apache.org/jira/browse/KAFKA-566
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
 Fix For: 0.9.0


 To support KAFKA-560 it would be nice to have a last modified time in the 
 TopicMetadataRequest. This would be the timestamp of the last append to the 
 log as taken from stat on the final log segment.
 Implementation would involve
 1. Adding a new field to TopicMetadataRequest
 2. Adding a method Log.lastModified: Long to get the last modified time from 
 a log
 This timestamp would, of course, be subject to error in the event that the 
 file was touched without modification, but I think that is actually okay 
 since it provides a manual way to avoid gc'ing a topic that you  know you 
 will want.
 It is debatable whether this should go in 0.8. It would be nice to add the 
 field to the metadata request, at least, as that change should be easy and 
 would avoid needing to bump the version in the future.
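
 A small Scala sketch of the Log.lastModified idea (illustrative, not the actual
 kafka.log.Log code): take the mtime of the newest .log segment file in the log's
 directory.

{code}
import java.io.{File, FilenameFilter}

class LogLastModifiedSketch(dir: File) {
  private val logFiles = new FilenameFilter {
    def accept(d: File, name: String): Boolean = name.endsWith(".log")
  }

  // Timestamp of the last append, approximated by the newest segment file's mtime.
  def lastModified: Long = {
    val segments = Option(dir.listFiles(logFiles)).getOrElse(Array.empty[File])
    if (segments.isEmpty) dir.lastModified else segments.map(_.lastModified).max
  }
}
{code}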



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-560) Garbage Collect obsolete topics

2014-10-14 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-560:


Assignee: Sriharsha Chintalapani

 Garbage Collect obsolete topics
 ---

 Key: KAFKA-560
 URL: https://issues.apache.org/jira/browse/KAFKA-560
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
  Labels: project

 Old junk topics tend to accumulate over time. Code may migrate to use new 
 topics leaving the old ones orphaned. Likewise there are some use cases for 
 temporary transient topics. It would be good to have a tool that could delete 
 any topic that had not been written to in a configurable period of time and 
 had no active consumer groups. Something like
./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect]
 This requires API support to get the last update time. I think it may be 
 possible to do this through the OffsetRequest now?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-560) Garbage Collect obsolete topics

2014-10-14 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171169#comment-14171169
 ] 

Sriharsha Chintalapani commented on KAFKA-560:
--

[~jkreps] [~nehanarkhede] If no one is actively working on this JIRA, I am 
interested in taking it. I am assigning it to myself; please change it if necessary.

 Garbage Collect obsolete topics
 ---

 Key: KAFKA-560
 URL: https://issues.apache.org/jira/browse/KAFKA-560
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
  Labels: project

 Old junk topics tend to accumulate over time. Code may migrate to use new 
 topics leaving the old ones orphaned. Likewise there are some use cases for 
 temporary transient topics. It would be good to have a tool that could delete 
 any topic that had not been written to in a configurable period of time and 
 had no active consumer groups. Something like
./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect]
 This requires API support to get the last update time. I think it may be 
 possible to do this through the OffsetRequest now?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-14 Thread Jay Kreps
Hey Bhavesh,

This sounds like a problem. Just to confirm this is after the fix for
KAFKA-1673?

https://issues.apache.org/jira/browse/KAFKA-1673

It sounds like you have a reproducible test case?

-Jay


On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com
 wrote:

 Hi Kafka Dev Team,

  When I run the test to send messages to a single partition for 3 minutes or
  so, I encounter a deadlock (please see the attached screenshot) and
  thread contention from YourKit profiling.

 Use Case:

 1)  Aggregating messages into same partition for metric counting.
 2)  Replicate Old Producer behavior for sticking to partition for 3
 minutes.


 Here is output:

 Frozen threads found (potential deadlock)

 It seems that the following threads have not changed their stack for more
 than 10 seconds.
 These threads are possibly (but not necessarily!) in a deadlock or hung.

 pool-1-thread-128 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run()
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run()
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744



 pool-1-thread-159 --- Frozen for at least 2m 1 sec
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run()
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run()
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744



 pool-1-thread-55 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
 byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run()
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run()
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744







Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-14 Thread Bhavesh Mistry
Hi Jay,

Yes, it is reproducible quite easily.  The problem is the synchronized block in
RecordAccumulator.  You can easily reproduce it.  I have attached the Java code
in my original email.  Because application threads enqueue messages into a
single partition, this causes thread contention, and an application thread may
be blocked on this for more than 2 minutes, as shown in the original email.  Let
me know if you need more information.
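
For reference, a Scala sketch of the reproduction pattern being described (many
threads all sending to one fixed partition); it is written against the released
generic producer API, which differs slightly from the trunk API shown in the stack
traces above, and the broker address, topic, thread count, and payload are placeholders:

{code}
import java.util.Properties
import java.util.concurrent.Executors
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SinglePartitionLoad {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
    val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)

    val pool = Executors.newFixedThreadPool(200)          // similar to the pool-1-thread-N workers above
    val payload = Array.fill[Byte](512)(1)
    val partition = Integer.valueOf(0)                    // every thread targets the same partition
    for (_ <- 1 to 200) pool.submit(new Runnable {
      def run(): Unit =
        while (!Thread.currentThread().isInterrupted)
          // All threads contend on the accumulator state for ("metrics", partition 0).
          producer.send(new ProducerRecord[Array[Byte], Array[Byte]]("metrics", partition, null, payload))
    })

    Thread.sleep(3 * 60 * 1000L)                          // run for ~3 minutes, as in the report
    pool.shutdownNow()
    producer.close()
  }
}
{code}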

Last Commit I tested with:

commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99
Author: Anton Karamanov atara...@gmail.com
Date:   Tue Oct 7 18:22:31 2014 -0700

kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by
Anton Karamanov; reviewed by Jun Rao

Thanks,

Bhavesh

On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps jay.kr...@gmail.com wrote:

 Hey Bhavesh,

 This sounds like a problem. Just to confirm this is after the fix for
 KAFKA-1673?

 https://issues.apache.org/jira/browse/KAFKA-1673

 It sounds like you have a reproducible test case?

 -Jay


 On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry 
 mistry.p.bhav...@gmail.com
  wrote:

  Hi Kafka Dev Team,
 
   When I run the test to send messages to a single partition for 3 minutes or
   so, I encounter a deadlock (please see the attached screenshot) and
   thread contention from YourKit profiling.
 
  Use Case:
 
  1)  Aggregating messages into same partition for metric counting.
  2)  Replicate Old Producer behavior for sticking to partition for 3
  minutes.
 
 
  Here is output:
 
  Frozen threads found (potential deadlock)
 
  It seems that the following threads have not changed their stack for more
  than 10 seconds.
  These threads are possibly (but not necessarily!) in a deadlock or hung.
 
  pool-1-thread-128 --- Frozen for at least 2m
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
  Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run()
  TestNetworkDownProducer.java:84
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
  ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run()
  ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744
 
 
 
  pool-1-thread-159 --- Frozen for at least 2m 1 sec
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
  Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run()
  TestNetworkDownProducer.java:84
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
  ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run()
  ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744
 
 
 
  pool-1-thread-55 --- Frozen for at least 2m
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
  org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
  Callback) KafkaProducer.java:237
  org.kafka.test.TestNetworkDownProducer$MyProducer.run()
  TestNetworkDownProducer.java:84
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
  ThreadPoolExecutor.java:1145
  java.util.concurrent.ThreadPoolExecutor$Worker.run()
  ThreadPoolExecutor.java:615
  java.lang.Thread.run() Thread.java:744
 
 
 
 
 



[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-14 Thread Vladimir Tretyakov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171371#comment-14171371
 ] 

Vladimir Tretyakov commented on KAFKA-1481:
---

Hi, Jun, 

Thanks for finding the time to look at the patch. I've added another one following your 
suggestion; now in JMX you will see (I hope it is what you need):
{code}
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092,topic=spm_topic,partitionId=0
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=af_servers,groupId=af_servers,topic=spm_topic
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,allBrokers=true
{code}

{quote}
Also, your patch doesn't seem to apply to latest trunk.
git apply ~/Downloads/KAFKA-1481_2014-10-13_18-23-35.patch 
error: core/src/main/scala/kafka/common/ClientIdTopic.scala: No such file or 
directory
{quote}
Interesting...
1. I've worked with the 0.8.2 branch, not trunk.
2. ClientIdTopic.scala - I added this file (it must be in the patch). 

Added 2 identical patches (created in different ways):
1. 'git diff' 
2. With help from the IDEA IDE.

Please try them; maybe one of them will work.

Thx again.


 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Priority: Critical
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
 KAFKA-1481_2014-10-13_18-23-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBeans names making it impossible to parse out 
 individual bits from MBeans.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are doing it for themselves AND do not use 
 dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW
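
 As an illustration of why structured key properties help (the ObjectName below is
 just an example string in the style shown in the comment above): each piece becomes
 a separate key=value, so a dash inside a value no longer collides with a separator.

{code}
import javax.management.ObjectName

object MBeanNameSketch {
  def main(args: Array[String]): Unit = {
    val name = new ObjectName(
      "kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af-servers,brokerHost=some-host,brokerPort=9092")
    println(name.getKeyProperty("clientId"))    // af-servers, recovered intact despite the dash
    println(name.getKeyProperty("brokerPort"))  // 9092
  }
}
{code}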



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-14 Thread Vladimir Tretyakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Tretyakov updated KAFKA-1481:
--
Attachment: KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch
KAFKA-1481_2014-10-14_21-53-35.patch

 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Priority: Critical
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
 KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBeans names making it impossible to parse out 
 individual bits from MBeans.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are doing it for themselves AND do not use 
 dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-14 Thread Vladimir Tretyakov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171371#comment-14171371
 ] 

Vladimir Tretyakov edited comment on KAFKA-1481 at 10/14/14 7:09 PM:
-

Hi, Jun, 

Thanks for finding the time to look at the patch. I've added another one following your 
suggestion; now in JMX you will see (I hope it is what you need):
{code}
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092,topic=spm_topic,partitionId=0
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=af_servers,groupId=af_servers,topic=spm_topic
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,allBrokers=true
{code}

{quote}
Also, your patch doesn't seem to apply to latest trunk.
git apply ~/Downloads/KAFKA-1481_2014-10-13_18-23-35.patch 
error: core/src/main/scala/kafka/common/ClientIdTopic.scala: No such file or 
directory
{quote}
Interesting...
1. I've worked with the 0.8.2 branch, not trunk.
2. ClientIdTopic.scala - I added this file (it must be in the patch). 

Added 2 identical patches (created in different ways):
1. 'git diff' (KAFKA-1481_2014-10-14_21-53-35.patch)
2. With help from the IDEA IDE. (KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch)

Please try them; maybe one of them will work.

Thx again.



was (Author: vladimir.tretyakov):
Hi, Jun, 

Thx that you found a time to look at patch, I've added another regarding your 
suggestion, now in JMX you will see (hope it is what you need):
{code}
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,brokerHost=wawanawna,brokerPort=9092,topic=spm_topic,partitionId=0
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=af_servers,groupId=af_servers,topic=spm_topic
kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchResponseSize,clientId=af_servers,threadName=ConsumerFetcherThread,groupId=af_servers,consumerHostName=wawanawna,timestamp=1413306414731,uuid=4624cb0f,fetcherId=0,sourceBrokerId=0,allBrokers=true
{code}

{quote}
Also, your patch doesn't seem to apply to latest trunk.
git apply ~/Downloads/KAFKA-1481_2014-10-13_18-23-35.patch 
error: core/src/main/scala/kafka/common/ClientIdTopic.scala: No such file or 
directory
{quote}
Interesting...
1. I've worked with 0.8.2 branch, not trunk
2. ClientIdTopic.scala - I've added this file (must be in patch). 

Added 2 equal patches (created them in different way):
1. 'git diff' 
2. With help from IDEA IDE.

Try please, maybe one of them will work

Thx again.


 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Priority: Critical
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
 KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBeans names making it impossible to parse out 
 individual bits from MBeans.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are doing it for themselves AND do not use 
 dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Neha Narkhede


 On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/log/Log.scala, line 515
  https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515
 
  Although this should work I'm wondering if it would be better to 
  recompute a random jitter for each new rolled segment. What do you think?
 
 Neha Narkhede wrote:
 Only one segment should ever roll at one time in a log. So this should 
 suffice right?
 
 Joel Koshy wrote:
 Yes, but there could be one segment each from multiple low-volume logs 
 that roll simultaneously - which is why we want to add the jitter. The 
 current patch should work - it's just that the jitter is set only once (up 
 front) per log when the log is created. It does not seem there is any 
 randomness after that. That should be okay, but if you are unlucky and get a 
 roll schedule that isn't spread out very well that schedule will stick.

Even for low-volume logs, as long as every log has a random jitter, it seems 
unlikely that time-based rolling of log segments would align across logs. For 
that sort of alignment, two logs would have to have the same jitter set.


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56494
---


On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 13, 2014, 11:16 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-979 Add optional random jitter for time based log rolling.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Commented] (KAFKA-566) Add last modified time to the TopicMetadataRequest

2014-10-14 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171531#comment-14171531
 ] 

Neha Narkhede commented on KAFKA-566:
-

[~sriharsha] Thanks for taking this on.

 Add last modified time to the TopicMetadataRequest
 --

 Key: KAFKA-566
 URL: https://issues.apache.org/jira/browse/KAFKA-566
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
 Fix For: 0.9.0


 To support KAFKA-560 it would be nice to have a last modified time in the 
 TopicMetadataRequest. This would be the timestamp of the last append to the 
 log as taken from stat on the final log segment.
 Implementation would involve
 1. Adding a new field to TopicMetadataRequest
 2. Adding a method Log.lastModified: Long to get the last modified time from 
 a log
 This timestamp would, of course, be subject to error in the event that the 
 file was touched without modification, but I think that is actually okay 
 since it provides a manual way to avoid gc'ing a topic that you  know you 
 will want.
 It is debatable whether this should go in 0.8. It would be nice to add the 
 field to the metadata request, at least, as that change should be easy and 
 would avoid needing to bump the version in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-566) Add last modified time to the TopicMetadataRequest

2014-10-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-566:

Reviewer: Neha Narkhede

 Add last modified time to the TopicMetadataRequest
 --

 Key: KAFKA-566
 URL: https://issues.apache.org/jira/browse/KAFKA-566
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
 Fix For: 0.9.0


 To support KAFKA-560 it would be nice to have a last modified time in the 
 TopicMetadataRequest. This would be the timestamp of the last append to the 
 log as taken from stat on the final log segment.
 Implementation would involve
 1. Adding a new field to TopicMetadataRequest
 2. Adding a method Log.lastModified: Long to get the last modified time from 
 a log
 This timestamp would, of course, be subject to error in the event that the 
 file was touched without modification, but I think that is actually okay 
 since it provides a manual way to avoid gc'ing a topic that you  know you 
 will want.
 It is debatable whether this should go in 0.8. It would be nice to add the 
 field to the metadata request, at least, as that change should be easy and 
 would avoid needing to bump the version in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-560) Garbage Collect obsolete topics

2014-10-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-560:

Reviewer: Neha Narkhede

 Garbage Collect obsolete topics
 ---

 Key: KAFKA-560
 URL: https://issues.apache.org/jira/browse/KAFKA-560
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
  Labels: project

 Old junk topics tend to accumulate over time. Code may migrate to use new 
 topics leaving the old ones orphaned. Likewise there are some use cases for 
 temporary transient topics. It would be good to have a tool that could delete 
 any topic that had not been written to in a configurable period of time and 
 had no active consumer groups. Something like
./bin/delete-unused-topics.sh --last-write [date] --zookeeper [zk_connect]
 This requires API support to get the last update time. I think it may be 
 possible to do this through the OffsetRequest now?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Neha Narkhede


 On Oct. 14, 2014, 6:27 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/log/Log.scala, line 515
  https://reviews.apache.org/r/26663/diff/1/?file=719793#file719793line515
 
  Although this should work I'm wondering if it would be better to 
  recompute a random jitter for each new rolled segment. What do you think?
 
 Neha Narkhede wrote:
 Only one segment should ever roll at one time in a log. So this should 
 suffice right?
 
 Joel Koshy wrote:
 Yes, but there could be one segment each from multiple low-volume logs 
 that roll simultaneously - which is why we want to add the jitter. The 
 current patch should work - it's just that the jitter is set only once (up 
 front) per log when the log is created. It does not seem there is any 
 randomness after that. That should be okay, but if you are unlucky and get a 
 roll schedule that isn't spread out very well that schedule will stick.
 
 Neha Narkhede wrote:
 Even for low-volume logs, as long as every log has a random jitter, it 
 seems unlikely that time-based rolling of log segments would align across 
 logs. For that sort of alignment, two logs would have to have the same jitter 
 set.

Actually, ignore what I said. The config.randomSegmentJitter() call actually 
recomputes a random jitter time for every log segment, so no two segments should 
reasonably roll at the same time.


- Neha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56494
---


On Oct. 13, 2014, 11:16 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 13, 2014, 11:16 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-979 Add optional random jitter for time based log rolling.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




Review Request 26710: Patch for KAFKA-1637

2014-10-14 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26710/
---

Review request for kafka.


Bugs: KAFKA-1637
https://issues.apache.org/jira/browse/KAFKA-1637


Repository: kafka


Description
---

KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for 
unknown topics/partitions vs no associated consumer.


Diffs
-

  core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
1586243d20d6a181a1bd9f07e1c9493596005b32 
  core/src/main/scala/kafka/server/KafkaApis.scala 
67f2833804cb15976680e42b9dc49e275c89d266 
  core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
2d9325045ac1ac2d7531161b32c98c847125cbf0 

Diff: https://reviews.apache.org/r/26710/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1637:
-
Assignee: Ewen Cheslack-Postava
  Status: Patch Available  (was: Open)

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset  Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1637:
-
Attachment: KAFKA-1637.patch

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171610#comment-14171610
 ] 

Ewen Cheslack-Postava commented on KAFKA-1637:
--

Created reviewboard https://reviews.apache.org/r/26710/diff/
 against branch origin/trunk

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171621#comment-14171621
 ] 

Ewen Cheslack-Postava commented on KAFKA-1637:
--

The error code is for UnknownTopicOrPartition, which may have been correct if 
the request was for a non-existent topic or partition. Previously the code 
seemed to be doing the correct thing, reporting this error and returning 
invalid offset when the consumer hadn't started reading from that group. But 
KAFKA-1012 (a670537aa337) actually changed that behavior. The provided patch 
tries to cover the different possible scenarios (missing topic, invalid 
partition, and valid TopicAndPartition but a consumer with no offset for it).

One potential caveat is auto topic creation, since it could be reasonable not to 
return UnknownTopicOrPartition for a missing topic in that case. I'm not sure we 
really want different behavior in that case, though.
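
For illustration, the scenarios above map to a small decision tree. This is only a
sketch with hypothetical helper inputs (topicExists, partitionCount, storedOffset),
not the actual KafkaApis code:

    // Hypothetical sketch of the three OffsetFetch cases described above.
    sealed trait OffsetFetchOutcome
    case object UnknownTopicOrPartitionError extends OffsetFetchOutcome // missing topic or invalid partition
    case object NoOffsetForGroup extends OffsetFetchOutcome             // valid partition: no error, offset -1
    final case class KnownOffset(offset: Long) extends OffsetFetchOutcome

    def classify(topicExists: Boolean,
                 partition: Int,
                 partitionCount: Int,
                 storedOffset: Option[Long]): OffsetFetchOutcome =
      if (!topicExists || partition < 0 || partition >= partitionCount)
        UnknownTopicOrPartitionError
      else
        storedOffset.map(KnownOffset(_)).getOrElse(NoOffsetForGroup)

Returning NoOffsetForGroup as "no error, offset -1" matches the documented behavior
quoted in the JIRA description.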

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/
---

(Updated Oct. 14, 2014, 10:33 p.m.)


Review request for kafka.


Bugs: KAFKA-979
https://issues.apache.org/jira/browse/KAFKA-979


Repository: kafka


Description (updated)
---

Add new options log.roll.jitter.ms and log.roll.jitter.hours to
add random jitter to time-based log rolling so logs aren't likely to
roll at exactly the same time. Jitter always reduces the timeout so
log.roll.ms still provides a soft maximum. Defaults to 0 so no jitter is
added by default.

Addressing the warning and Utils.abs comments.
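
Roughly, the intended behavior can be sketched as follows. This is only an
illustration with made-up helper names; the real change lives in Log.scala and
LogConfig.scala:

    import scala.util.Random

    // Per-segment jitter drawn from [0, min(jitterMs, segmentMs)), so jitter can
    // only shorten the roll timeout and never push a roll past segmentMs.
    def rollJitterMs(segmentMs: Long, jitterMs: Long): Long =
      if (jitterMs <= 0 || segmentMs <= 0) 0L          // default: no jitter
      else {
        val bound = math.min(jitterMs, segmentMs)
        ((Random.nextLong() % bound) + bound) % bound  // non-negative, avoids the abs(Long.MinValue) pitfall
      }

    def shouldRoll(segmentAgeMs: Long, segmentMs: Long, jitter: Long): Boolean =
      segmentAgeMs > segmentMs - jitter                // jitter only ever reduces the timeout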


Diffs (updated)
-

  core/src/main/scala/kafka/log/Log.scala 
a123cdc52f341a802b3e4bfeb29a6154332e5f73 
  core/src/main/scala/kafka/log/LogCleaner.scala 
c20de4ad4734c0bd83c5954fdb29464a27b91dff 
  core/src/main/scala/kafka/log/LogConfig.scala 
d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
  core/src/main/scala/kafka/log/LogSegment.scala 
7597d309f37a0b3756381f9500100ef763d466ba 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
7fcbc16da898623b03659c803e2a20c7d1bd1011 
  core/src/main/scala/kafka/server/KafkaServer.scala 
3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
  core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
7b97e6a80753a770ac094e101c653193dec67e68 
  core/src/test/scala/unit/kafka/log/LogTest.scala 
a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 

Diff: https://reviews.apache.org/r/26663/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Commented] (KAFKA-979) Add jitter for time based rolling

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171648#comment-14171648
 ] 

Ewen Cheslack-Postava commented on KAFKA-979:
-

Updated reviewboard https://reviews.apache.org/r/26663/diff/
 against branch origin/trunk

 Add jitter for time based rolling
 -

 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch


 Currently, for low volume topics time based rolling happens at the same time. 
 This causes a lot of IO on a typical cluster and creates back pressure. We 
 need to add a jitter to prevent them from happening at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-979) Add jitter for time based rolling

2014-10-14 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-979:

Attachment: KAFKA-979_2014-10-14_15:33:31.patch

 Add jitter for time based rolling
 -

 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch


 Currently, for low volume topics time based rolling happens at the same time. 
 This causes a lot of IO on a typical cluster and creates back pressure. We 
 need to add a jitter to prevent them from happening at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-14 Thread Jun Rao
Bhavesh,

It seems that all those threads are blocked waiting for the lock on the
dq for that partition. There must be another thread holding the dq lock
at that point. Could you create a JIRA and attach the full thread dump
there? Also, could you attach the YourKit result that shows the breakdown
of the time?
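
For reference, the pattern being described - every sender thread funneling into one
per-partition deque lock - can be reproduced with a self-contained toy like the one
below. This is not the producer code, only an illustration of the contention:

    import java.util.ArrayDeque
    import java.util.concurrent.Executors

    // Toy reproduction: 200 threads all serialize on the lock guarding a single
    // partition's deque, so one slow lock holder stalls everyone else.
    object SinglePartitionContention extends App {
      val dq = new ArrayDeque[Array[Byte]]()      // stand-in for one partition's batch deque
      val pool = Executors.newFixedThreadPool(200)
      for (_ <- 1 to 200) pool.submit(new Runnable {
        override def run(): Unit =
          while (!Thread.currentThread().isInterrupted) {
            dq.synchronized {                     // every thread contends on this one lock
              dq.add(new Array[Byte](1024))
              if (dq.size > 1000) dq.poll()       // keep the toy bounded in memory
            }
          }
      })
      Thread.sleep(10000)                         // profile this window to see the lock contention
      pool.shutdownNow()
    }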

Thanks,

Jun

On Tue, Oct 14, 2014 at 10:41 AM, Bhavesh Mistry mistry.p.bhav...@gmail.com
 wrote:

 HI Jay,

 Yes, it is reproducible quite easily.  The problem is the synchronized block in
 RecordAccumulator.  You can easily reproduce it; I have attached the Java code
 in my original email.  Because the application threads all enqueue messages into a
 single partition, the resulting thread contention can leave an application thread
 blocked for more than 2 minutes, as shown in the original email.  Let
 me know if you need more information.

 Last Commit I tested with:

 commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99
 Author: Anton Karamanov atara...@gmail.com
 Date:   Tue Oct 7 18:22:31 2014 -0700

 kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by
 Anton Karamanov; reviewed by Jun Rao

 Thanks,

 Bhavesh

 On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps jay.kr...@gmail.com wrote:

  Hey Bhavesh,
 
  This sounds like a problem. Just to confirm this is after the fix for
  KAFKA-1673?
 
  https://issues.apache.org/jira/browse/KAFKA-1673
 
  It sounds like you have a reproducible test case?
 
  -Jay
 
 
  On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry 
  mistry.p.bhav...@gmail.com
   wrote:
 
   Hi Kafka Dev Team,
  
   When I run the test that sends messages to a single partition for 3 minutes
   or so, I encounter a deadlock (please see the attached screenshot) and
   thread contention in the YourKit profiling.
  
   Use Case:
  
   1)  Aggregating messages into the same partition for metric counting.
   2)  Replicating the old producer's behavior of sticking to a partition for 3
   minutes.
  
  
   Here is output:
  
   Frozen threads found (potential deadlock)
  
   It seems that the following threads have not changed their stack for
 more
   than 10 seconds.
   These threads are possibly (but not necessarily!) in a deadlock or
 hung.
  
   pool-1-thread-128 --- Frozen for at least 2m
  
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
   byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
   Callback) KafkaProducer.java:237
   org.kafka.test.TestNetworkDownProducer$MyProducer.run()
   TestNetworkDownProducer.java:84
  
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
   ThreadPoolExecutor.java:1145
   java.util.concurrent.ThreadPoolExecutor$Worker.run()
   ThreadPoolExecutor.java:615
   java.lang.Thread.run() Thread.java:744
  
  
  
   pool-1-thread-159 --- Frozen for at least 2m 1 sec
  
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
   byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
   Callback) KafkaProducer.java:237
   org.kafka.test.TestNetworkDownProducer$MyProducer.run()
   TestNetworkDownProducer.java:84
  
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
   ThreadPoolExecutor.java:1145
   java.util.concurrent.ThreadPoolExecutor$Worker.run()
   ThreadPoolExecutor.java:615
   java.lang.Thread.run() Thread.java:744
  
  
  
   pool-1-thread-55 --- Frozen for at least 2m
  
 
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
   byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
   org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord,
   Callback) KafkaProducer.java:237
   org.kafka.test.TestNetworkDownProducer$MyProducer.run()
   TestNetworkDownProducer.java:84
  
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
   ThreadPoolExecutor.java:1145
   java.util.concurrent.ThreadPoolExecutor$Worker.run()
   ThreadPoolExecutor.java:615
   java.lang.Thread.run() Thread.java:744
  
  
  
  
  
 



[jira] [Created] (KAFKA-1705) Add MR layer to Kafka

2014-10-14 Thread Gwen Shapira (JIRA)
Gwen Shapira created KAFKA-1705:
---

 Summary: Add MR layer to Kafka
 Key: KAFKA-1705
 URL: https://issues.apache.org/jira/browse/KAFKA-1705
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Gwen Shapira


Many NoSQL-type storage systems (HBase, Mongo,
Cassandra) and file formats (Avro, Parquet) provide a MapReduce
integration layer - usually an InputFormat, OutputFormat and a utility
class. Sometimes there's also an abstract Job and Mapper that do more
setup, which can make things even more convenient.

This is different from the existing Hadoop contrib project or Camus in that an 
MR layer would provide components for use in MR jobs, not an entire job 
that ingests data from Kafka to HDFS.

The benefits I see for a MapReduce layer are:
* Developers can create their own jobs, processing the data as it is
ingested - rather than having to process it in two steps.
* There are reusable components for developers looking to integrate with
Kafka, rather than having everyone implement their own solution.
* Hadoop developers expect projects to have this layer.
* Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
integration for free.
* There's a layer to plug the delegation token code into and make it
invisible to MapReduce developers. Without this, everyone who writes
MR jobs will need to think about how to implement authentication.
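
To make the shape of this concrete, a purely hypothetical skeleton could look like
the following; the class name KafkaInputFormat and the one-split-per-partition idea
are assumptions for illustration, not existing code:

    import java.util.{Collections, List => JList}
    import org.apache.hadoop.mapreduce.{InputFormat, InputSplit, JobContext, RecordReader, TaskAttemptContext}

    // Hypothetical skeleton of what the proposed MR layer could expose.
    class KafkaInputFormat extends InputFormat[Array[Byte], Array[Byte]] {

      // A natural mapping would be one split per topic-partition, with the offset
      // range taken from the job configuration; left unimplemented in this sketch.
      override def getSplits(context: JobContext): JList[InputSplit] =
        Collections.emptyList[InputSplit]()

      override def createRecordReader(split: InputSplit,
                                      context: TaskAttemptContext): RecordReader[Array[Byte], Array[Byte]] =
        throw new UnsupportedOperationException("sketch only: would wrap a Kafka consumer")
    }

A class along these lines is also what Spark's newAPIHadoopRDD would pick up
unchanged, which is the "Spark integration for free" point above.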



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1705) Add MR layer to Kafka

2014-10-14 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171720#comment-14171720
 ] 

Gwen Shapira commented on KAFKA-1705:
-

Looking for community comments on this. Do others see this as a good thing to 
add?

 Add MR layer to Kafka
 -

 Key: KAFKA-1705
 URL: https://issues.apache.org/jira/browse/KAFKA-1705
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Gwen Shapira

 Many NoSQL-type storage systems (HBase, Mongo,
 Cassandra) and file formats (Avro, Parquet) provide a MapReduce
 integration layer - usually an InputFormat, OutputFormat and a utility
 class. Sometimes there's also an abstract Job and Mapper that do more
 setup, which can make things even more convenient.
 This is different from the existing Hadoop contrib project or Camus in that 
 an MR layer would provide components for use in MR jobs, not an entire 
 job that ingests data from Kafka to HDFS.
 The benefits I see for a MapReduce layer are:
 * Developers can create their own jobs, processing the data as it is
 ingested - rather than having to process it in two steps.
 * There are reusable components for developers looking to integrate with
 Kafka, rather than having everyone implement their own solution.
 * Hadoop developers expect projects to have this layer.
 * Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
 integration for free.
 * There's a layer to plug the delegation token code into and make it
 invisible to MapReduce developers. Without this, everyone who writes
 MR jobs will need to think about how to implement authentication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-10-14 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171732#comment-14171732
 ] 

Jun Rao commented on KAFKA-1493:


James,

Thanks, got it now. Not sure how long it will take to get a real implementation 
of http://fastcompression.blogspot.com/2013/04/lz4-streaming-format-final.html. 
Should we just take LZ4 out of CompressionType and CompressionCodec in 0.8.2 so 
that people don't use it until it's fixed?

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1493.patch, KAFKA-1493.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-10-14 Thread James Oliver (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171748#comment-14171748
 ] 

James Oliver commented on KAFKA-1493:
-

I implemented the OutputStream today. If I can't get the InputStream done and 
tested before I leave for vacation Thursday, IMO we should take it out.

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1493.patch, KAFKA-1493.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Bad link on the document

2014-10-14 Thread Azuryy Yu
Hi,
Usage information on the hadoop consumer can be found here
https://github.com/linkedin/camus/tree/camus-kafka-0.8/.

The link is broken - could someone kindly fix it? Thanks.


Re: Review Request 26663: Patch for KAFKA-979

2014-10-14 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26663/#review56652
---


Looks good overall - I have one comment on interpreting the jitter value though.


core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/26663/#comment97045

Thinking about this a bit more: do you think it would be safer to interpret 
jitter as an additive value to segmentMs?

i.e., the actual age for rolling will be config.segmentMs + 
segment.rollJitterMs;  (and limit segment.rollJitterMs to an interval of [0, 
config.segmentMs] which you are already doing.)

Otherwise, if a user happens to set a high jitter time, then nearly empty 
segments will roll often (with high probability).

Another way to interpret it is as a jitter window. i.e., the actual age for 
rolling will be config.segmentMs + segment.rollJitterMs; and limit 
segment.rollJitterMs to an interval of [-config.segmentMs / 2, config.segmentMs 
/ 2]

Thoughts?
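
In code terms, the interpretations differ only in how the jitter enters the
effective roll age (illustrative only, not proposed code):

    // Effective roll age under each interpretation discussed above.
    def subtractive(segmentMs: Long, jitter: Long): Long = segmentMs - jitter // current patch: jitter in [0, segmentMs]; high jitter rolls near-empty segments early
    def additive(segmentMs: Long, jitter: Long): Long    = segmentMs + jitter // jitter in [0, segmentMs]; never rolls before segmentMs
    def windowed(segmentMs: Long, jitter: Long): Long    = segmentMs + jitter // jitter in [-segmentMs / 2, segmentMs / 2]; centered on segmentMs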


- Joel Koshy


On Oct. 14, 2014, 10:33 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26663/
 ---
 
 (Updated Oct. 14, 2014, 10:33 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-979
 https://issues.apache.org/jira/browse/KAFKA-979
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Add new options log.roll.jitter.ms and log.roll.jitter.hours to
 add random jitter to time-based log rolling so logs aren't likely to
 roll at exactly the same time. Jitter always reduces the timeout so
 log.roll.ms still provides a soft maximum. Defaults to 0 so no jitter is
 added by default.
 
 Addressing the warning and Utils.abs comments.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/Log.scala 
 a123cdc52f341a802b3e4bfeb29a6154332e5f73 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/main/scala/kafka/log/LogConfig.scala 
 d2cc9e3d6b7a4fd24516d164eb3673e6ce052129 
   core/src/main/scala/kafka/log/LogSegment.scala 
 7597d309f37a0b3756381f9500100ef763d466ba 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 7fcbc16da898623b03659c803e2a20c7d1bd1011 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 3e9e91f2b456bbdeb3055d571e18ffea8675b4bf 
   core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 
 7b97e6a80753a770ac094e101c653193dec67e68 
   core/src/test/scala/unit/kafka/log/LogTest.scala 
 a0cbd3bbbeeabae12caa6b41aec31a8f5dfd034b 
 
 Diff: https://reviews.apache.org/r/26663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Updated] (KAFKA-979) Add jitter for time based rolling

2014-10-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-979:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk and 0.8.2

 Add jitter for time based rolling
 -

 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch


 Currently, for low volume topics time based rolling happens at the same time. 
 This causes a lot of IO on a typical cluster and creates back pressure. We 
 need to add a jitter to prevent them from happening at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-979) Add jitter for time based rolling

2014-10-14 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171962#comment-14171962
 ] 

Joel Koshy commented on KAFKA-979:
--

Just had one more comment in the RB on how to use the jitter config. Can you 
take a quick look at that?

 Add jitter for time based rolling
 -

 Key: KAFKA-979
 URL: https://issues.apache.org/jira/browse/KAFKA-979
 Project: Kafka
  Issue Type: Bug
Reporter: Sriram Subramanian
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-979.patch, KAFKA-979_2014-10-14_15:33:31.patch


 Currently, for low volume topics time based rolling happens at the same time. 
 This causes a lot of IO on a typical cluster and creates back pressure. We 
 need to add a jitter to prevent them from happening at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26710: Patch for KAFKA-1637

2014-10-14 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26710/#review56655
---

Ship it!


Ship It!

- Neha Narkhede


On Oct. 14, 2014, 10:04 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26710/
 ---
 
 (Updated Oct. 14, 2014, 10:04 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1637
 https://issues.apache.org/jira/browse/KAFKA-1637
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for 
 unknown topics/partitions vs no associated consumer.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
 1586243d20d6a181a1bd9f07e1c9493596005b32 
   core/src/main/scala/kafka/server/KafkaApis.scala 
 67f2833804cb15976680e42b9dc49e275c89d266 
   core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
 2d9325045ac1ac2d7531161b32c98c847125cbf0 
 
 Diff: https://reviews.apache.org/r/26710/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Updated] (KAFKA-1698) Validator.ensureValid() only validates default config value

2014-10-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1698:
-
Reviewer: Jun Rao

 Validator.ensureValid() only validates default config value
 ---

 Key: KAFKA-1698
 URL: https://issues.apache.org/jira/browse/KAFKA-1698
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Ewen Cheslack-Postava
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1698.patch


 We should use it to validate the actual configured value.
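
A sketch of the intent, using hypothetical wiring rather than the real ConfigDef
internals:

    // Run the validator on the parsed, configured value as well, not only on the default.
    trait Validator { def ensureValid(name: String, value: Any): Unit }

    def parse(name: String, raw: String, toValue: String => Any, validator: Validator): Any = {
      val value = toValue(raw)
      validator.ensureValid(name, value) // validate the actual configured value
      value
    }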



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1637:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk and 0.8.2

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group, the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1583) Kafka API Refactoring

2014-10-14 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14171998#comment-14171998
 ] 

Guozhang Wang commented on KAFKA-1583:
--

Bump, [~junrao] could you take a look now?

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26710: Patch for KAFKA-1637

2014-10-14 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26710/#review56662
---



core/src/main/scala/kafka/server/KafkaApis.scala
https://reviews.apache.org/r/26710/#comment97067

There is an issue here. The replica manager only contains information about 
partitions that are assigned to this broker. However, some consumer group's 
offset manager may also be on this broker and that group may consume various 
partitions that are not assigned to this broker. The offset manager though will 
still contain offsets for those partitions.
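
A toy illustration of the failure mode, with hypothetical data rather than the real
KafkaApis/ReplicaManager objects:

    // This broker replicates only topicA-0, yet as the group's offset manager it
    // also stores an offset for topicB-3. "Not in the replica manager" therefore
    // must not be mapped to UnknownTopicOrPartition.
    val partitionsHostedHere = Set("topicA-0")
    val offsetsManagedHere   = Map("topicA-0" -> 42L, "topicB-3" -> 7L)

    val knownOnlyToOffsetManager =
      offsetsManagedHere.keySet.diff(partitionsHostedHere) // Set("topicB-3"): a valid committed offset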


- Joel Koshy


On Oct. 14, 2014, 10:04 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26710/
 ---
 
 (Updated Oct. 14, 2014, 10:04 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1637
 https://issues.apache.org/jira/browse/KAFKA-1637
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1637 Return correct error code and offsets for OffsetFetchRequest for 
 unknown topics/partitions vs no associated consumer.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
 1586243d20d6a181a1bd9f07e1c9493596005b32 
   core/src/main/scala/kafka/server/KafkaApis.scala 
 67f2833804cb15976680e42b9dc49e275c89d266 
   core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
 2d9325045ac1ac2d7531161b32c98c847125cbf0 
 
 Diff: https://reviews.apache.org/r/26710/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Commented] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group

2014-10-14 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172020#comment-14172020
 ] 

Joel Koshy commented on KAFKA-1637:
---

-1 on this patch per the comment in the RB. Reverted it for now.

 SimpleConsumer.fetchOffset returns wrong error code when no offset exists for 
 topic/partition/consumer group
 

 Key: KAFKA-1637
 URL: https://issues.apache.org/jira/browse/KAFKA-1637
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Affects Versions: 0.8.1, 0.8.1.1
 Environment: Linux
Reporter: Amir Malekpour
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-1637.patch


 This concerns Kafka's Offset  Fetch API:
 According to Kafka's current documentation, if there is no offset associated 
 with a topic-partition under that consumer group the broker does not set an 
 error code (since it is not really an error), but returns empty metadata and 
 sets the offset field to -1.  (Link below)
 However, in Kafka 08.1.1 Error code '3' is returned, which effectively makes 
 it impossible for the client to decide if there was an error, or if there is 
 no offset associated with a topic-partition under that consumer group.
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)