[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ] Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:43 AM:
-
Third party solution, REST proxy from Confluent: http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though.

was (Author: liqusha):
Third party solution, REST proxy from Confluent: http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism but

kafka topic (creation) templates
Key: KAFKA-1932
URL: https://issues.apache.org/jira/browse/KAFKA-1932
Project: Kafka
Issue Type: Wish
Reporter: Ahmet AKYOL

AFAIK, the only way to create a Kafka topic without using the default settings is the provided bash script. Even though client support could be nice, I would prefer to see a template mechanism similar to [Elasticsearch Index Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html]. What I have in mind is very simple: add something like this to the server properties:

template.name=pattern,numOfReplicas,numOfPartitions

where the pattern can only contain *, meaning starts with, ends with, or contains. Example:

template.logtopics=*_log,2,20
template.loaders=*_loader,1,5

So, when a producer sends a message for the first time to a topic that ends with _log, Kafka can use the above settings. Thanks in advance.

update: On second thought, a command like kafka-create-template.sh could be more practical for cluster deployments than adding to server.properties; Kafka would internally register the template in ZooKeeper. About use cases, I can understand an opposing argument that creating many topics is not a good design decision. But my point is not to create many topics, just to automate an important process by handing the responsibility to Kafka.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
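The proposed matching rule is simple enough to sketch. The following is illustrative only (the function names and the Python rendering are mine, not part of the proposal): it turns a *-wildcard pattern into an anchored regex and returns the (replicas, partitions) pair of the first matching template.

```python
import re

def compile_template(pattern):
    """Convert a '*'-wildcard pattern (starts-with / ends-with / contains)
    into an anchored regular expression."""
    return re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("*")) + "$")

def settings_for_topic(topic, templates, default=(1, 1)):
    """Return (replicas, partitions) from the first matching template,
    falling back to the cluster defaults."""
    for pattern, replicas, partitions in templates:
        if compile_template(pattern).match(topic):
            return (replicas, partitions)
    return default

# Templates as proposed: template.logtopics=*_log,2,20 etc.
templates = [("*_log", 2, 20), ("*_loader", 1, 5)]
```

On first write to an unknown topic, the broker would consult such a table before applying the defaults.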
[jira] [Commented] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ] Ahmet AKYOL commented on KAFKA-1932:
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism but
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ] Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:48 AM:
-
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though, but many. Since [~jkreps] is also at Confluent, I guess this type of administrative feature will mostly be done by Confluent, which I'm OK with. So, this issue can be closed.

was (Author: liqusha):
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ] Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 8:41 AM:
-
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/schema-registry/docs/api.html http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though, but it's the administrative client Kafka is missing, along with other features. And I guess, since [~jkreps] is also at Confluent, this type of administrative feature will mostly be done by Confluent, which I'm OK with. So, this issue can be closed.

was (Author: liqusha):
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though, but it's the administrative client Kafka is missing, along with other features. And I guess, since [~jkreps] is also at Confluent, this type of administrative feature will mostly be done by Confluent, which I'm OK with. So, this issue can be closed.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ] Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:52 AM:
-
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though, but it's the administrative client Kafka is missing, along with other features. And I guess, since [~jkreps] is also at Confluent, this type of administrative feature will mostly be done by Confluent, which I'm OK with. So, this issue can be closed.

was (Author: liqusha):
Third party solution, rest proxy from Confluent : http://confluent.io/docs/current/kafka-rest/docs/intro.html It doesn't provide a template mechanism though, but many. Since [~jkreps] is also at Confluent, I guess this type of administrative feature will mostly be done by Confluent, which I'm OK with. So, this issue can be closed.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: zookeeper usage in kafka offset commit / fetch requests
Just to close this: I found out that the broker handles the request differently depending on whether you specify API version 0 or 1. When using v1, the broker commits the offsets to the internal topic. /svante
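The version-dependent behavior described above can be summarized with a toy model (this is my own sketch for illustration, not the broker's actual code): OffsetCommitRequest version 0 writes to ZooKeeper, while version 1 and later write to the internal __consumer_offsets topic.

```python
class BrokerOffsetStore:
    """Toy model of the broker's OffsetCommit handling in 0.8.2:
    v0 requests go to ZooKeeper, v1+ to the internal topic."""
    def __init__(self):
        self.zookeeper = {}       # (group, topic, partition) -> offset
        self.offsets_topic = {}   # same keying, __consumer_offsets

    def handle_commit(self, api_version, group, topic, partition, offset):
        # Dispatch purely on the request's API version.
        store = self.zookeeper if api_version == 0 else self.offsets_topic
        store[(group, topic, partition)] = offset
        return "zookeeper" if api_version == 0 else "__consumer_offsets"
```

The practical consequence is the one svante hit: an old v0 client keeps committing to ZooKeeper even when the cluster supports Kafka-based offset storage.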
Re: [VOTE] KIP-7 Security - IP Filtering
I think it is acceptable for KIP-11 to allow additional network based authorization. However, I do feel that most users will want something simpler, and thus I don't feel KIP-11 should require network based authorization by default. For example, Postgres allows something similar to what is being discussed here. In addition to user based authz of database objects, administrators can use the pg_hba.conf configuration file to perform network based authz for users and groups. Despite this, few Postgres installs with more than one user use pg_hba.conf for user based network authz. They do, however, use standard user based authz for database objects and pg_hba.conf for basic database-wide network authz. tl;dr - I think we should implement both KIP-7 and KIP-11.

On Fri, Mar 20, 2015 at 7:23 PM, Gwen Shapira gshap...@cloudera.com wrote:
I'd like to add that HDFS has had ACLs + RBAC + global IP white/black list for years now. We did not notice any customers confusing the features. I've seen customers use each feature for different purposes. Actually, the only system I am aware of that integrated IP access controls together with RBAC and ACLs is MySQL, where it leads to major confusion among new admins (I can log in as gwenshap from a remote machine but not from localhost, or vice versa...). Gwen

On Fri, Mar 20, 2015 at 4:04 PM, Jeff Holoman jholo...@cloudera.com wrote:
Parth, I think it's important to understand the timing of both the initial JIRA and the KIP; it helps put my comments in proper context. The initial JIRA for this was created back in December, so the timeline for 1688/KIP-11 was pretty unclear. KIP-7 came out when we started doing KIPs, back in Jan. The initial comments I think pretty clearly acknowledged the work on 1688. My comment re: integration was that perhaps the check logic could be reused (i.e., the CIDR range checks). That's what I meant when I mentioned the intent. At that point there was no KIP-11 and no patch, so it was just a hunch.
In terms of it being a different use case, I do think there are some aspects which would be beneficial, even given the work on 1688: 1) Simplicity 2) Less config - 2 params 3) Allows for both whitelisting and blacklisting, giving a bit more flexibility. Another key driver, at least at the time, was timing. As this discussion has been extended, that becomes less of a motivator, which I understand. I don't necessarily think that giving users options to limit access will confuse them, given that the new interface for security will likely take a bit of understanding to implement correctly. In fact they might be quite complementary. Let's take the 2nd example in KIP-11: user2.hosts=* user2.operation=read So if I understand correctly, this would allow read access for a particular topic from any host for a given user. It could be helpful to block a range of IPs (like production or QA) but otherwise not specify every potential host that needs to read from that topic. As you mentioned, adding additional constructs makes the ACLs a bit more complex; maybe this provides some relief there. Jun, if folks feel like there's too much overlap and this wouldn't be useful, then that's fair. That's what the votes are for, right? :) Thanks Jeff

On Fri, Mar 20, 2015 at 6:01 PM, Parth Brahmbhatt pbrahmbh...@hortonworks.com wrote:
I am guessing in your last reply you meant KIP-11. And yes, I think KIP-11 subsumed KIP-7, so if we can finish KIP-11 we should not need KIP-7, but I will let Jeff confirm that. Thanks Parth

On 3/20/15, 2:32 PM, Jun Rao j...@confluent.io wrote:
Right, if this KIP is subsumed by KIP-7, perhaps we just need to wait until KIP-7 is done? If we add the small change now, we will have to worry about migrating existing users and deprecating some configs when KIP-7 is done. Thanks, Jun

On Fri, Mar 20, 2015 at 10:36 AM, Parth Brahmbhatt pbrahmbh...@hortonworks.com wrote:
I am not entirely sure what you mean by integrating KIP-7 work with KAFKA-1688.
Wouldn't the work done as part of KIP-7 become obsolete once KAFKA-1688 is done? Multiple ways of controlling this authorization just seem like extra configuration that will confuse admins/users. If timing is the only issue, don't you think it's better to focus our energy on getting 1688 done faster, which seems to be the longer term goal anyway? Thanks Parth

On 3/20/15, 10:28 AM, Jeff Holoman jholo...@cloudera.com wrote:
Hey Jun, The intent was for the same functionality to be utilized when 1688 is done, as mentioned in the KIP: "The broader security initiative (KAFKA-1682) will add more robust controls for these types of environments, and this proposal could be integrated with that work at the appropriate time." This is also the specific request of a large financial services company. I don't think
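For reference on the reuse Jeff mentions, the CIDR range checks at the core of KIP-7 style filtering are small. The sketch below is my own illustration (names are hypothetical and it uses Python's standard ipaddress module, not actual Kafka code): blacklisted ranges always reject, and if a whitelist is given, the address must also match it.

```python
import ipaddress

def allowed(client_ip, whitelist=None, blacklist=None):
    """KIP-7 style check: reject addresses in any blacklisted CIDR
    range; if a whitelist is configured, additionally require a match."""
    addr = ipaddress.ip_address(client_ip)
    if blacklist and any(addr in ipaddress.ip_network(c) for c in blacklist):
        return False
    if whitelist:
        return any(addr in ipaddress.ip_network(c) for c in whitelist)
    return True  # no whitelist configured: anything not blacklisted passes
```

A broker would run such a check once per accepted connection, before any user-level authorization.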
[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work
[ https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386942#comment-14386942 ] Jay Kreps commented on KAFKA-2046:
--
Hmm, so we have a configuration which, if you set it to anything other than the default, causes a deadlock? Is there a reason we left this configurable once we realized infinite was the only safe value? Could we remove it now? :-)

Delete topic still doesn't work
---
Key: KAFKA-2046
URL: https://issues.apache.org/jira/browse/KAFKA-2046
Project: Kafka
Issue Type: Bug
Reporter: Clark Haskins
Assignee: Onur Karaman

I just attempted to delete a 128-partition topic with all inbound producers stopped. The result was as follows:
- the /admin/delete_topics znode was empty
- the topic under /brokers/topics was removed
- the Kafka topics command showed that the topic was removed
However, the data on disk on each broker was not deleted, and the topic has not yet been re-created by starting up the inbound mirror maker. Let me know what additional information is needed.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1932) kafka topic (creation) templates
[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin resolved KAFKA-1932.
-
Resolution: Duplicate
[~liqusha] This requirement also looks covered by KIP-4. Closing for now; please reopen if I missed something.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2076) Add an API to the new consumer to allow users to get the high watermark of partitions
Jiangjie Qin created KAFKA-2076:
---
Summary: Add an API to the new consumer to allow users to get the high watermark of partitions
Key: KAFKA-2076
URL: https://issues.apache.org/jira/browse/KAFKA-2076
Project: Kafka
Issue Type: Improvement
Reporter: Jiangjie Qin

We have a use case where a user wants to know, on startup, how far behind it is on a particular partition. Currently we receive the high watermark for each partition in every fetch response, but we only keep a global max-lag metric. It would be better to keep a record of the high watermark per partition and update it on each fetch response. We can then add a new API to let the user query the high watermark.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
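The bookkeeping being requested is small. As an illustrative sketch (the function name is mine, not a proposed API), per-partition lag is just the last-seen high watermark minus the consumer's position on that partition:

```python
def partition_lag(high_watermarks, consumed_offsets):
    """Per-partition lag: last-seen high watermark minus the consumer's
    position; partitions not yet consumed count as fully behind."""
    return {tp: high_watermarks[tp] - consumed_offsets.get(tp, 0)
            for tp in high_watermarks}
```

The high_watermarks map would be refreshed from each fetch response; the proposed API would then simply expose it (or this derived lag) to the user.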
Can I be added as a contributor?
Hi, could I be added as a contributor and to Confluence? I am brocknoland on JIRA and brockn at gmail on Confluence. Cheers! Brock
Re: [DISCUSSION] Keep docs updated per jira
Yeah, that sounds good to me. That would involve someone converting us over to the templating system we chose (something like Jekyll would be easy since it allows inline HTML, so we'd just have to replace the Apache includes). It's also quite possible that Apache has solved the whole SVN/website dependency. I get the feeling git is no longer as taboo as it once was at Apache... -Jay

On Sat, Mar 28, 2015 at 9:46 PM, Gwen Shapira gshap...@cloudera.com wrote:
On Thu, Mar 26, 2015 at 9:46 PM, Jay Kreps jay.kr...@gmail.com wrote:
The reason the docs are in svn is that when we were setting up the site, Apache required that to publish doc changes. Two possible fixes: 1. Follow up with infra to see if they have git integration working yet 2. Move to a model where doc source is kept in the main git repo and we use Jekyll or something like that to generate the resulting HTML (i.e. with things like headers) and then check that in to svn to publish it. The second item would have the advantage of not needing to configure Apache includes to see the docs, but would likely trade it for Jekyll setup stuff. Jekyll might actually fix a lot of the repetitive stuff in the docs today (e.g. generating section numbers, adding p tags, etc).

What we do at other projects that I'm familiar with (Sqoop, Flume) is that we manage the docs source (i.e. asciidoc or similar) in git. When it's time to release a version (i.e. after a successful vote on the RC), we build the docs, manually copy them to the SVN repo and push (or whatever the SVN equivalent is...). It's a bit of a manual pain, but it's only a few times a year. This is also a good opportunity to upload updated javadocs, docs generated from ConfigDef and KafkaMetrics, etc.
Gwen
-Jay

On Thu, Mar 26, 2015 at 8:22 PM, Jiangjie Qin j...@linkedin.com.invalid wrote:
On 3/26/15, 7:00 PM, Neha Narkhede n...@confluent.io wrote: "Much much easier to do this if the docs are in git and can be reviewed and committed / reverted with the code (transactions make synchronization easier...)." +1 on this, too! Huge +1.

On Thu, Mar 26, 2015 at 6:54 PM, Joel Koshy jjkosh...@gmail.com wrote:
+1 It is indeed too easy to forget and realize only much later that a jira needed a doc update. So getting into the habit of asking "did you update the docs" as part of review will definitely help.

On Thu, Mar 26, 2015 at 06:36:43PM -0700, Gwen Shapira wrote:
I strongly support the goal of keeping docs and code in sync. Much much easier to do this if the docs are in git and can be reviewed and committed / reverted with the code (transactions make synchronization easier...). This will also allow us to: 1. Include the docs in the bits we release 2. On release, update the website with the docs from the specific branch that was just released 3. Hook our build to ReadTheDocs and update the trunk docs with every commit. Tons of Apache projects do this already, and having reviews enforce "did you update the docs" before committing is the best way to guarantee updated docs. Gwen

On Thu, Mar 26, 2015 at 6:27 PM, Jun Rao j...@confluent.io wrote:
Hi, Everyone, Quite a few jiras these days require documentation changes (e.g., wire protocol, ZK layout, configs, jmx, etc). Historically, we have been updating the documentation just before we do a release. The issue is that some of the changes will be missed since they were done a while back. Another way to do it is to keep the docs updated as we complete each jira. Currently, our documentation is in the following places.
wire protocol: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
ZK layout: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083

We probably don't need to update configs already ported to ConfigDef since they can be generated automatically. However, for the rest of the doc-related changes, keeping them updated per jira seems a better approach. What do people think? Thanks, Jun
--
Thanks, Neha
Kafka confusion
Gwen had a really good blog post on confusing things about Kafka: http://ingest.tips/2015/03/26/what-is-confusing-about-kafka/ A lot of these are in flight now (e.g. consumer) or finished (e.g. delete topic, kind of, and non-sticky producer partitioning). Do we have bugs for the rest? It would be great to make this an ongoing thing we do to assess confusing configs, etc. Not sure of the best way to do this, but maybe just maintaining a special JIRA tag and periodically polling people again? -Jay
Re: Kafka confusion
I was planning on doing a re-poll about a month after every release :) Maybe it can be part of the release activity. Gwen

On Mon, Mar 30, 2015 at 10:26 AM, Jay Kreps j...@confluent.io wrote:
Gwen had a really good blog post on confusing things about Kafka: http://ingest.tips/2015/03/26/what-is-confusing-about-kafka/ A lot of these are in flight now (e.g. consumer) or finished (e.g. delete topic, kind of, and non-sticky producer partitioning). Do we have bugs for the rest? It would be great to make this an ongoing thing we do to assess confusing configs, etc. Not sure of the best way to do this, but maybe just maintaining a special JIRA tag and periodically polling people again? -Jay
[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)
[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387303#comment-14387303 ] Aaron Dixon commented on KAFKA-873:
---
I've recently worked on the zkclient bridge to get a new release out so that I could integrate Curator with Kafka. I have a pull request coming soon to the Kafka project with this integration.

Consider replacing zkclient with curator (with zkclient-bridge)
---
Key: KAFKA-873
URL: https://issues.apache.org/jira/browse/KAFKA-873
Project: Kafka
Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

If zkclient were replaced with curator and curator-x-zkclient-bridge, it would initially be a drop-in replacement: https://github.com/Netflix/curator/wiki/ZKClient-Bridge With the addition of a few more props to ZkConfig and a bit of code, this would open up the possibility of using ACLs in ZooKeeper (which aren't supported directly by zkclient), as well as integrating with Netflix Exhibitor for those of us using that. Looks like KafkaZookeeperClient needs some love anyhow...
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Issue Comment Deleted] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector
You can take a look at this: https://issues.apache.org/jira/browse/KAFKA-1848 Thanks, Mayuresh

On Sun, Mar 29, 2015 at 4:45 PM, Jiangjie Qin (JIRA) j...@apache.org wrote:
[ https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1716:
Comment: was deleted (was: [~dchu] Do you mean that the fetchers were never created? That's a good point, but I still do not totally understand the cause. The first rebalance of ZookeeperConsumerConnector occurs when KafkaStreams are created. That means you need to specify a topic count map and create streams, so the leader finder thread will send a TopicMetadataRequest to brokers to get back the topic metadata for the topic. By default, auto topic creation is enabled on Kafka brokers. That means when a broker sees a TopicMetadataRequest asking for a topic that does not exist yet, it will create it and return the topic metadata, so the consumer fetcher thread will be created for the topic on the ZookeeperConsumerConnector. However, if auto topic creation is turned off, your description looks possible. About the shutdown issue: you are right, that is an issue that has been fixed in KAFKA-1848, but it seems not to be included in 0.8.2. I just changed the fix version from 0.9.0 to 0.8.3 instead.)

hang during shutdown of ZookeeperConsumerConnector
--
Key: KAFKA-1716
URL: https://issues.apache.org/jira/browse/KAFKA-1716
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.8.1.1
Reporter: Sean Fay
Assignee: Neha Narkhede
Attachments: after-shutdown.log, before-shutdown.log, kafka-shutdown-stuck.log

It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to wedge in the case that some consumer fetcher threads receive messages during the shutdown process.
Shutdown thread:
{code}
-- Parking to wait for: java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at sun/misc/Unsafe.park(ZJ)V(Native Method)
at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
at kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
at scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
at scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
at kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
at kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
at kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167)
{code}

ConsumerFetcherThread:
{code}
-- Parking to wait for: java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at sun/misc/Unsafe.park(ZJ)V(Native Method)
at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
at kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
at
{code}
[jira] [Commented] (KAFKA-1809) Refactor brokers to allow listening on multiple ports and IPs
[ https://issues.apache.org/jira/browse/KAFKA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387348#comment-14387348 ] Joel Koshy commented on KAFKA-1809: --- Thanks for the ping - yes I would like to take a look but will only be able to get to it by mid-week. If it is checked-in, I can review after. Refactor brokers to allow listening on multiple ports and IPs -- Key: KAFKA-1809 URL: https://issues.apache.org/jira/browse/KAFKA-1809 Project: Kafka Issue Type: Sub-task Components: security Reporter: Gwen Shapira Assignee: Gwen Shapira Attachments: KAFKA-1809.patch, KAFKA-1809.v1.patch, KAFKA-1809.v2.patch, KAFKA-1809.v3.patch, KAFKA-1809_2014-12-25_11:04:17.patch, KAFKA-1809_2014-12-27_12:03:35.patch, KAFKA-1809_2014-12-27_12:22:56.patch, KAFKA-1809_2014-12-29_12:11:36.patch, KAFKA-1809_2014-12-30_11:25:29.patch, KAFKA-1809_2014-12-30_18:48:46.patch, KAFKA-1809_2015-01-05_14:23:57.patch, KAFKA-1809_2015-01-05_20:25:15.patch, KAFKA-1809_2015-01-05_21:40:14.patch, KAFKA-1809_2015-01-06_11:46:22.patch, KAFKA-1809_2015-01-13_18:16:21.patch, KAFKA-1809_2015-01-28_10:26:22.patch, KAFKA-1809_2015-02-02_11:55:16.patch, KAFKA-1809_2015-02-03_10:45:31.patch, KAFKA-1809_2015-02-03_10:46:42.patch, KAFKA-1809_2015-02-03_10:52:36.patch, KAFKA-1809_2015-03-16_09:02:18.patch, KAFKA-1809_2015-03-16_09:40:49.patch, KAFKA-1809_2015-03-27_15:04:10.patch The goal is to eventually support different security mechanisms on different ports. Currently brokers are defined as host+port pair, and this definition exists throughout the code-base, therefore some refactoring is needed to support multiple ports for a single broker. The detailed design is here: https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jApender
Benoy Antony created KAFKA-2077: --- Summary: Add ability to specify a TopicPicker class for KafkaLog4jApender Key: KAFKA-2077 URL: https://issues.apache.org/jira/browse/KAFKA-2077 Project: Kafka Issue Type: Improvement Components: producer Reporter: Benoy Antony Assignee: Jun Rao KafkaLog4jAppender is the Log4j Appender used to publish messages to Kafka. Currently, a topic name has to be passed as a parameter. In some use cases, it may be necessary to use different topics for the same appender instance, so it may be beneficial to enable KafkaLog4jAppender to accept a TopicClass which will provide a topic for a given message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
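The pluggable per-message topic strategy described in the ticket can be sketched as follows. This is a minimal illustration, not the committed API: the interface name, the method signature, and the level-based routing rule are assumptions based on the ticket's description.

```java
// Hypothetical sketch of a pluggable strategy that picks a Kafka topic per message.
// The names TopicPicker/getTopic follow the ticket's wording and are assumptions.
interface TopicPicker {
    String getTopic(String message);
}

// Example implementation: route messages by log-level prefix, so error lines
// land in a dedicated topic while everything else goes to a default topic.
class LevelTopicPicker implements TopicPicker {
    @Override
    public String getTopic(String message) {
        if (message.startsWith("ERROR")) {
            return "app_errors";
        }
        return "app_logs";
    }
}
```

An appender configured with such a class would call getTopic per logging event instead of using a single fixed topic parameter.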
[jira] [Comment Edited] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException
[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387612#comment-14387612 ] Aravind edited comment on KAFKA-2078 at 3/30/15 11:41 PM: -- Hi, thanks for replying. Below are the details:

Producer config:
kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000

Server.properties:
# Server Basics #
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# Socket Server Settings #
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for host.name if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=hostname routable by clients
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=port accessible by clients
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
# Log Basics #
# A comma separated list of directories under which to store log files
log.dirs=/data/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
# Log Flush Policy #
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=1
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
# Log Retention Policy #
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=1
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=30
# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false
# Zookeeper #
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each
[jira] [Commented] (KAFKA-2068) Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent
[ https://issues.apache.org/jira/browse/KAFKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387360#comment-14387360 ] Joel Koshy commented on KAFKA-2068: --- Yes I think we can merge KAFKA-1841 changes from 0.8.2 into trunk as part of this jira. Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent Key: KAFKA-2068 URL: https://issues.apache.org/jira/browse/KAFKA-2068 Project: Kafka Issue Type: Sub-task Reporter: Gwen Shapira Fix For: 0.8.3 Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender
[ https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387407#comment-14387407 ] Benoy Antony commented on KAFKA-2041: - Modified the _Keyer_ interface to rename the function. Add ability to specify a KeyClass for KafkaLog4jAppender Key: KAFKA-2041 URL: https://issues.apache.org/jira/browse/KAFKA-2041 Project: Kafka Issue Type: Improvement Components: producer Reporter: Benoy Antony Assignee: Jun Rao Attachments: kafka-2041-001.patch, kafka-2041-002.patch KafkaLog4jAppender is the Log4j Appender used to publish messages to Kafka. Since there is no key or explicit partition number, the messages are sent to random partitions. In some cases, it is possible to derive a key from the message itself, so it may be beneficial to enable KafkaLog4jAppender to accept a KeyClass which will provide a key for a given message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
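The key-derivation hook described in KAFKA-2041 can be sketched like this. The _Keyer_ name comes from the comment above, but the method signature and the host-based example implementation are assumptions for illustration, not the patch's actual code.

```java
// Hypothetical sketch of the Keyer strategy from KAFKA-2041: derive a
// partition key from the message itself. Signature is an assumption.
interface Keyer {
    String getKey(String message);
}

// Example: use the first whitespace-delimited token of a log line (e.g. a
// hostname) as the key, so all lines from one host hash to the same partition.
class HostKeyer implements Keyer {
    @Override
    public String getKey(String message) {
        int space = message.indexOf(' ');
        return space < 0 ? message : message.substring(0, space);
    }
}
```

With a key present, the producer's default partitioner hashes it instead of spreading messages over random partitions.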
[jira] [Updated] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jApender
[ https://issues.apache.org/jira/browse/KAFKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated KAFKA-2077: Attachment: kafka-2077-001.patch The attached patch adds a new parameter, _topicClass_, which accepts an implementation of the _TopicPicker_ interface. _TopicPicker.getTopic_ returns a topic based on the current message. Add ability to specify a TopicPicker class for KafkaLog4jApender Key: KAFKA-2077 URL: https://issues.apache.org/jira/browse/KAFKA-2077 Project: Kafka Issue Type: Improvement Components: producer Reporter: Benoy Antony Assignee: Jun Rao Attachments: kafka-2077-001.patch KafkaLog4jAppender is the Log4j Appender used to publish messages to Kafka. Currently, a topic name has to be passed as a parameter. In some use cases, it may be necessary to use different topics for the same appender instance, so it may be beneficial to enable KafkaLog4jAppender to accept a TopicClass which will provide a topic for a given message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException
[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387551#comment-14387551 ] Sriharsha Chintalapani commented on KAFKA-2078: --- [~aravind2015] can you attach your server.properties and producer config? Getting Selector [WARN] Error in I/O with host java.io.EOFException --- Key: KAFKA-2078 URL: https://issues.apache.org/jira/browse/KAFKA-2078 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2.0 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x Intel(R) Xeon(R) CPU X5660 @ 2.80GHz/44GB Reporter: Aravind Assignee: Jun Rao When trying to produce 1000 messages (10 MB each), I get the error below somewhere between the 997th and the 1000th message. There is no pattern, but it is reproducible.

[PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
    at java.lang.Thread.run(Thread.java:724)

I sometimes get this error at the 997th message, sometimes at the 999th. There is no pattern, but it is reproducible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)
[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387573#comment-14387573 ] ASF GitHub Bot commented on KAFKA-873: -- GitHub user atdixon opened a pull request: https://github.com/apache/kafka/pull/53 curator + exhibitor integration My motivation for introducing curator to kafka was to get optional exhibitor support; however, I noticed this is also a solution to ticket KAFKA-873 (https://issues.apache.org/jira/browse/KAFKA-873). Structurally I believe the code is sound; however, some tests are blocking, which I believe is due to races related to in-memory Zookeeper, but I am not entirely sure. I am looking into it and testing outside of in-memory ZK as well, but would love comments/discussion on this PR. I imagine exhibitor support is something that many are interested in, especially those of us in AWS cloud environments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/atdixon/kafka exhibitor-support Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/53.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #53 commit 3526d5664fa87b2d3f0e6e35bd5b060639df4ead Author: Aaron Dixon atdi...@gmail.com Date: 2015-03-30T23:15:16Z curator + exhibitor integration Consider replacing zkclient with curator (with zkclient-bridge) --- Key: KAFKA-873 URL: https://issues.apache.org/jira/browse/KAFKA-873 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Scott Clasen Assignee: Grant Henke If zkclient was replaced with curator and curator-x-zkclient-bridge it would initially be a drop-in replacement https://github.com/Netflix/curator/wiki/ZKClient-Bridge With the addition of a few more props to ZkConfig, and a bit of code, this would open up the possibility of using ACLs in zookeeper (which aren't
supported directly by zkclient), as well as integrating with netflix exhibitor for those of us using that. Looks like KafkaZookeeperClient needs some love anyhow... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] kafka pull request: curator + exhibitor integration
GitHub user atdixon opened a pull request: https://github.com/apache/kafka/pull/53 curator + exhibitor integration My motivation for introducing curator to kafka was to get optional exhibitor support; however, I noticed this is also a solution to ticket KAFKA-873 (https://issues.apache.org/jira/browse/KAFKA-873). Structurally I believe the code is sound; however, some tests are blocking, which I believe is due to races related to in-memory Zookeeper, but I am not entirely sure. I am looking into it and testing outside of in-memory ZK as well, but would love comments/discussion on this PR. I imagine exhibitor support is something that many are interested in, especially those of us in AWS cloud environments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/atdixon/kafka exhibitor-support Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/53.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #53 commit 3526d5664fa87b2d3f0e6e35bd5b060639df4ead Author: Aaron Dixon atdi...@gmail.com Date: 2015-03-30T23:15:16Z curator + exhibitor integration --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation
[ https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Koshy resolved KAFKA-1634. --- Resolution: Fixed Yes, I think that makes more sense. Improve semantics of timestamp in OffsetCommitRequests and update documentation --- Key: KAFKA-1634 URL: https://issues.apache.org/jira/browse/KAFKA-1634 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: Guozhang Wang Priority: Blocker Fix For: 0.8.3 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, KAFKA-1634_2014-12-01_18:03:12.patch, KAFKA-1634_2015-01-14_15:50:15.patch, KAFKA-1634_2015-01-21_16:43:01.patch, KAFKA-1634_2015-01-22_18:47:37.patch, KAFKA-1634_2015-01-23_16:06:07.patch, KAFKA-1634_2015-02-06_11:01:08.patch, KAFKA-1634_2015-03-26_12:16:09.patch, KAFKA-1634_2015-03-26_12:27:18.patch From the mailing list - following up on this -- I think the online API docs for OffsetCommitRequest still incorrectly refer to client-side timestamps: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest Wasn't that removed, and isn't it now always handled server-side? Would one of the devs mind updating the API spec wiki? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)
[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387303#comment-14387303 ] Aaron Dixon edited comment on KAFKA-873 at 3/30/15 8:48 PM: I recently worked with the curator-x-zkclient-bridge team to get a new release (3.0.0) out so that I could integrate curator with Kafka. I have a pull request coming soon to the Kafka project with this integration. was (Author: atdixon): I've recently worked with the zkclient bridge to get a new release out so that I could integrate curator with Kafka. I have a pull request coming soon to the Kafka project with this integration. Consider replacing zkclient with curator (with zkclient-bridge) --- Key: KAFKA-873 URL: https://issues.apache.org/jira/browse/KAFKA-873 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Scott Clasen Assignee: Grant Henke If zkclient was replaced with curator and curator-x-zkclient-bridge it would initially be a drop-in replacement https://github.com/Netflix/curator/wiki/ZKClient-Bridge With the addition of a few more props to ZkConfig, and a bit of code, this would open up the possibility of using ACLs in zookeeper (which aren't supported directly by zkclient), as well as integrating with netflix exhibitor for those of us using that. Looks like KafkaZookeeperClient needs some love anyhow... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException
Aravind created KAFKA-2078: -- Summary: Getting Selector [WARN] Error in I/O with host java.io.EOFException Key: KAFKA-2078 URL: https://issues.apache.org/jira/browse/KAFKA-2078 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2.0 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x Intel(R) Xeon(R) CPU X5660 @ 2.80GHz/44GB Reporter: Aravind Assignee: Jun Rao When trying to produce 1000 messages (10 MB each), I get the error below somewhere between the 997th and the 1000th message. There is no pattern, but it is reproducible.

[PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
    at java.lang.Thread.run(Thread.java:724)

I sometimes get this error at the 997th message, sometimes at the 999th. There is no pattern, but it is reproducible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)
[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387578#comment-14387578 ] Aaron Dixon commented on KAFKA-873: --- As promised, here is a PR integrating curator (+ optional exhibitor support) into kafka... https://github.com/apache/kafka/pull/53. Some of the automated tests are hanging, I believe due to races related to embedded zookeeper usage, but I am not entirely sure. Consider replacing zkclient with curator (with zkclient-bridge) --- Key: KAFKA-873 URL: https://issues.apache.org/jira/browse/KAFKA-873 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Scott Clasen Assignee: Grant Henke If zkclient was replaced with curator and curator-x-zkclient-bridge it would initially be a drop-in replacement https://github.com/Netflix/curator/wiki/ZKClient-Bridge With the addition of a few more props to ZkConfig, and a bit of code, this would open up the possibility of using ACLs in zookeeper (which aren't supported directly by zkclient), as well as integrating with netflix exhibitor for those of us using that. Looks like KafkaZookeeperClient needs some love anyhow... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException
[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387612#comment-14387612 ] Aravind edited comment on KAFKA-2078 at 3/30/15 11:44 PM: -- Hi Harsha, thanks for the reply. Below are the details:

Producer config:
kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000

Server.properties:
broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=host1:2181
zookeeper.connection.timeout.ms=6000
message.max.bytes=104857600
replica.fetch.max.bytes=104857600

Thanks, Aravind

was (Author: aravind2015): Hi, thanks for replying. below are the details: Producer config: kafka.compression.codec=gzip producer.metadata.broker.list=host:9092 producer.logger=KafkaProducer producer.max.request.size=1073741824 producer.batch.num.messages=50 producer.queue.buffering.max.ms=5000 Server.properties: broker.id=0 port=9092 num.network.threads=3 num.io.threads=8 socket.send.buffer.bytes=102400 socket.receive.buffer.bytes=102400 socket.request.max.bytes=104857600 log.dirs=/data/kafka-logs num.partitions=1 num.recovery.threads.per.data.dir=1 log.retention.hours=1 log.segment.bytes=1073741824 log.retention.check.interval.ms=30 log.cleaner.enable=false zookeeper.connect=host1:2181 zookeeper.connection.timeout.ms=6000 message.max.bytes=104857600 replica.fetch.max.bytes=104857600 Thanks, Aravind

Getting Selector [WARN] Error in I/O with host java.io.EOFException --- Key: KAFKA-2078 URL: https://issues.apache.org/jira/browse/KAFKA-2078 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2.0 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x Intel(R) Xeon(R) CPU X5660 @ 2.80GHz/44GB Reporter: Aravind Assignee: Jun Rao When trying to produce 1000 messages (10 MB each), I get the error below somewhere between the 997th and the 1000th message. There is no pattern, but it is reproducible.

[PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
    at java.lang.Thread.run(Thread.java:724)

I sometimes get this error at the 997th message, sometimes at the 999th. There is no pattern, but it is reproducible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
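One thing worth checking in the settings posted above: the producer's maximum request size (producer.max.request.size=1073741824, about 1 GiB) is roughly ten times the broker's socket.request.max.bytes and message.max.bytes (104857600, 100 MiB). A broker rejects a request larger than socket.request.max.bytes by closing the connection, which the client surfaces as an EOFException, so this mismatch is one plausible cause of the intermittent failures. The check below is purely illustrative, not Kafka code; the class and method names are made up for this sketch, and the values are copied from the comment above.

```java
// Illustrative sanity check: a producer's maximum request size should not
// exceed the broker-side request/message limits, or oversized batches are
// refused by the broker dropping the connection (seen as EOFException).
class ConfigCheck {
    static boolean withinBrokerLimits(long producerMaxRequestSize,
                                      long socketRequestMaxBytes,
                                      long messageMaxBytes) {
        return producerMaxRequestSize <= socketRequestMaxBytes
                && producerMaxRequestSize <= messageMaxBytes;
    }

    public static void main(String[] args) {
        long producerMaxRequestSize = 1073741824L; // producer.max.request.size (1 GiB)
        long socketRequestMaxBytes  = 104857600L;  // socket.request.max.bytes (100 MiB)
        long messageMaxBytes        = 104857600L;  // message.max.bytes (100 MiB)
        System.out.println(withinBrokerLimits(
                producerMaxRequestSize, socketRequestMaxBytes, messageMaxBytes));
        // prints false: the producer may build requests the broker will refuse
    }
}
```

Lowering producer.max.request.size to at most the broker limits (or raising the broker limits) would rule this explanation in or out.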
[jira] [Updated] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender
[ https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated KAFKA-2041: Attachment: kafka-2041-002.patch Add ability to specify a KeyClass for KafkaLog4jAppender Key: KAFKA-2041 URL: https://issues.apache.org/jira/browse/KAFKA-2041 Project: Kafka Issue Type: Improvement Components: producer Reporter: Benoy Antony Assignee: Jun Rao Attachments: kafka-2041-001.patch, kafka-2041-002.patch KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. Since there is no key or explicit partition number, the messages are sent to random partitions. In some cases, it is possible to derive a key from the message itself. So it may be beneficial to enable KafkaLog4jAppender to accept KeyClass which will provide a key for a given message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
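The idea in the ticket above - deriving a partition key from the message itself - reads naturally as a single-method hook. The sketch below is illustrative only; the names are assumptions, not the contents of kafka-2041-001.patch or kafka-2041-002.patch:

```java
// Hypothetical sketch of the proposed KeyClass hook -- names are
// illustrative, not the actual patch attached to KAFKA-2041.
interface MessageKeyer {
    /** Derive a partitioning key from the rendered log message. */
    String keyFor(String message);
}

// Example keyer: use a "[thread-name] ..." prefix as the key, so each
// thread's log lines hash to one partition and keep their relative order.
class ThreadPrefixKeyer implements MessageKeyer {
    public String keyFor(String message) {
        int close = message.indexOf(']');
        return (message.startsWith("[") && close > 0) ? message.substring(1, close) : "";
    }
}
```

The appender would instantiate the configured class reflectively and pass the derived key alongside each message, so that keyed messages stop landing in random partitions.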
[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector
[ https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387543#comment-14387543 ] David Chu commented on KAFKA-1716:
--
Yes, from what I can tell, if I just start up my consumer application, which creates 8 instances of {{ZookeeperConsumerConnector}} with 1 {{ConsumerIterator}} created from each {{ZookeeperConsumerConnector}}, like the following:
{code}
this.msgIterator = consumerConnector.createMessageStreamsByFilter(topicWhiteList, /*numThreads*/1).get(0).iterator();
{code}
and no topics exist in the Kafka brokers, then the consumer fetcher threads will not be created. Also, in my setup I do have the {{auto.create.topics.enable}} property set to {{true}}, but it doesn't look like topics are created when I start up the consumers; however, if I publish messages to these topics they do get created. To verify the existence of the topics I'm just checking whether they have entries under the Zookeeper path {{kafka/brokers/topics}}.

From what I can tell, the ConsumerFetcherThread is created from the {{LeaderFinderThread.createFetcherThread}} method, which is called from the {{AbstractFetcherManager.addFetcherForPartitions}} method, which is called from the {{LeaderFinderThread.doWork}} method. In my case it appears that {{AbstractFetcherManager.addFetcherForPartitions}} is never called from {{LeaderFinderThread.doWork}} due to the following:
# When the {{LeaderFinderThread.doWork}} method is called, the {{ConsumerFetcherManager.noLeaderPartitionSet}} field is empty, so the thread ends up calling {{cond.await()}} on line 61.
# The thread then throws an exception (I can't see the actual exception, but my guess is it's an {{InterruptedException}}), so it ends up in the {{catch}} block on line 84.
# At this point the {{LeaderFinderThread.isRunning}} field is {{false}}, so it ends up throwing the exception again on line 86.
# Therefore, the {{addFetcherForPartition}} method on line 95 is never called and the ConsumerFetcherThread is never created.

hang during shutdown of ZookeeperConsumerConnector
--
Key: KAFKA-1716
URL: https://issues.apache.org/jira/browse/KAFKA-1716
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.8.1.1
Reporter: Sean Fay
Assignee: Neha Narkhede
Attachments: after-shutdown.log, before-shutdown.log, kafka-shutdown-stuck.log

It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to wedge in the case that some consumer fetcher threads receive messages during the shutdown process.

Shutdown thread:
{code}
-- Parking to wait for: java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at sun/misc/Unsafe.park(ZJ)V(Native Method)
at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
at kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
at kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
at scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
at scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
at kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
at kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
at
[jira] [Created] (KAFKA-2079) Support exhibitor
Aaron Dixon created KAFKA-2079: -- Summary: Support exhibitor Key: KAFKA-2079 URL: https://issues.apache.org/jira/browse/KAFKA-2079 Project: Kafka Issue Type: Improvement Reporter: Aaron Dixon Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring solution for managing Zookeeper clusters. It supports use cases like discovery, node replacements and auto-scaling of Zk cluster hosts (so you don't have to manage a fixed set of Zk hosts--especially useful in cloud environments.) The easiest way for Kafka to support connection to Zk clusters via exhibitor is to use curator as its client. There is already a separate ticket for this: KAFKA-873 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 32650: Patch for KAFKA-2000
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32650/ --- Review request for kafka. Bugs: KAFKA-2000 https://issues.apache.org/jira/browse/KAFKA-2000 Repository: kafka Description --- KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted. Diffs - core/src/main/scala/kafka/server/OffsetManager.scala 395b1dbe43a5db47151e72a1b588d72f03cef963 Diff: https://reviews.apache.org/r/32650/diff/ Testing --- Thanks, Sriharsha Chintalapani
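As a mental model for what "delete consumer offsets from kafka once the topic is deleted" means for the offset store: every offset committed against that topic becomes stale and can be dropped. The toy in-memory sketch below is illustrative only, not the actual OffsetManager.scala change in this diff:

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of offset cleanup on topic deletion -- illustrative
// only, not the actual kafka.server.OffsetManager logic.
class OffsetCache {
    // "group/topic/partition" -> last committed offset
    private final Map<String, Long> offsets = new HashMap<>();

    void commit(String group, String topic, int partition, long offset) {
        offsets.put(group + "/" + topic + "/" + partition, offset);
    }

    /** Remove every committed offset referencing the deleted topic; returns the count removed. */
    int removeOffsetsForTopic(String topic) {
        int before = offsets.size();
        offsets.keySet().removeIf(key -> key.split("/")[1].equals(topic));
        return before - offsets.size();
    }

    int size() { return offsets.size(); }
}
```

In the real broker the offsets also live in the compacted __consumer_offsets topic, so the cleanup additionally has to make the stale entries eligible for removal there rather than just evicting a cache.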
[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted
[ https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-2000: -- Attachment: KAFKA-2000.patch Delete consumer offsets from kafka once the topic is deleted Key: KAFKA-2000 URL: https://issues.apache.org/jira/browse/KAFKA-2000 Project: Kafka Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Sriharsha Chintalapani Labels: newbie++ Attachments: KAFKA-2000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted
[ https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-2000: -- Reviewer: Joel Koshy Delete consumer offsets from kafka once the topic is deleted Key: KAFKA-2000 URL: https://issues.apache.org/jira/browse/KAFKA-2000 Project: Kafka Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Sriharsha Chintalapani Labels: newbie++ Attachments: KAFKA-2000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted
[ https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriharsha Chintalapani updated KAFKA-2000: -- Status: Patch Available (was: Open) Delete consumer offsets from kafka once the topic is deleted Key: KAFKA-2000 URL: https://issues.apache.org/jira/browse/KAFKA-2000 Project: Kafka Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Sriharsha Chintalapani Labels: newbie++ Attachments: KAFKA-2000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted
[ https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387479#comment-14387479 ] Sriharsha Chintalapani commented on KAFKA-2000: --- Created reviewboard https://reviews.apache.org/r/32650/diff/ against branch origin/trunk Delete consumer offsets from kafka once the topic is deleted Key: KAFKA-2000 URL: https://issues.apache.org/jira/browse/KAFKA-2000 Project: Kafka Issue Type: Bug Reporter: Sriharsha Chintalapani Assignee: Sriharsha Chintalapani Labels: newbie++ Attachments: KAFKA-2000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: Metrics package discussion
If we do plan to use the network code in the client, I think that is a good reason in favor of migration. It will be unnecessary to have metrics from multiple libraries coexist, since our users will have to start monitoring these new metrics anyway. I also agree with Jay that in multi-tenant clusters people care about detailed statistics for their own application over global numbers. Based on the arguments so far, I'm +1 for migrating to KM.

Thanks,
Aditya

From: Jun Rao [j...@confluent.io]
Sent: Sunday, March 29, 2015 9:44 AM
To: dev@kafka.apache.org
Subject: Re: Metrics package discussion

There is another thing to consider. We plan to reuse the client components on the server side over time. For example, as part of the security work, we are looking into replacing the server side network code with the client network code (KAFKA-1928). However, the client network already has metrics based on KM.

Thanks,
Jun

On Sat, Mar 28, 2015 at 1:34 PM, Jay Kreps jay.kr...@gmail.com wrote:

I think Joel's summary is good. I'll add a few more points:

As discussed, memory matters a lot if we want to be able to give percentiles at the client or topic level, in which case we will have thousands of them. If we just do histograms at the global level then it is not a concern. The argument for doing histograms at the client and topic level is that averages are often very misleading, especially for latency information or other asymmetric distributions. Most people who care about this kind of thing would say the same. If you are a user of a multi-tenant cluster then you probably care a lot more about stats for your application or your topic than about the global numbers, so it could be nice to have histograms for these. I don't feel super strongly about this.

The ExponentiallyDecayingSample is internally a ConcurrentSkipListMap<Double, Long>. This seems to have an overhead of about 64 bytes per entry. So a 1000 element sample is 64KB. For global metrics this is fine, but for granular metrics it is not workable.

Two other issues I'm not sure about:

1. Is there a way to get metric descriptions into the Coda Hale JMX output? One of the nicest practical things about the new client metrics is that if you look at them in jconsole, each metric has an associated description that explains what it means. I think this is a nice usability feature--it is really hard to know what to make of the current metrics without this kind of documentation, and keeping separate docs up-to-date is really hard; even if you do it, most people won't find them.

2. I'm not clear whether the sample decay in the histogram is actually the same as for the other stats. It seems like it isn't, but this would make interpretation quite difficult. In other words, if I have N metrics including some Histograms, some Meters, etc., are all these measurements taken over the same time window? I actually think they are not; it looks like there are different sampling methodologies across them. This means that if you have a dashboard plotting these things side by side, the measurement at a given point in time is not actually comparable across multiple stats. Am I confused about this?

-Jay

On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy jjkosh...@gmail.com wrote:

For the samples: it will be at least double that estimate, I think, since the long array contains (eight byte) references to the actual longs, each of which also has some object overhead.

Re: testing: actually, it looks like YM metrics does allow you to drop in your own clock:
https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Clock.java
https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Meter.java#L36

Not sure if it was mentioned in this (or some recent) thread, but a major motivation for the kafka-common metrics (KM) was absorbing API changes and even mbean naming conventions. For e.g., in the early stages of 0.8 we picked up YM metrics 3.x but collided with client apps at LinkedIn which were still on 2.x. We ended up changing our code to use 2.x in the end. Having our own metrics package makes us less vulnerable to these kinds of changes. The multiple version collision problem is obviously less of an issue with the broker, but we are still exposed to possible metric changes in YM metrics.

I'm wondering if we need to weigh too much toward the memory overheads of histograms in making a decision here, simply because I don't think we have found them to be an extreme necessity for per-clientid/per-partition metrics, and they are more critical for aggregate (global) metrics. So it seems the main benefits of switching to KM metrics are:
- Less exposure to YM metrics changes
- More control over the actual implementation. E.g., there is considerable research on implementing approximate-but-good-enough
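Jay's back-of-envelope earlier in this thread (roughly 64 bytes per reservoir entry, 1000-entry samples) is worth making concrete, since it is the crux of the per-client/per-topic histogram question. Joel notes the real per-entry cost is likely at least double, so treat these numbers as lower bounds:

```java
// Back-of-envelope for the histogram memory discussion above.
// Assumes ~64 bytes per reservoir entry and 1000-entry samples, per the
// estimates quoted in the thread (a lower bound on the true cost).
class HistogramMemory {
    static long approxBytes(long numHistograms, long sampleSize, long bytesPerEntry) {
        return numHistograms * sampleSize * bytesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println(approxBytes(1, 1000, 64));    // one global histogram: 64000 bytes (~64 KB)
        System.out.println(approxBytes(1000, 1000, 64)); // 1000 per-client histograms: 64000000 bytes (~64 MB)
    }
}
```

At broker scale, thousands of per-client or per-topic reservoirs quickly reach tens to hundreds of megabytes, which is why the thread treats granular percentiles as the expensive case and global ones as cheap.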
[jira] [Commented] (KAFKA-2079) Support exhibitor
[ https://issues.apache.org/jira/browse/KAFKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387592#comment-14387592 ] Aaron Dixon commented on KAFKA-2079: A proposed solution: https://github.com/apache/kafka/pull/53 Support exhibitor - Key: KAFKA-2079 URL: https://issues.apache.org/jira/browse/KAFKA-2079 Project: Kafka Issue Type: Improvement Reporter: Aaron Dixon Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring solution for managing Zookeeper clusters. It supports use cases like discovery, node replacements and auto-scaling of Zk cluster hosts (so you don't have to manage a fixed set of Zk hosts--especially useful in cloud environments.) The easiest way for Kafka to support connection to Zk clusters via exhibitor is to use curator as its client. There is already a separate ticket for this: KAFKA-873 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jApender
[ https://issues.apache.org/jira/browse/KAFKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387690#comment-14387690 ] Sriharsha Chintalapani commented on KAFKA-2077: --- [~benoyantony] It will be helpful if you can send the patch using kafka-patch-review tool here are the instructions https://cwiki.apache.org/confluence/display/KAFKA/Kafka+patch+review+tool Add ability to specify a TopicPicker class for KafkaLog4jApender Key: KAFKA-2077 URL: https://issues.apache.org/jira/browse/KAFKA-2077 Project: Kafka Issue Type: Improvement Components: producer Reporter: Benoy Antony Assignee: Jun Rao Attachments: kafka-2077-001.patch KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. Currently , a topic name has to be passed as a parameter. In some use cases, it may be required to use a different topics for the same appender instance. So it may be beneficial to enable KafkaLog4jAppender to accept TopicClass which will provide a topic for a given message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
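The proposal above - letting the appender pick a topic per message - also reads naturally as a single-method hook. The sketch below is a hypothetical illustration; the names are assumptions, not the contents of kafka-2077-001.patch:

```java
// Hypothetical sketch of the proposed TopicPicker hook -- names are
// illustrative, not the actual patch attached to KAFKA-2077.
interface TopicPicker {
    /** Choose a destination topic for the rendered log message. */
    String topicFor(String message);
}

// Example picker: route by log level so errors land in a dedicated topic
// while everything else goes to the default one.
class LevelTopicPicker implements TopicPicker {
    public String topicFor(String message) {
        return message.contains("ERROR") ? "app_errors" : "app_logs";
    }
}
```

As with the KeyClass proposal in KAFKA-2041, the appender would load the configured class reflectively and consult it per event instead of using a single fixed topic parameter.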
Re: Review Request 32650: Patch for KAFKA-2000
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32650/#review78310 --- Ship it! Very nice fix :) - Gwen Shapira On March 30, 2015, 9:47 p.m., Sriharsha Chintalapani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32650/ --- (Updated March 30, 2015, 9:47 p.m.) Review request for kafka. Bugs: KAFKA-2000 https://issues.apache.org/jira/browse/KAFKA-2000 Repository: kafka Description --- KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted. Diffs - core/src/main/scala/kafka/server/OffsetManager.scala 395b1dbe43a5db47151e72a1b588d72f03cef963 Diff: https://reviews.apache.org/r/32650/diff/ Testing --- Thanks, Sriharsha Chintalapani
Re: [DISCUSSION] Keep docs updated per jira
Also for the wikis - those should probably correspond to the latest released version, right? So if, for example, we add or modify the protocol on trunk, we can add it to the wiki but mark it with some highlight or similar just to make it clear that it is a change on trunk only.

Thanks,
Joel

On Thu, Mar 26, 2015 at 06:27:25PM -0700, Jun Rao wrote:

Hi, Everyone,

Quite a few jiras these days require documentation changes (e.g., wire protocol, ZK layout, configs, jmx, etc). Historically, we have been updating the documentation just before we do a release. The issue is that some of the changes will be missed since they were done a while back. Another approach is to keep the docs updated as we complete each jira. Currently, our documentation is in the following places.

wire protocol: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
ZK layout: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083

We probably don't need to update configs already ported to ConfigDef, since those docs can be generated automatically. However, for the rest of the doc-related changes, keeping them updated per jira seems a better approach. What do people think?

Thanks,
Jun
Re: Metrics package discussion
If we are committed to migrating the broker side metrics to KM for the next release, we will need to (1) have a story on supporting common reporters (as listed in KAFKA-1930), and (2) see if the current histogram support is good enough for measuring things like request time.

Thanks,
Jun

On Mon, Mar 30, 2015 at 3:03 PM, Aditya Auradkar aaurad...@linkedin.com.invalid wrote:
[quoted text elided - identical to Aditya's message earlier in this thread]
[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows
[ https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387828#comment-14387828 ] Sriharsha Chintalapani commented on KAFKA-1646:
--
[~jkreps] Any guidance on the latest patch?

Improve consumer read performance for Windows
--
Key: KAFKA-1646
URL: https://issues.apache.org/jira/browse/KAFKA-1646
Project: Kafka
Issue Type: Improvement
Components: log
Affects Versions: 0.8.1.1
Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
Labels: newbie, patch
Attachments: Improve consumer read performance for Windows.patch, KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, KAFKA-1646_20150312_200352.patch

This patch is for the Windows platform only. On Windows, if more than one replica is writing to disk, the segment log files will not be laid out contiguously on disk, and consumer read performance drops greatly. This fix allocates more disk space when rolling a new segment, which improves consumer read performance on the NTFS file system. The patch doesn't affect file allocation on other filesystems, since it only adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
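The ticket description says the fix "allocates more disk spaces when rolling a new segment". A minimal sketch of that idea using plain java.io follows; this is an illustration of the technique, assuming a hypothetical helper, not the attached patch:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of segment preallocation -- not the attached patch.
// Reserving the segment's full size up front discourages NTFS from
// interleaving blocks of concurrently written segment files, which is what
// degraded sequential consumer reads in this report.
class Preallocate {
    static void preallocate(File segment, long sizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(segment, "rw")) {
            raf.setLength(sizeBytes); // extend the file to its final size immediately
        }
    }
}
```

The real patch additionally has to truncate the unused tail (the trailing zeros) on clean shutdown or restart, which is what the truncate-off-trailing-zeros attachment addresses.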
[jira] [Reopened] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Koshy reopened KAFKA-1910: --- Reopening for the doc update and clarification on the change in error code. Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.3 Attachments: KAFKA-1910.patch, KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file, better re-factoring it to have multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387652#comment-14387652 ] Joel Koshy commented on KAFKA-1910: --- [~guozhang] this patch modified some of the behavior of OffsetFetchRequest which [~toddpalino] caught while using a python client against the latest broker (on trunk) Previously, we would return NoError with an invalid offset if there was no committed offset. Now it returns the NoOffsetsCommittedCode. Was there a discussion on this? Either way, we should also update the protocol documentation if this change sticks. Refactor KafkaConsumer -- Key: KAFKA-1910 URL: https://issues.apache.org/jira/browse/KAFKA-1910 Project: Kafka Issue Type: Sub-task Components: consumer Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.3 Attachments: KAFKA-1910.patch, KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch KafkaConsumer now contains all the logic on the consumer side, making it a very huge class file, better re-factoring it to have multiple layers on top of KafkaClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Metrics package discussion
(1) It will be interesting to see what others use for monitoring integration, to see what is already covered with existing JMX integrations and what needs special support.

(2) I think the migration story is more important - this is a non-compatible change, right? So we can't do it in the 0.8.3 timeframe, it has to be in 0.9? And we need to figure out how users will migrate - do we just tell everyone "please reconfigure all your monitors from scratch - don't worry, it is worth it"? I know you keep saying we did it before and our users are used to it, but I think there are a lot more users now, and some of them have different compatibility expectations.

We probably need to find:
* A least painful way to migrate - can we keep the names of at least most of the metrics intact?
* A good explanation of what users gain from this painful migration (i.e. more accurate statistics due to a gazillion histograms)

On Mon, Mar 30, 2015 at 6:29 PM, Jun Rao j...@confluent.io wrote:
[quoted text elided - identical to Jun's message earlier in this thread]
Re: Review Request 32650: Patch for KAFKA-2000
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32650/#review78337 --- This might be out of scope for this JIRA, but I think if the deleted topic gets recreated before compaction, the offsets corresponding to the older version of the topic won't be deleted. This usually doesn't matter because auto.offset.reset will be triggered if the new version of the topic is smaller than the old version in terms of offsets. As with delete topic from zookeeper-based offsets, there's the edge case of the consumer skipping messages from the new version of the topic if old version's offsets still fit. This edge case was briefly discussed here: https://issues.apache.org/jira/browse/KAFKA-1787 - Onur Karaman On March 30, 2015, 9:47 p.m., Sriharsha Chintalapani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32650/ --- (Updated March 30, 2015, 9:47 p.m.) Review request for kafka. Bugs: KAFKA-2000 https://issues.apache.org/jira/browse/KAFKA-2000 Repository: kafka Description --- KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted. Diffs - core/src/main/scala/kafka/server/OffsetManager.scala 395b1dbe43a5db47151e72a1b588d72f03cef963 Diff: https://reviews.apache.org/r/32650/diff/ Testing --- Thanks, Sriharsha Chintalapani
[jira] [Commented] (KAFKA-2059) ZookeeperConsumerConnectorTest.testBasic transient failure
[ https://issues.apache.org/jira/browse/KAFKA-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388046#comment-14388046 ] Fangmin Lv commented on KAFKA-2059:
---

Hi Guozhang,

Is this failure repeatable? How often does it fail in your environment -- occasionally or always? I've run this test case for an hour and haven't seen any failure (run based on SHA c5df2a8):

$ while ./gradlew --rerun-tasks -Dtest.single=ZookeeperConsumerConnectorTest core:test; do echo .; done

Thanks,
Fangmin

ZookeeperConsumerConnectorTest.testBasic transient failure
-
Key: KAFKA-2059
URL: https://issues.apache.org/jira/browse/KAFKA-2059
Project: Kafka
Issue Type: Sub-task
Reporter: Guozhang Wang
Labels: newbie

{code}
kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic FAILED
    kafka.common.InconsistentBrokerIdException: Configured brokerId 1 doesn't match stored brokerId 0 in meta.properties
        at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:443)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:116)
        at kafka.utils.TestUtils$.createServer(TestUtils.scala:126)
        at kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:57)
        at kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:57)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:57)
        at kafka.javaapi.consumer.ZookeeperConsumerConnectorTest.setUp(ZookeeperConsumerConnectorTest.scala:41)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)