[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Ahmet AKYOL (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ]

Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:43 AM:
-

Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism though.


was (Author: liqusha):
Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism but 

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.
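
 As an illustration of the matching rule described above, here is a minimal sketch (the class and method names are hypothetical, purely to illustrate the proposal, not part of any patch):
{code}
// Hypothetical sketch of the proposed topic templates; not actual Kafka code.
import java.util.LinkedHashMap;
import java.util.Map;

public class TopicTemplates {

    // pattern -> {replicas, partitions}, mirroring e.g. template.logtopics=*_log,2,20
    private final Map<String, int[]> templates = new LinkedHashMap<>();

    public void addTemplate(String pattern, int replicas, int partitions) {
        templates.put(pattern, new int[] { replicas, partitions });
    }

    // '*' at the start, end, or both means ends-with, starts-with, or contains.
    private static boolean matches(String pattern, String topic) {
        if (pattern.equals("*")) {
            return true;
        } else if (pattern.startsWith("*") && pattern.endsWith("*")) {
            return topic.contains(pattern.substring(1, pattern.length() - 1));
        } else if (pattern.startsWith("*")) {
            return topic.endsWith(pattern.substring(1));
        } else if (pattern.endsWith("*")) {
            return topic.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(topic);
    }

    // Returns {replicas, partitions} for the first matching template, or null.
    public int[] settingsFor(String topic) {
        for (Map.Entry<String, int[]> e : templates.entrySet()) {
            if (matches(e.getKey(), topic)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TopicTemplates t = new TopicTemplates();
        t.addTemplate("*_log", 2, 20);    // template.logtopics=*_log,2,20
        t.addTemplate("*_loader", 1, 5);  // template.loaders=*_loader,1,5
        int[] s = t.settingsFor("clicks_log");
        System.out.println("replicas=" + s[0] + ", partitions=" + s[1]);
    }
}
{code}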



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Ahmet AKYOL (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ]

Ahmet AKYOL commented on KAFKA-1932:


Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism but 

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Ahmet AKYOL (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ]

Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:48 AM:
-

Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism, though it provides many other 
features. Since [~jkreps] is also at Confluent, I guess this type of 
administrative feature will mostly be done by Confluent, which I'm OK with.

So, this issue can be closed.


was (Author: liqusha):
Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism though.

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Ahmet AKYOL (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ]

Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 8:41 AM:
-

Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/schema-registry/docs/api.html
http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism, but it is the administrative client 
Kafka is missing, along with other features. And I guess, since [~jkreps] is 
also at Confluent, this type of administrative feature will mostly be done by 
Confluent, which I'm OK with.

So, this issue can be closed.


was (Author: liqusha):
Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism, but it is the administrative client 
Kafka is missing, along with other features. And I guess, since [~jkreps] is 
also at Confluent, this type of administrative feature will mostly be done by 
Confluent, which I'm OK with.

So, this issue can be closed.

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Ahmet AKYOL (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386327#comment-14386327 ]

Ahmet AKYOL edited comment on KAFKA-1932 at 3/30/15 7:52 AM:
-

Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism, but it is the administrative client 
Kafka is missing, along with other features. And I guess, since [~jkreps] is 
also at Confluent, this type of administrative feature will mostly be done by 
Confluent, which I'm OK with.

So, this issue can be closed.


was (Author: liqusha):
Third-party solution, the REST proxy from Confluent:

http://confluent.io/docs/current/kafka-rest/docs/intro.html

It doesn't provide a template mechanism, though it provides many other 
features. Since [~jkreps] is also at Confluent, I guess this type of 
administrative feature will mostly be done by Confluent, which I'm OK with.

So, this issue can be closed.

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: zookeeper usage in kafka offset commit / fetch requests

2015-03-30 Thread svante karlsson
Just to close this,

I found out that the broker handles the request differently depending on
whether you specify API version 0 or 1.

If you use v1, the broker commits the offsets to the internal offsets topic;
with v0, it writes them to ZooKeeper.
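
For illustration, the broker-side behavior amounts to a version dispatch like
the following minimal sketch (class and method names are hypothetical, for
illustration only, not the broker's actual code):

// Hypothetical sketch of version-based offset-commit handling.
public class OffsetCommitDispatch {

    static void commit(short apiVersion, String group, String topic,
                       int partition, long offset) {
        if (apiVersion == 0) {
            // v0: the offset is written under the consumer's ZooKeeper path
            System.out.printf("ZK write: /consumers/%s/offsets/%s/%d -> %d%n",
                    group, topic, partition, offset);
        } else {
            // v1+: the offset is appended to the internal __consumer_offsets topic
            System.out.printf("append to __consumer_offsets: (%s,%s,%d) -> %d%n",
                    group, topic, partition, offset);
        }
    }

    public static void main(String[] args) {
        commit((short) 0, "group1", "topic1", 0, 42L); // goes to ZooKeeper
        commit((short) 1, "group1", "topic1", 0, 42L); // goes to the offsets topic
    }
}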

/svante


Re: [VOTE] KIP-7 Security - IP Filtering

2015-03-30 Thread Brock Noland
I think it is acceptable for KIP-11 to allow additional network-based
authorization. However, I feel that most users will want something
simpler, and thus I don't feel KIP-11 should require network-based
authorization by default.

For example postgres allows something similar to what is being
discussed here. In addition to user based authz of database objects,
administrators can use the pg_hba.conf configuration file to perform
network based authz for users and groups.

Despite this, few postgres installs with more than one user use the
pg_hba.conf configuration for user-based network authz. They do,
however, use standard user-based authz for database objects and
pg_hba.conf for basic database-wide network authz.
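
For concreteness, the kind of pg_hba.conf entry being described looks like
this (values are illustrative):

# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   10.0.0.0/8   md5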

tl;dr - I think we should implement both KIP-7 and KIP-11.

On Fri, Mar 20, 2015 at 7:23 PM, Gwen Shapira gshap...@cloudera.com wrote:
 I'd like to add that HDFS has had ACLs + RBAC + global IP white/black list
 for years now.

 We did not notice any customers confusing the features. I've seen customers
 use each feature for different purposes.

 Actually, the only system I am aware of that integrates IP access controls
 together with RBAC and ACLs is MySQL, where it leads to major confusion
 among new admins (I can log in as gwenshap from a remote machine but not
 from localhost, or vice versa...).

 Gwen

 On Fri, Mar 20, 2015 at 4:04 PM, Jeff Holoman jholo...@cloudera.com wrote:

 Parth,

 I think it's important to understand the timing of both the initial JIRA
 and the KIP, it helps put my comments in proper context.

 The initial JIRA for this was created back in December, so the timeline for
 1688/KIP-11 was pretty unclear. KIP-7 came out when we started doing KIPs,
 back in Jan.  The initial comments I think pretty clearly acknowledged the
 work on 1688.

 My comment re: integration was that perhaps the check logic could be reused
 (i.e., the CIDR range checks). That's what I meant when I mentioned the
 intent. At that point there was no KIP-11 and no patch, so it was just a
 hunch.

 In terms of it being a different use case, I do think there are some
 aspects which would be beneficial, even given the work on 1688.

 1) Simplicity
 2) Less config - 2 params
 3) Allows for both whitelisting and blacklisting, giving a bit more
 flexibility.

 Another key driver, at least at the time, was timing. As this discussion
 has been extended that becomes less of a motivator, which I understand.

 I don't necessarily think that giving users options to limit access will
 confuse them, given the new interface for security will likely take a bit
 of understanding to implement correctly. In fact they might be quite
 complimentary.

 Let's take the 2nd example in KIP-11:

 user2.hosts=*
 user2.operation=read

 So if I understand correctly, this would allow read access to a particular
 topic from any host for a given user. It could be helpful to block a range
 of IPs (like production or QA) but otherwise not have to specify every
 potential host that needs to read from that topic.

 As you mentioned, adding additional constructs makes the ACLs a bit more
 complex; maybe this provides some relief there.
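
 As a rough illustration of the kind of check logic involved (a sketch only;
 names are hypothetical, and this is neither KIP-7's nor KIP-11's actual code):

 // Hypothetical sketch of a wildcard/CIDR host check (IPv4 only).
 import java.net.InetAddress;
 import java.net.UnknownHostException;

 public class HostMatcher {

     // True if addr falls inside cidr, e.g. "10.1.2.3" in "10.1.0.0/16".
     public static boolean inCidr(String addr, String cidr) throws UnknownHostException {
         String[] parts = cidr.split("/");
         int prefixLen = Integer.parseInt(parts[1]);
         int ip = toInt(InetAddress.getByName(addr).getAddress());
         int net = toInt(InetAddress.getByName(parts[0]).getAddress());
         int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
         return (ip & mask) == (net & mask);
     }

     private static int toInt(byte[] octets) {
         int v = 0;
         for (byte b : octets) v = (v << 8) | (b & 0xff);
         return v;
     }

     // "*" allows any host, as in user2.hosts=*; otherwise require a CIDR match.
     public static boolean allowed(String hostsRule, String clientAddr)
             throws UnknownHostException {
         return "*".equals(hostsRule) || inCidr(clientAddr, hostsRule);
     }

     public static void main(String[] args) throws UnknownHostException {
         System.out.println(allowed("*", "192.168.1.5"));           // true
         System.out.println(allowed("10.1.0.0/16", "10.1.2.3"));    // true
         System.out.println(allowed("10.1.0.0/16", "192.168.1.5")); // false
     }
 }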

 Jun, if folks feel like there's too much overlap, and this wouldn't be
 useful, then that's fair. That's what the votes are for, right? :)


 Thanks

 Jeff


 On Fri, Mar 20, 2015 at 6:01 PM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.com wrote:

  I am guessing in your last reply you meant KIP-11. And yes, I think KIP-11
  subsumed KIP-7, so if we can finish KIP-11 we should not need KIP-7, but I
  will let Jeff confirm that.
 
  Thanks
  Parth
 
 
  On 3/20/15, 2:32 PM, Jun Rao j...@confluent.io wrote:
 
  Right, if this KIP is subsumed by KIP-7, perhaps we just need to wait
  until
  KIP-7 is done? If we add the small change now, we will have to worry
 about
  migrating existing users and deprecating some configs when KIP-7 is
 done.
  
  Thanks,
  
  Jun
  
  On Fri, Mar 20, 2015 at 10:36 AM, Parth Brahmbhatt 
  pbrahmbh...@hortonworks.com wrote:
  
   I am not entirely sure what you mean by integrating KIP-7 work with
   KAFKA-1688. Wouldn't the work done as part of KIP-7 become obsolete once
   KAFKA-1688 is done? Multiple ways of controlling authorization just seem
   like extra configuration that will confuse admins/users.
   
   If timing is the only issue, don't you think it's better to focus our
   energy on getting 1688 done faster, which seems to be the longer-term
   goal anyways?
  
   Thanks
   Parth
  
   On 3/20/15, 10:28 AM, Jeff Holoman jholo...@cloudera.com wrote:
  
   Hey Jun,
   
   The intent was for the same functionality to be utilized when 1688 is
   done,
   as mentioned in the KIP:
   
    The broader security initiative (KAFKA-1682) will add more
   robust
   controls for these types of environments, and this proposal could be
   integrated with that work at the appropriate time. This is also the
   specific request of a large financial services company.
   
   I don't think 

[jira] [Commented] (KAFKA-2046) Delete topic still doesn't work

2015-03-30 Thread Jay Kreps (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386942#comment-14386942 ]

Jay Kreps commented on KAFKA-2046:
--

Hmm, so we have a configuration which, if you set it to anything other than the 
default, causes a deadlock? Is there a reason we left this configurable once we 
realized infinite was the only safe value? Could we remove it now? :-)



 Delete topic still doesn't work
 ---

 Key: KAFKA-2046
 URL: https://issues.apache.org/jira/browse/KAFKA-2046
 Project: Kafka
  Issue Type: Bug
Reporter: Clark Haskins
Assignee: Onur Karaman

 I just attempted to delete a 128-partition topic with all inbound producers 
 stopped.
 The result was as follows:
 The /admin/delete_topics znode was empty
 the topic under /brokers/topics was removed
 The Kafka topics command showed that the topic was removed
 However, the data on disk on each broker was not deleted and the topic has 
 not yet been re-created by starting up the inbound mirror maker.
 Let me know what additional information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1932) kafka topic (creation) templates

2015-03-30 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin resolved KAFKA-1932.
-
Resolution: Duplicate

[~liqusha] This requirement also looks to be covered by KIP-4. Closing for now. 
Please reopen if I missed something.

 kafka topic (creation) templates
 

 Key: KAFKA-1932
 URL: https://issues.apache.org/jira/browse/KAFKA-1932
 Project: Kafka
  Issue Type: Wish
Reporter: Ahmet AKYOL

 AFAIK, the only way to create a Kafka topic (without using the default 
 settings) is by using the provided bash script.
 Even though client support could be nice, I would prefer to see a template 
 mechanism similar to [Elasticsearch Index 
 Templates|http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html].
 What I have in mind is very simple: adding something like this to 
 server.properties:
 template.name=pattern,numOfReplica,NumberOfPartition
 and the pattern can only contain *, meaning starts with, ends with, or contains.
 example:
 template.logtopics=*_log,2,20
 template.loaders=*_loader,1,5
 so, when some producer sends a message for the first time to a topic that 
 ends with _log, Kafka can use the above settings.
 thanks in advance
 update:
 On second thought, maybe a command like kafka-create-template.sh could be 
 more practical for cluster deployments, rather than adding to 
 server.properties; Kafka would internally register this in ZK.
 About use cases, I can understand an opposing argument that creating many 
 topics is not a good design decision. But my point is not to create so 
 many topics, just to automate an important process by giving the 
 responsibility to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2076) Add an API to new consumer to allow user get high watermark of partitions.

2015-03-30 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-2076:
---

 Summary: Add an API to new consumer to allow user get high 
watermark of partitions.
 Key: KAFKA-2076
 URL: https://issues.apache.org/jira/browse/KAFKA-2076
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin


We have a use case where a user wants to know how far it is behind on a 
particular partition at startup. Currently each fetch response carries the 
high watermark for each partition, but we only keep a global max-lag metric. 
It would be better to keep a record of the high watermark per partition and 
update it on each fetch response. We can add a new API to let the user query 
the high watermark.
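
A possible shape for such an API, as a sketch only (the names are illustrative, not a committed design for the new consumer):
{code}
// Hypothetical sketch of the proposed API shape; names are illustrative only.
public interface PartitionHighWatermarks {

    // Last high watermark seen in a fetch response for this partition,
    // or -1 if no fetch response has been received yet.
    long highWatermark(String topic, int partition);

    // Convenience: how far a consumer position lags the high watermark.
    default long lag(String topic, int partition, long position) {
        long hw = highWatermark(topic, partition);
        return hw < 0 ? -1 : Math.max(0, hw - position);
    }
}
{code}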



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Can I be added as a contributor?

2015-03-30 Thread Brock Noland
Hi,

Could I be added as a contributor and to confluence? I am brocknoland
on JIRA and brockn at gmail on confluence.

Cheers!
Brock


Re: [DISCUSSION] Keep docs updated per jira

2015-03-30 Thread Jay Kreps
Yeah that sounds good to me. That would involve someone converting us over
to the templating system we chose (something like Jekyll would be easy
since it allows inline HTML, so we'd just have to replace the apache
includes). It's also quite possible that Apache has solved the whole
SVN/website dependency. I get the feeling git is no longer as taboo as it
once was in Apache...

-Jay

On Sat, Mar 28, 2015 at 9:46 PM, Gwen Shapira gshap...@cloudera.com wrote:

 On Thu, Mar 26, 2015 at 9:46 PM, Jay Kreps jay.kr...@gmail.com wrote:
  The reason the docs are in svn is that when we were setting up the site
  apache required that to publish doc changes. Two possible fixes:
  1. Follow up with infra to see if they have git integration working yet
  2. Move to a model where doc source is kept in the main git and we use
  Jekyll or something like that to generate the resulting html (i.e. with
  things like headers) and then check that in to svn to publish it.
 
  The second item would have the advantage of not needing to configure
  apache includes to see the docs, but would likely trade it for Jekyll
  setup stuff. Jekyll might actually fix a lot of the repetitive stuff in
  the docs today (e.g. generating section numbers, adding p tags, etc).

 What we do at other projects that I'm familiar with (Sqoop, Flume) is
 that we manage the docs source (i.e. asciidoc or similar) in git.
 When it's time to release a version (i.e. after a successful vote on an
 RC), we build the docs, manually copy them to the SVN repo and push (or
 whatever the SVN equivalent is...). It's a bit of a manual pain, but it's
 only a few times a year. This is also a good opportunity to upload
 updated javadocs, docs generated from ConfigDef and KafkaMetrics, etc.

 Gwen


 
  -Jay
 
  On Thu, Mar 26, 2015 at 8:22 PM, Jiangjie Qin j...@linkedin.com.invalid
 
  wrote:
 
 
 
  On 3/26/15, 7:00 PM, Neha Narkhede n...@confluent.io wrote:
 
  
   Much much easier to do this if the docs are in git and can be
 reviewed
  and
   committed / reverted with the code (transactions makes
 synchronization
   easier...).
  +1 on this, too!
  
  
  Huge +1.
  
  On Thu, Mar 26, 2015 at 6:54 PM, Joel Koshy jjkosh...@gmail.com
 wrote:
  
   +1
  
   It is indeed too easy to forget and realize only much later that a
   jira needed a doc update. So getting into the habit of asking did
 you
   update the docs as part of review will definitely help.
  
   On Thu, Mar 26, 2015 at 06:36:43PM -0700, Gwen Shapira wrote:
I strongly support the goal of keeping docs and code in sync.
   
Much much easier to do this if the docs are in git and can be
 reviewed
   and
committed / reverted with the code (transactions makes
 synchronization
easier...).
   
This will also allow us to:
1. Include the docs in the bits we release
2. On release, update the website with the docs from the specific
  branch
that was just released
3. Hook our build to ReadTheDocs and update the trunk docs with
  every
commit
   
   
Tons of Apache projects do this already and having reviews enforce
 the
   did
you update the docs before committing is the best way to guarantee
   updated
docs.
   
Gwen
   
On Thu, Mar 26, 2015 at 6:27 PM, Jun Rao j...@confluent.io wrote:
   
 Hi, Everyone,

 Quite a few jiras these days require documentation changes (e.g.,
  wire
 protocol, ZK layout, configs, jmx, etc). Historically, we have
 been
 updating the documentation just before we do a release. The
 issue is
   that
 some of the changes will be missed since they were done a while
  back.
 Another way to do that is to keep the docs updated as we complete
  each
 jira. Currently, our documentations are in the following places.

 wire protocol:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
 ZK layout:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
 configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083

 We probably don't need to update configs already ported to
 ConfigDef
   since
 they can be generated automatically. However, for the rest of the
 doc-related changes, keeping them updated per jira seems a better
 approach.
 What do people think?

 Thanks,

 Jun

  
  
  
  
  --
  Thanks,
  Neha
 
 



Kafka confusion

2015-03-30 Thread Jay Kreps
Gwen had a really good blog post on confusing things about Kafka:
http://ingest.tips/2015/03/26/what-is-confusing-about-kafka/

A lot of these are in-flight now (e.g. consumer) or finished (e.g. delete
topic, kind of, and non-sticky producer partitioning). Do we have bugs for
the rest?

It would be great to make this an ongoing thing we do to assess confusing
configs, etc. Not sure the best way to do this, but maybe just maintaining
a special JIRA tag and periodically polling people again?

-Jay


Re: Kafka confusion

2015-03-30 Thread Gwen Shapira
I was planning on doing a re-poll about a month after every release :)

Maybe it can be part of the release activity.

Gwen

On Mon, Mar 30, 2015 at 10:26 AM, Jay Kreps j...@confluent.io wrote:
 Gwen had a really good blog post on confusing things about Kafka:
 http://ingest.tips/2015/03/26/what-is-confusing-about-kafka/

 A lot of these are in-flight now (e.g. consumer) or finished (e.g. delete
 topic, kind of, and non-sticky producer partitioning). Do we have bugs for
 the rest?

 It would be great to make this an ongoing thing we do to assess confusing
 configs, etc. Not sure the best way to do this, but maybe just maintaining
 a special JIRA tag and periodically polling people again?

 -Jay


[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-03-30 Thread Aaron Dixon (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387303#comment-14387303 ]

Aaron Dixon commented on KAFKA-873:
---

I've recently worked with the zkclient-bridge team to get a new release out so 
that I could integrate curator with Kafka. I have a pull request coming soon to 
the Kafka project with this integration.

 Consider replacing zkclient with curator (with zkclient-bridge)
 ---

 Key: KAFKA-873
 URL: https://issues.apache.org/jira/browse/KAFKA-873
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

 If zkclient was replaced with curator and curator-x-zkclient-bridge, it would 
 initially be a drop-in replacement:
 https://github.com/Netflix/curator/wiki/ZKClient-Bridge
 With the addition of a few more props to ZkConfig, and a bit of code, this 
 would open up the possibility of using ACLs in zookeeper (which aren't 
 supported directly by zkclient), as well as integrating with Netflix 
 Exhibitor for those of us using that.
 Looks like KafkaZookeeperClient needs some love anyhow...
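
 Per the ZKClient-Bridge wiki linked above, a CuratorFramework can back the existing ZkClient. A minimal sketch of the wiring (the bridge class and package are taken from that wiki and should be treated as assumptions, not verified against Kafka's build):
{code}
// Sketch of wiring curator behind zkclient via curator-x-zkclient-bridge,
// per https://github.com/Netflix/curator/wiki/ZKClient-Bridge (assumed API).
import org.I0Itec.zkclient.ZkClient;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.x.zkclientbridge.CuratorZKClientBridge;

public class BridgedZkClient {
    public static ZkClient create(String connectString) {
        CuratorFramework curator = CuratorFrameworkFactory.newClient(
                connectString, new ExponentialBackoffRetry(1000, 3));
        curator.start();
        // Existing ZkClient callers (e.g. Kafka) keep their API; curator does the work.
        return new ZkClient(new CuratorZKClientBridge(curator), 30000);
    }
}
{code}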



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Issue Comment Deleted] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-30 Thread Mayuresh Gharat
You can take a look at this :

https://issues.apache.org/jira/browse/KAFKA-1848

Thanks,

Mayuresh

On Sun, Mar 29, 2015 at 4:45 PM, Jiangjie Qin (JIRA) j...@apache.org
wrote:


  [
 https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

 Jiangjie Qin updated KAFKA-1716:
 
 Comment: was deleted

 (was: [~dchu] Do you mean that the fetchers have never been created?
 That's a good point, but I still do not totally understand the cause.
 The first rebalance of ZookeeperConsumerConnector occurs when KafkaStreams
 are created. That means you need to specify a topic count map and create
 streams. The leader finder thread will then send a TopicMetadataRequest to
 brokers to get back the topic metadata for the topic. By default, auto
 topic creation is enabled on Kafka brokers. That means when a broker sees a
 TopicMetadataRequest asking for a topic that does not exist yet, it will
 create it and return the topic metadata. So the consumer fetcher thread
 will be created for the topic on the ZookeeperConsumerConnector. However,
 if auto topic creation is turned off, your description looks possible.
 About the shutdown issue: you are right, that is an issue that was fixed
 in KAFKA-1848 but seems not to be included in 0.8.2. I just changed the
 fix version from 0.9.0 to 0.8.3 instead.)

  hang during shutdown of ZookeeperConsumerConnector
  --
 
  Key: KAFKA-1716
  URL: https://issues.apache.org/jira/browse/KAFKA-1716
  Project: Kafka
   Issue Type: Bug
   Components: consumer
 Affects Versions: 0.8.1.1
 Reporter: Sean Fay
 Assignee: Neha Narkhede
  Attachments: after-shutdown.log, before-shutdown.log,
 kafka-shutdown-stuck.log
 
 
  It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}}
 to wedge in the case that some consumer fetcher threads receive messages
 during the shutdown process.
  Shutdown thread:
  {code}-- Parking to wait for:
 java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
  at jrockit/vm/Locks.park0(J)V(Native Method)
  at jrockit/vm/Locks.park(Locks.java:2230)
  at sun/misc/Unsafe.park(ZJ)V(Native Method)
  at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
  at
 java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
  at
 java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
  at
 java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
  at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
  at
 kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
  at
 kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
  at
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
  at
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
  at
 scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at
 scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
  at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
  at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
  at
 scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at
 kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
  ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
  at
 kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
  at
 kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
  at
 kafka/consumer/ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:167){code}
  ConsumerFetcherThread:
  {code}-- Parking to wait for:
 java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x2aaaebcc7568
  at jrockit/vm/Locks.park0(J)V(Native Method)
  at jrockit/vm/Locks.park(Locks.java:2230)
  at sun/misc/Unsafe.park(ZJ)V(Native Method)
  at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
  at
 java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
  at
 java/util/concurrent/LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
  at
 kafka/consumer/PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
  at
 

[jira] [Commented] (KAFKA-1809) Refactor brokers to allow listening on multiple ports and IPs

2015-03-30 Thread Joel Koshy (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387348#comment-14387348 ]

Joel Koshy commented on KAFKA-1809:
---

Thanks for the ping - yes, I would like to take a look but will only be able to 
get to it by mid-week. If it is checked in, I can review after.

 Refactor brokers to allow listening on multiple ports and IPs 
 --

 Key: KAFKA-1809
 URL: https://issues.apache.org/jira/browse/KAFKA-1809
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Gwen Shapira
Assignee: Gwen Shapira
 Attachments: KAFKA-1809.patch, KAFKA-1809.v1.patch, 
 KAFKA-1809.v2.patch, KAFKA-1809.v3.patch, 
 KAFKA-1809_2014-12-25_11:04:17.patch, KAFKA-1809_2014-12-27_12:03:35.patch, 
 KAFKA-1809_2014-12-27_12:22:56.patch, KAFKA-1809_2014-12-29_12:11:36.patch, 
 KAFKA-1809_2014-12-30_11:25:29.patch, KAFKA-1809_2014-12-30_18:48:46.patch, 
 KAFKA-1809_2015-01-05_14:23:57.patch, KAFKA-1809_2015-01-05_20:25:15.patch, 
 KAFKA-1809_2015-01-05_21:40:14.patch, KAFKA-1809_2015-01-06_11:46:22.patch, 
 KAFKA-1809_2015-01-13_18:16:21.patch, KAFKA-1809_2015-01-28_10:26:22.patch, 
 KAFKA-1809_2015-02-02_11:55:16.patch, KAFKA-1809_2015-02-03_10:45:31.patch, 
 KAFKA-1809_2015-02-03_10:46:42.patch, KAFKA-1809_2015-02-03_10:52:36.patch, 
 KAFKA-1809_2015-03-16_09:02:18.patch, KAFKA-1809_2015-03-16_09:40:49.patch, 
 KAFKA-1809_2015-03-27_15:04:10.patch


 The goal is to eventually support different security mechanisms on different 
 ports. 
 Currently brokers are defined as host+port pair, and this definition exists 
 throughout the code-base, therefore some refactoring is needed to support 
 multiple ports for a single broker.
 The detailed design is here: 
 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jAppender

2015-03-30 Thread Benoy Antony (JIRA)
Benoy Antony created KAFKA-2077:
---

 Summary: Add ability to specify a TopicPicker class for 
KafkaLog4jAppender
 Key: KAFKA-2077
 URL: https://issues.apache.org/jira/browse/KAFKA-2077
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao


KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 

Currently, a topic name has to be passed as a parameter. In some use cases, it 
may be required to use different topics for the same appender instance. 
So it may be beneficial to enable KafkaLog4jAppender to accept a TopicClass 
which will provide a topic for a given message.
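
A later update on this ticket names the interface _TopicPicker_ with a _getTopic_ method; a minimal sketch of the idea (the argument type and the example picker are assumptions):
{code}
// Minimal sketch of the TopicPicker idea; the interface and method names
// follow the patch description on this ticket, the argument type is assumed.
public interface TopicPicker {
    // Return the destination topic for the given log message.
    String getTopic(String message);
}

// Example: route messages by severity prefix (illustrative only).
class SeverityTopicPicker implements TopicPicker {
    @Override
    public String getTopic(String message) {
        return message.startsWith("ERROR") ? "app_errors" : "app_logs";
    }
}
{code}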



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Aravind (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387612#comment-14387612 ]

Aravind edited comment on KAFKA-2078 at 3/30/15 11:41 PM:
--

Hi, thanks for replying. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

# Server Basics #

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# Socket Server Settings 
#

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all 
interfaces
#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set, it 
uses the
# value for host.name if configured.  Otherwise, it will use the value 
returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=3
 
# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection 
against OOM)
socket.request.max.bytes=104857600


# Log Basics #

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at 
startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs 
located in RAID array.
num.recovery.threads.per.data.dir=1

# Log Flush Policy #

# Messages are immediately written to the filesystem but by default we only 
fsync() to sync
# the OS cache lazily. The following configurations control the flush of data 
to disk. 
# There are a few important trade-offs here:
#1. Durability: Unflushed data may be lost if you are not using replication.
#2. Latency: Very large flush intervals may lead to latency spikes when the 
flush does occur as there will be a lot of data to flush.
#3. Throughput: The flush is generally the most expensive operation, and a 
small flush interval may lead to excessive seeks. 
# The settings below allow one to configure the flush policy to flush data 
after a period of time or
# every N messages (or both). This can be done globally and overridden on a 
per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=1

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

# Log Retention Policy #

# The following configurations control the disposal of log segments. The policy 
can
# be set to delete segments after a period of time, or after a given size has 
accumulated.
# A segment will be deleted whenever *either* of these criteria are met. 
Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=1

# A size-based retention policy for logs. Segments are pruned from the log as 
long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log 
segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted 
according 
# to the retention policies
log.retention.check.interval.ms=30

# By default the log cleaner is disabled and the log retention policy will 
default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual 
logs can then be marked for log compaction.
log.cleaner.enable=false

# Zookeeper #

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each 

[jira] [Commented] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Aravind (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387612#comment-14387612 ]

Aravind commented on KAFKA-2078:


Hi, thanks for replying. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
# Max size of file that can be sent by a producer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

# Server Basics #

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# Socket Server Settings 
#

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all 
interfaces
#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set, it 
uses the
# value for host.name if configured.  Otherwise, it will use the value 
returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=3
 
# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection 
against OOM)
socket.request.max.bytes=104857600


# Log Basics #

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at 
startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs 
located in RAID array.
num.recovery.threads.per.data.dir=1

# Log Flush Policy #

# Messages are immediately written to the filesystem but by default we only 
fsync() to sync
# the OS cache lazily. The following configurations control the flush of data 
to disk. 
# There are a few important trade-offs here:
#1. Durability: Unflushed data may be lost if you are not using replication.
#2. Latency: Very large flush intervals may lead to latency spikes when the 
flush does occur as there will be a lot of data to flush.
#3. Throughput: The flush is generally the most expensive operation, and a 
small flush interval may lead to excessive seeks. 
# The settings below allow one to configure the flush policy to flush data 
after a period of time or
# every N messages (or both). This can be done globally and overridden on a 
per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=1

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

# Log Retention Policy #

# The following configurations control the disposal of log segments. The policy 
can
# be set to delete segments after a period of time, or after a given size has 
accumulated.
# A segment will be deleted whenever *either* of these criteria are met. 
Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=1

# A size-based retention policy for logs. Segments are pruned from the log as 
long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log 
segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted 
according 
# to the retention policies
log.retention.check.interval.ms=30

# By default the log cleaner is disabled and the log retention policy will 
default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual 
logs can then be marked for log compaction.
log.cleaner.enable=false

# Zookeeper #

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each 

[jira] [Commented] (KAFKA-2068) Replace OffsetCommit Request/Response with org.apache.kafka.common.requests equivalent

2015-03-30 Thread Joel Koshy (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387360#comment-14387360 ]

Joel Koshy commented on KAFKA-2068:
---

Yes I think we can merge KAFKA-1841 changes from 0.8.2 into trunk as part of 
this jira.

 Replace OffsetCommit Request/Response with  org.apache.kafka.common.requests  
 equivalent
 

 Key: KAFKA-2068
 URL: https://issues.apache.org/jira/browse/KAFKA-2068
 Project: Kafka
  Issue Type: Sub-task
Reporter: Gwen Shapira
 Fix For: 0.8.3


 Replace OffsetCommit Request/Response with  org.apache.kafka.common.requests  
 equivalent



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender

2015-03-30 Thread Benoy Antony (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387407#comment-14387407 ]

Benoy Antony commented on KAFKA-2041:
-

Modified the _Keyer_ interface to change the function name.

 Add ability to specify a KeyClass for KafkaLog4jAppender
 

 Key: KAFKA-2041
 URL: https://issues.apache.org/jira/browse/KAFKA-2041
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao
 Attachments: kafka-2041-001.patch, kafka-2041-002.patch


 KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 
 Since there is no key or explicit partition number, the messages are sent to 
 random partitions. 
 In some cases, it is possible to derive a key from the message itself. 
 So it may be beneficial to enable KafkaLog4jAppender to accept a KeyClass 
 which will provide a key for a given message.
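
 The _Keyer_ interface mentioned in the comment above might look roughly like this (a sketch only; the method name and signature are assumptions, since the renamed method isn't shown here):
{code}
// Minimal sketch of the Keyer idea from this ticket; the method name and
// signature are assumptions, since the renamed method isn't shown here.
public interface Keyer {
    // Derive a partitioning key from the log message itself.
    String keyFor(String message);
}

// Example: key on the first whitespace-delimited token (illustrative only).
class FirstTokenKeyer implements Keyer {
    @Override
    public String keyFor(String message) {
        int sp = message.indexOf(' ');
        return sp < 0 ? message : message.substring(0, sp);
    }
}
{code}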



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jAppender

2015-03-30 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated KAFKA-2077:

Attachment: kafka-2077-001.patch

Attached patch adds a new parameter, _topicClass_. This will accept an 
implementation of the _TopicPicker_ interface.

The _TopicPicker.getTopic_ method will return a topic based on the current message.

 Add ability to specify a TopicPicker class for KafkaLog4jAppender
 

 Key: KAFKA-2077
 URL: https://issues.apache.org/jira/browse/KAFKA-2077
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao
 Attachments: kafka-2077-001.patch


 KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 

 Currently, a topic name has to be passed as a parameter. In some use cases, 
 it may be required to use different topics for the same appender instance. 
 So it may be beneficial to enable KafkaLog4jAppender to accept a TopicClass 
 which will provide a topic for a given message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387551#comment-14387551 ]

Sriharsha Chintalapani commented on KAFKA-2078:
---

[~aravind2015] Can you attach your server.properties and producer config?

 Getting Selector [WARN] Error in I/O with host java.io.EOFException
 ---

 Key: KAFKA-2078
 URL: https://issues.apache.org/jira/browse/KAFKA-2078
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2.0
 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x 
 Intel(R) Xeon(R) CPU X5660  @ 2.80GHz/44GB
Reporter: Aravind
Assignee: Jun Rao

 When trying to produce 1000 (10 MB) messages, I get the error below somewhere 
 between the 997th and 1000th message. There is no pattern, but I am able to 
 reproduce it.
 [PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host 
 java.io.EOFException at 
 org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
  at org.apache.kafka.common.network.Selector.poll(Selector.java:248) at 
 org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) at 
 java.lang.Thread.run(Thread.java:724)
 I sometimes get this error at the 997th or 999th message. There is no 
 pattern, but I am able to reproduce it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387573#comment-14387573 ]

ASF GitHub Bot commented on KAFKA-873:
--

GitHub user atdixon opened a pull request:

https://github.com/apache/kafka/pull/53

curator + exhibitor integration

My motivation for introducing curator to kafka was to get optional 
exhibitor support; however, I noticed this is also a solution to ticket 
KAFKA-873 (https://issues.apache.org/jira/browse/KAFKA-873).

Structurally I believe the code is sound; however, some tests are blocking, 
which I believe is due to races related to in-memory Zookeeper, but I am not 
entirely sure. I am looking into it and testing outside of in-memory ZK as 
well, but would love comments/discussion on this PR. I imagine exhibitor 
support is something that many are interested in, especially those of us in 
AWS and cloud environments.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atdixon/kafka exhibitor-support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/53.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #53


commit 3526d5664fa87b2d3f0e6e35bd5b060639df4ead
Author: Aaron Dixon atdi...@gmail.com
Date:   2015-03-30T23:15:16Z

curator + exhibitor integration




 Consider replacing zkclient with curator (with zkclient-bridge)
 ---

 Key: KAFKA-873
 URL: https://issues.apache.org/jira/browse/KAFKA-873
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

 If zkclient was replaced with curator and curator-x-zkclient-bridge, it would 
 initially be a drop-in replacement:
 https://github.com/Netflix/curator/wiki/ZKClient-Bridge
 With the addition of a few more props to ZkConfig, and a bit of code, this 
 would open up the possibility of using ACLs in zookeeper (which aren't 
 supported directly by zkclient), as well as integrating with Netflix 
 Exhibitor for those of us using that.
 Looks like KafkaZookeeperClient needs some love anyhow...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: curator + exhibitor integration

2015-03-30 Thread atdixon
GitHub user atdixon opened a pull request:

https://github.com/apache/kafka/pull/53

curator + exhibitor integration

My motivation for introducing curator to kafka was to get optional 
exhibitor support; however, I noticed this is also a solution to ticket 
KAFKA-873 (https://issues.apache.org/jira/browse/KAFKA-873).

Structurally I believe the code is sound; however, some tests are blocking, 
which I believe is due to races related to in-memory Zookeeper, but I am not 
entirely sure. I am looking into it and testing outside of in-memory ZK as 
well, but would love comments/discussion on this PR. I imagine exhibitor 
support is something that many are interested in, especially those of us in 
AWS and cloud environments.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/atdixon/kafka exhibitor-support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/53.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #53


commit 3526d5664fa87b2d3f0e6e35bd5b060639df4ead
Author: Aaron Dixon atdi...@gmail.com
Date:   2015-03-30T23:15:16Z

curator + exhibitor integration




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2015-03-30 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy resolved KAFKA-1634.
---
Resolution: Fixed

Yes, I think that makes more sense.

 Improve semantics of timestamp in OffsetCommitRequests and update 
 documentation
 ---

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
 Fix For: 0.8.3

 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
 KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
 KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, 
 KAFKA-1634_2014-12-01_18:03:12.patch, KAFKA-1634_2015-01-14_15:50:15.patch, 
 KAFKA-1634_2015-01-21_16:43:01.patch, KAFKA-1634_2015-01-22_18:47:37.patch, 
 KAFKA-1634_2015-01-23_16:06:07.patch, KAFKA-1634_2015-02-06_11:01:08.patch, 
 KAFKA-1634_2015-03-26_12:16:09.patch, KAFKA-1634_2015-03-26_12:27:18.patch


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed, and isn't it always handled server-side now? Would one of
 the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-03-30 Thread Aaron Dixon (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387303#comment-14387303 ]

Aaron Dixon edited comment on KAFKA-873 at 3/30/15 8:48 PM:


I recently worked with the curator-x-zkclient-bridge team to get a new release 
(3.0.0) out so that I could integrate curator with Kafka. I have a pull request 
coming soon to the Kafka project with this integration.


was (Author: atdixon):
I've recently worked with the zkclient-bridge team to get a new release out so 
that I could integrate curator with Kafka. I have a pull request coming soon to 
the Kafka project with this integration.

 Consider replacing zkclient with curator (with zkclient-bridge)
 ---

 Key: KAFKA-873
 URL: https://issues.apache.org/jira/browse/KAFKA-873
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

 If zkclient was replaced with curator and curator-x-zkclient-bridge it would 
 be initially a drop-in replacement
 https://github.com/Netflix/curator/wiki/ZKClient-Bridge
 With the addition of a few more props to ZkConfig, and a bit of code this 
 would open up the possibility of using ACLs in zookeeper (which aren't 
 supported directly by zkclient), as well as integrating with netflix 
 exhibitor for those of us using that.
 Looks like KafkaZookeeperClient needs some love anyhow...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Aravind (JIRA)
Aravind created KAFKA-2078:
--

 Summary: Getting Selector [WARN] Error in I/O with host 
java.io.EOFException
 Key: KAFKA-2078
 URL: https://issues.apache.org/jira/browse/KAFKA-2078
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2.0
 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x 
Intel(R) Xeon(R) CPU X5660  @ 2.80GHz/44GB
Reporter: Aravind
Assignee: Jun Rao


When trying to produce 1000 (10 MB) messages, we get the error below somewhere 
between the 997th and the 1000th message. There is no pattern, but it is reproducible.

[PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host 
java.io.EOFException at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62) 
at org.apache.kafka.common.network.Selector.poll(Selector.java:248) at 
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) at 
java.lang.Thread.run(Thread.java:724)

I am getting this error sometimes at the 997th message and sometimes at the 
999th. There is no pattern, but it is reproducible.
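
(Editorial aside: a minimal, hypothetical repro sketch for this report - not the 
reporter's code - using the 0.8.2 Java producer that the stack trace above points 
to; the broker address, topic name, and settings are assumptions:)

{code}
// Hypothetical repro: send 1000 ~10 MB messages with the 0.8.2 Java producer.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LargeMessageRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:9092");   // assumed broker address
        props.put("compression.type", "gzip");
        props.put("max.request.size", "104857600");    // keep <= broker limits
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = new byte[10 * 1024 * 1024];   // ~10 MB message body
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<byte[], byte[]>("test-topic", payload));
            }
        }
    }
}
{code}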



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-873) Consider replacing zkclient with curator (with zkclient-bridge)

2015-03-30 Thread Aaron Dixon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387578#comment-14387578
 ] 

Aaron Dixon commented on KAFKA-873:
---

As promised, here is a PR integrating curator (+ optional exhibitor support) 
into kafka... https://github.com/apache/kafka/pull/53.

Some of the automated tests are hanging, I believe due to races related to 
embedded zookeeper usage, but I am not entirely sure.

 Consider replacing zkclient with curator (with zkclient-bridge)
 ---

 Key: KAFKA-873
 URL: https://issues.apache.org/jira/browse/KAFKA-873
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Scott Clasen
Assignee: Grant Henke

 If zkclient was replaced with curator and curator-x-zkclient-bridge it would 
 be initially a drop-in replacement
 https://github.com/Netflix/curator/wiki/ZKClient-Bridge
 With the addition of a few more props to ZkConfig, and a bit of code this 
 would open up the possibility of using ACLs in zookeeper (which aren't 
 supported directly by zkclient), as well as integrating with netflix 
 exhibitor for those of us using that.
 Looks like KafkaZookeeperClient needs some love anyhow...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Aravind (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387612#comment-14387612
 ] 

Aravind edited comment on KAFKA-2078 at 3/30/15 11:44 PM:
--

Hi, thanks for replying. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=host1:2181
zookeeper.connection.timeout.ms=6000
message.max.bytes=104857600
replica.fetch.max.bytes=104857600


Thanks,
Aravind
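
(Editorial note, not the reporter's conclusion: one mismatch worth ruling out is 
that producer.max.request.size is 1 GB while the broker's 
socket.request.max.bytes is 100 MB; a request larger than the broker's socket 
limit makes the broker drop the connection, which the client sees as exactly 
this kind of EOFException. A sketch of consistent sizing, with illustrative 
values only:)

{code}
# Illustrative sizing sketch (not the reporter's fix): keep every producer-side
# limit at or below what the broker will accept.
max.request.size=104857600          # producer: <= broker socket.request.max.bytes
message.max.bytes=104857600         # broker: largest single message accepted
replica.fetch.max.bytes=104857600   # broker: must be >= message.max.bytes
socket.request.max.bytes=209715200  # broker: headroom above max.request.size
{code}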


was (Author: aravind2015):
Hi, thanks for replying. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

# Server Basics #

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# Socket Server Settings #

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all
# interfaces
#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set,
# it uses the value for host.name if configured. Otherwise, it will use the
# value returned from java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=hostname routable by clients

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=port accessible by clients

# The number of threads handling network requests
num.network.threads=3
 
# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection
# against OOM)
socket.request.max.bytes=104857600


# Log Basics #

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at
# startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs
# located in RAID array.
num.recovery.threads.per.data.dir=1

# Log Flush Policy #

# Messages are immediately written to the filesystem but by default we only
# fsync() to sync the OS cache lazily. The following configurations control the
# flush of data to disk.
# There are a few important trade-offs here:
#1. Durability: Unflushed data may be lost if you are not using replication.
#2. Latency: Very large flush intervals may lead to latency spikes when the
#   flush does occur as there will be a lot of data to flush.
#3. Throughput: The flush is generally the most expensive operation, and a
#   small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data
# after a period of time or every N messages (or both). This can be done
# globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=1

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

# Log Retention Policy #

# The following configurations control the disposal of log segments. The
# policy can be set to delete segments after a period of time, or after a
# given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met.
# Deletion always happens from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=1

# A size-based retention policy for logs. Segments are pruned from the log as
# long as the remaining segments don't drop below 

[jira] [Comment Edited] (KAFKA-2078) Getting Selector [WARN] Error in I/O with host java.io.EOFException

2015-03-30 Thread Aravind (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387612#comment-14387612
 ] 

Aravind edited comment on KAFKA-2078 at 3/30/15 11:44 PM:
--

Hi Harsha, thanks for the reply. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=host1:2181
zookeeper.connection.timeout.ms=6000
message.max.bytes=104857600
replica.fetch.max.bytes=104857600


Thanks,
Aravind


was (Author: aravind2015):
Hi, thanks for replying. Below are the details:

Producer config:

kafka.compression.codec=gzip
producer.metadata.broker.list=host:9092
producer.logger=KafkaProducer
producer.max.request.size=1073741824
producer.batch.num.messages=50
producer.queue.buffering.max.ms=5000


Server.properties:

broker.id=0
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=false
zookeeper.connect=host1:2181
zookeeper.connection.timeout.ms=6000
message.max.bytes=104857600
replica.fetch.max.bytes=104857600


Thanks,
Aravind

 Getting Selector [WARN] Error in I/O with host java.io.EOFException
 ---

 Key: KAFKA-2078
 URL: https://issues.apache.org/jira/browse/KAFKA-2078
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2.0
 Environment: OS Version: 2.6.39-400.209.1.el5uek and Hardware: 8 x 
 Intel(R) Xeon(R) CPU X5660  @ 2.80GHz/44GB
Reporter: Aravind
Assignee: Jun Rao

 When trying to produce 1000 (10 MB) messages, we get the error below somewhere
 between the 997th and the 1000th message. There is no pattern, but it is
 reproducible.
 [PDT] 2015-03-31 13:53:50 Selector [WARN] Error in I/O with our host 
 java.io.EOFException at 
 org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
  at org.apache.kafka.common.network.Selector.poll(Selector.java:248) at 
 org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) at 
 org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) at 
 java.lang.Thread.run(Thread.java:724)
 I am getting this error sometimes at the 997th message and sometimes at the
 999th. There is no pattern, but it is reproducible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2041) Add ability to specify a KeyClass for KafkaLog4jAppender

2015-03-30 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated KAFKA-2041:

Attachment: kafka-2041-002.patch

 Add ability to specify a KeyClass for KafkaLog4jAppender
 

 Key: KAFKA-2041
 URL: https://issues.apache.org/jira/browse/KAFKA-2041
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao
 Attachments: kafka-2041-001.patch, kafka-2041-002.patch


 KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 
 Since there is no key or explicit partition number, the messages are sent to 
 random partitions. 
 In some cases, it is possible to derive a key from the message itself. 
 So it may be beneficial to enable KafkaLog4jAppender to accept a KeyClass
 which will provide a key for a given message.
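
(Editorial illustration of the proposal above; the interface name, method 
signature, and log4j property below are invented for illustration and are not 
part of the attached patches:)

{code}
// Hypothetical sketch of a pluggable KeyClass for KafkaLog4jAppender.
public interface KeyClass {
    // Derive the Kafka message key from the rendered log message.
    String getKey(String message);
}

// Example implementation: key on the first token (e.g., a request id)
// so related log lines land in the same partition.
class FirstTokenKeyClass implements KeyClass {
    @Override
    public String getKey(String message) {
        int space = message.indexOf(' ');
        return space < 0 ? message : message.substring(0, space);
    }
}
{code}

The appender could then be configured with something like 
log4j.appender.KAFKA.KeyClass=FirstTokenKeyClass (again, a hypothetical 
property name).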



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1716) hang during shutdown of ZookeeperConsumerConnector

2015-03-30 Thread David Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387543#comment-14387543
 ] 

David Chu commented on KAFKA-1716:
--

Yes, from what I can tell, if I just start up my consumer application, which 
creates 8 instances of {{ZookeeperConsumerConnector}} with 1 
{{ConsumerIterator}} created from each {{ZookeeperConsumerConnector}}, 
like the following:
{code}
this.msgIterator = consumerConnector
    .createMessageStreamsByFilter(topicWhiteList, /*numThreads*/ 1)
    .get(0)
    .iterator();
{code}
and there are no topics in the Kafka brokers, then the consumer fetcher threads 
will not be created. In my setup I do have the {{auto.create.topics.enable}} 
property set to {{true}}, but it doesn't look like topics are created when I 
start up the consumers; however, if I publish messages to these topics they do 
get created. To verify the existence of the topics, I'm just looking to see if 
they have entries under the Zookeeper path {{kafka/brokers/topics}}.

From what I can tell, the ConsumerFetcherThread is created by the 
{{LeaderFinderThread.createFetcherThread}} method, which is called from 
{{AbstractFetcherManager.addFetcherForPartitions}}, which in turn is called 
from {{LeaderFinderThread.doWork}}. In my case it appears that the 
{{AbstractFetcherManager.addFetcherForPartitions}} method is never 
called from {{LeaderFinderThread.doWork}}, due to the following:
# When the {{LeaderFinderThread.doWork}} method is called the 
{{ConsumerFetcherManager.noLeaderPartitionSet}} field is empty so the thread 
ends up calling {{cond.await()}} on line 61.
# The thread then throws an exception (I can't see the actual exception but my 
guess is it's an {{InterruptedException}}) so it ends up in the {{catch}} block 
on line 84. 
# At this point the {{LeaderFinderThread.isRunning}} field is {{false}} so it 
ends up throwing the exception again on line 86.
# Therefore, the {{addFetcherForPartition}} method on line 95 is never called 
and the ConsumerFetcherThread is never created (see the illustrative sketch 
below).
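
(Editorial illustration: the following self-contained Java sketch - not the 
actual Scala source - reproduces the await/interrupt pattern described in steps 
1-4, showing how a shutdown-time interrupt lands in the {{await()}} and the 
fetcher-creation step is never reached:)

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class LeaderFinderSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    volatile boolean isRunning = true;

    void doWork() throws InterruptedException {
        lock.lock();
        try {
            // Step 1: nothing to do yet (noLeaderPartitionSet empty), so park.
            cond.await();               // Step 2: the interrupt lands here
        } finally {
            lock.unlock();
        }
        addFetcherForPartitions();      // Step 4: never reached after interrupt
    }

    void addFetcherForPartitions() {
        System.out.println("fetcher threads created");
    }

    public static void main(String[] args) throws Exception {
        LeaderFinderSketch s = new LeaderFinderSketch();
        Thread worker = new Thread(() -> {
            try {
                while (s.isRunning) s.doWork();
            } catch (InterruptedException e) {
                // Step 3: isRunning is already false, so the thread just exits.
            }
        });
        worker.start();
        Thread.sleep(100);      // let the worker park in await()
        s.isRunning = false;    // shutdown path flips the flag...
        worker.interrupt();     // ...then wakes the worker with an interrupt
        worker.join();
    }
}
{code}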

 hang during shutdown of ZookeeperConsumerConnector
 --

 Key: KAFKA-1716
 URL: https://issues.apache.org/jira/browse/KAFKA-1716
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Sean Fay
Assignee: Neha Narkhede
 Attachments: after-shutdown.log, before-shutdown.log, 
 kafka-shutdown-stuck.log


 It appears to be possible for {{ZookeeperConsumerConnector.shutdown()}} to 
 wedge in the case that some consumer fetcher threads receive messages during 
 the shutdown process.
 Shutdown thread:
 {code}-- Parking to wait for: 
 java/util/concurrent/CountDownLatch$Sync@0x2aaaf3ef06d0
 at jrockit/vm/Locks.park0(J)V(Native Method)
 at jrockit/vm/Locks.park(Locks.java:2230)
 at sun/misc/Unsafe.park(ZJ)V(Native Method)
 at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at 
 java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:207)
 at kafka/utils/ShutdownableThread.shutdown(ShutdownableThread.scala:36)
 at 
 kafka/server/AbstractFetcherThread.shutdown(AbstractFetcherThread.scala:71)
 at 
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:121)
 at 
 kafka/server/AbstractFetcherManager$$anonfun$closeAllFetchers$2.apply(AbstractFetcherManager.scala:120)
 at 
 scala/collection/TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at 
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala/collection/mutable/HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
 at 
 scala/collection/mutable/HashTable$class.foreachEntry(HashTable.scala:226)
 at scala/collection/mutable/HashMap.foreachEntry(HashMap.scala:39)
 at scala/collection/mutable/HashMap.foreach(HashMap.scala:98)
 at 
 scala/collection/TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at 
 kafka/server/AbstractFetcherManager.closeAllFetchers(AbstractFetcherManager.scala:120)
 ^-- Holding lock: java/lang/Object@0x2aaaebcc7318[thin lock]
 at 
 kafka/consumer/ConsumerFetcherManager.stopConnections(ConsumerFetcherManager.scala:148)
 at 
 kafka/consumer/ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:171)
 at 
 

[jira] [Created] (KAFKA-2079) Support exhibitor

2015-03-30 Thread Aaron Dixon (JIRA)
Aaron Dixon created KAFKA-2079:
--

 Summary: Support exhibitor
 Key: KAFKA-2079
 URL: https://issues.apache.org/jira/browse/KAFKA-2079
 Project: Kafka
  Issue Type: Improvement
Reporter: Aaron Dixon


Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring 
solution for managing Zookeeper clusters. It supports use cases like discovery, 
node replacements and auto-scaling of Zk cluster hosts (so you don't have to 
manage a fixed set of Zk hosts--especially useful in cloud environments.)

The easiest way for Kafka to support connection to Zk clusters via exhibitor is 
to use curator as its client. There is already a separate ticket for this: 
KAFKA-873
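
(Editorial sketch of what that could look like: curator ships an 
ExhibitorEnsembleProvider for exactly this; the snippet below is written from 
memory of the curator docs, so treat the exact constructor arguments as 
assumptions rather than a verified API:)

{code}
import java.util.Arrays;
import org.apache.curator.ensemble.exhibitor.DefaultExhibitorRestClient;
import org.apache.curator.ensemble.exhibitor.ExhibitorEnsembleProvider;
import org.apache.curator.ensemble.exhibitor.Exhibitors;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ExhibitorBackedClient {
    public static void main(String[] args) throws Exception {
        // Fixed fallback connection string if exhibitor is unreachable.
        Exhibitors exhibitors = new Exhibitors(
                Arrays.asList("exhibitor-host1", "exhibitor-host2"), 8080,
                () -> "zk-fallback:2181");
        ExhibitorEnsembleProvider provider = new ExhibitorEnsembleProvider(
                exhibitors, new DefaultExhibitorRestClient(),
                "/exhibitor/v1/cluster/list",            // exhibitor REST path
                60000,                                   // re-poll every 60s
                new ExponentialBackoffRetry(1000, 3));
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .ensembleProvider(provider)              // instead of connectString()
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();          // ensemble membership now tracked via exhibitor
    }
}
{code}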



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 32650: Patch for KAFKA-2000

2015-03-30 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32650/
---

Review request for kafka.


Bugs: KAFKA-2000
https://issues.apache.org/jira/browse/KAFKA-2000


Repository: kafka


Description
---

KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted.


Diffs
-

  core/src/main/scala/kafka/server/OffsetManager.scala 
395b1dbe43a5db47151e72a1b588d72f03cef963 

Diff: https://reviews.apache.org/r/32650/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-2000:
--
Attachment: KAFKA-2000.patch

 Delete consumer offsets from kafka once the topic is deleted
 

 Key: KAFKA-2000
 URL: https://issues.apache.org/jira/browse/KAFKA-2000
 Project: Kafka
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Attachments: KAFKA-2000.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-2000:
--
Reviewer: Joel Koshy

 Delete consumer offsets from kafka once the topic is deleted
 

 Key: KAFKA-2000
 URL: https://issues.apache.org/jira/browse/KAFKA-2000
 Project: Kafka
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Attachments: KAFKA-2000.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-2000:
--
Status: Patch Available  (was: Open)

 Delete consumer offsets from kafka once the topic is deleted
 

 Key: KAFKA-2000
 URL: https://issues.apache.org/jira/browse/KAFKA-2000
 Project: Kafka
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Attachments: KAFKA-2000.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2000) Delete consumer offsets from kafka once the topic is deleted

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387479#comment-14387479
 ] 

Sriharsha Chintalapani commented on KAFKA-2000:
---

Created reviewboard https://reviews.apache.org/r/32650/diff/
 against branch origin/trunk

 Delete consumer offsets from kafka once the topic is deleted
 

 Key: KAFKA-2000
 URL: https://issues.apache.org/jira/browse/KAFKA-2000
 Project: Kafka
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Attachments: KAFKA-2000.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: Metrics package discussion

2015-03-30 Thread Aditya Auradkar
If we do plan to use the network code in client, I think that is a good reason 
in favor of migration. It will be unnecessary to have metrics from multiple 
libraries coexist since our users will have to start monitoring these new 
metrics anyway.

I also agree with Jay that in multi-tenant clusters people care about detailed 
statistics for their own application over global numbers. 

Based on the arguments so far, I'm +1 for migrating to KM.

Thanks,
Aditya


From: Jun Rao [j...@confluent.io]
Sent: Sunday, March 29, 2015 9:44 AM
To: dev@kafka.apache.org
Subject: Re: Metrics package discussion

There is another thing to consider. We plan to reuse the client components
on the server side over time. For example, as part of the security work, we
are looking into replacing the server side network code with the client
network code (KAFKA-1928). However, the client network already has metrics
based on KM.

Thanks,

Jun

On Sat, Mar 28, 2015 at 1:34 PM, Jay Kreps jay.kr...@gmail.com wrote:

 I think Joel's summary is good.

 I'll add a few more points:

 As discussed, memory matters a lot if we want to be able to give percentiles
 at the client or topic level, in which case we will have thousands of them.
 If we just do histograms at the global level then it is not a concern. The
 argument for doing histograms at the client and topic level is that
 averages are often very misleading, especially for latency information or
 other asymmetric distributions. Most people who care about this kind of
 thing would say the same. If you are a user of a multi-tenant cluster then
 you probably care a lot more about stats for your application or your topic
 rather than the global, so it could be nice to have histograms for these. I
 don't feel super strongly about this.

 The ExponentiallyDecayingSample is internally
 a ConcurrentSkipListMap<Double, Long>. This seems to have an overhead of
 about 64 bytes per entry. So a 1000 element sample is 64KB. For global
 metrics this is fine, but for granular metrics not workable.
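
 (Editorial back-of-envelope, using the ~64 bytes/entry figure above, to make
 the scale concrete:)

{code}
// Rough memory estimate for reservoir-sampled histograms; 64 bytes/entry is
// the informal per-entry overhead quoted above, not a measured constant.
public class SampleMemoryEstimate {
    public static void main(String[] args) {
        int sampleSize = 1000;      // reservoir entries per histogram
        int bytesPerEntry = 64;     // ConcurrentSkipListMap<Double, Long> overhead
        System.out.println("one histogram  : ~"
                + (sampleSize * bytesPerEntry / 1024) + " KB");           // ~62 KB
        // At per-client or per-topic granularity, thousands of histograms add up:
        System.out.println("5000 histograms: ~"
                + (5000L * sampleSize * bytesPerEntry / (1024 * 1024)) + " MB"); // ~305 MB
    }
}
{code}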

 Two other issues I'm not sure about:

 1. Is there a way to get metric descriptions into the coda hale JMX output?
 One of the really nicest practical things about the new client metrics is
 that if you look at them in jconsole each metric has an associated
 description that explains what it means. I think this is a nice usability
 thing--it is really hard to know what to make of the current metrics
 without this kind of documentation and keeping separate docs up-to-date is
 really hard and even if you do it most people won't find it.

 2. I'm not clear if the sample decay in the histogram is actually the same
 as for the other stats. It seems like it isn't but this would make
 interpretation quite difficult. In other words if I have N metrics
 including some Histograms some Meters, etc are all these measurements all
 taken over the same time window? I actually think they are not, it looks
 like there are different sampling methodologies across. So this means if
 you have a dashboard that plots these things side by side the measurement
 at a given point in time is not actually comparable across multiple stats.
 Am I confused about this?

 -Jay


 On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy jjkosh...@gmail.com wrote:

  For the samples: it will be at least double that estimate I think
  since the long array contains (eight byte) references to the actual
  longs, each of which also have some object overhead.
 
  Re: testing: actually, it looks like YM metrics does allow you to
  drop in your own clock:
 
 
 https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Clock.java
 
 
 https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Meter.java#L36
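  
  (Editorial sketch of the drop-in-clock idea against the metrics 3.x API linked
  above; the ManualClock below is our own test helper, not part of the library:)

{code}
import com.codahale.metrics.Clock;
import com.codahale.metrics.Meter;
import java.util.concurrent.TimeUnit;

public class ClockedMeterExample {
    // A settable clock so tests can advance time deterministically.
    static final class ManualClock extends Clock {
        private long tickNanos = 0;
        @Override public long getTick() { return tickNanos; }
        void advanceSeconds(long s) { tickNanos += TimeUnit.SECONDS.toNanos(s); }
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        Meter meter = new Meter(clock);   // the Meter(Clock) constructor linked above
        for (int i = 0; i < 100; i++) meter.mark();
        clock.advanceSeconds(10);         // the EWMA rates tick every 5 seconds
        meter.mark();                     // this mark triggers a tick on the fake clock
        System.out.println("1-min rate: " + meter.getOneMinuteRate());
    }
}
{code}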
 
  Not sure if it was mentioned in this (or some recent) thread but a
  major motivation in the kafka-common metrics (KM) was absorbing API
  changes and even mbean naming conventions. For e.g., in the early
  stages of 0.8 we picked up YM metrics 3.x but collided with client
  apps at LinkedIn which were still on 2.x. We ended up changing our
  code to use 2.x in the end. Having our own metrics package makes us
  less vulnerable to these kinds of changes. The multiple version
  collision problem is obviously less of an issue with the broker but we
  are still exposed to possible metric changes in YM metrics.
 
  I'm wondering if we need to weigh too much toward the memory overheads
  of histograms in making a decision here simply because I don't think
  we have found them to be an extreme necessity for
  per-clientid/per-partition metrics and they are more critical for
  aggregate (global) metrics.
 
  So it seems the main benefits of switching to KM metrics are:
  - Less exposure to YM metrics changes
  - More control over the actual implementation. E.g., there is
considerable research on implementing approximate-but-good-enough

[jira] [Commented] (KAFKA-2079) Support exhibitor

2015-03-30 Thread Aaron Dixon (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387592#comment-14387592
 ] 

Aaron Dixon commented on KAFKA-2079:


A proposed solution: https://github.com/apache/kafka/pull/53

 Support exhibitor
 -

 Key: KAFKA-2079
 URL: https://issues.apache.org/jira/browse/KAFKA-2079
 Project: Kafka
  Issue Type: Improvement
Reporter: Aaron Dixon

 Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring 
 solution for managing Zookeeper clusters. It supports use cases like 
 discovery, node replacements and auto-scaling of Zk cluster hosts (so you 
 don't have to manage a fixed set of Zk hosts--especially useful in cloud 
 environments.)
 The easiest way for Kafka to support connection to Zk clusters via exhibitor 
 is to use curator as its client. There is already a separate ticket for this: 
 KAFKA-873



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2077) Add ability to specify a TopicPicker class for KafkaLog4jAppender

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387690#comment-14387690
 ] 

Sriharsha Chintalapani commented on KAFKA-2077:
---

[~benoyantony] It will be helpful if you can send the patch using 
kafka-patch-review tool here are the instructions 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+patch+review+tool

 Add ability to specify a TopicPicker class for KafkaLog4jAppender
 

 Key: KAFKA-2077
 URL: https://issues.apache.org/jira/browse/KAFKA-2077
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Benoy Antony
Assignee: Jun Rao
 Attachments: kafka-2077-001.patch


 KafkaLog4jAppender is the Log4j Appender to publish messages to Kafka. 

 Currently, a topic name has to be passed as a parameter. In some use cases, 
 it may be required to use different topics for the same appender instance. 

 So it may be beneficial to enable KafkaLog4jAppender to accept a TopicPicker 
 class which will provide a topic for a given message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 32650: Patch for KAFKA-2000

2015-03-30 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32650/#review78310
---

Ship it!


Very nice fix :)

- Gwen Shapira


On March 30, 2015, 9:47 p.m., Sriharsha Chintalapani wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32650/
 ---
 
 (Updated March 30, 2015, 9:47 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2000
 https://issues.apache.org/jira/browse/KAFKA-2000
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/server/OffsetManager.scala 
 395b1dbe43a5db47151e72a1b588d72f03cef963 
 
 Diff: https://reviews.apache.org/r/32650/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sriharsha Chintalapani
 




Re: [DISCUSSION] Keep docs updated per jira

2015-03-30 Thread Joel Koshy
Also, for the wikis - those should probably correspond to the latest
released version, right? So, for example, if we add or modify the protocol
on trunk, we can add it to the wiki but mark it with some highlight or
similar, just to make it clear that it is a change on trunk only.

Thanks,

Joel

On Thu, Mar 26, 2015 at 06:27:25PM -0700, Jun Rao wrote:
 Hi, Everyone,
 
 Quite a few jiras these days require documentation changes (e.g., wire
 protocol, ZK layout, configs, jmx, etc). Historically, we have been
 updating the documentation just before we do a release. The issue is that
 some of the changes will be missed since they were done a while back.
 Another way to do that is to keep the docs updated as we complete each
 jira. Currently, our documentations are in the following places.
 
 wire protocol:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
 ZK layout:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
 configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083
 
 We probably don't need to update configs already ported to ConfigDef since
 they can be generated automatically. However, for the rest of the doc
  related changes, keeping them updated per jira seems a better approach.
 What do people think?
 
 Thanks,
 
 Jun



Re: Metrics package discussion

2015-03-30 Thread Jun Rao
If we are committed to migrating the broker side metrics to KM for the next
release, we will need to (1) have a story on supporting common reporters
(as listed in KAFKA-1930), and (2) see if the current histogram support is
good enough for measuring things like request time.

Thanks,

Jun

On Mon, Mar 30, 2015 at 3:03 PM, Aditya Auradkar 
aaurad...@linkedin.com.invalid wrote:

 If we do plan to use the network code in client, I think that is a good
 reason in favor of migration. It will be unnecessary to have metrics from
 multiple libraries coexist since our users will have to start monitoring
 these new metrics anyway.

 I also agree with Jay that in multi-tenant clusters people care about
 detailed statistics for their own application over global numbers.

 Based on the arguments so far, I'm +1 for migrating to KM.

 Thanks,
 Aditya

 
 From: Jun Rao [j...@confluent.io]
 Sent: Sunday, March 29, 2015 9:44 AM
 To: dev@kafka.apache.org
 Subject: Re: Metrics package discussion

 There is another thing to consider. We plan to reuse the client components
 on the server side over time. For example, as part of the security work, we
 are looking into replacing the server side network code with the client
 network code (KAFKA-1928). However, the client network already has metrics
 based on KM.

 Thanks,

 Jun

 On Sat, Mar 28, 2015 at 1:34 PM, Jay Kreps jay.kr...@gmail.com wrote:

  I think Joel's summary is good.
 
  I'll add a few more points:
 
  As discussed, memory matters a lot if we want to be able to give
 percentiles
  at the client or topic level, in which case we will have thousands of
 them.
  If we just do histograms at the global level then it is not a concern.
 The
  argument for doing histograms at the client and topic level is that
  averages are often very misleading, especially for latency information or
  other asymmetric distributions. Most people who care about this kind of
  thing would say the same. If you are a user of a multi-tenant cluster
 then
  you probably care a lot more about stats for your application or your
 topic
  rather than the global, so it could be nice to have histograms for
 these. I
  don't feel super strongly about this.
 
  The ExponentiallyDecayingSample is internally
  a ConcurrentSkipListMap<Double, Long>. This seems to have an overhead of
  about 64 bytes per entry. So a 1000 element sample is 64KB. For global
  metrics this is fine, but for granular metrics not workable.
 
  Two other issues I'm not sure about:
 
  1. Is there a way to get metric descriptions into the coda hale JMX
 output?
  One of the really nicest practical things about the new client metrics is
  that if you look at them in jconsole each metric has an associated
  description that explains what it means. I think this is a nice usability
  thing--it is really hard to know what to make of the current metrics
  without this kind of documentation and keeping separate docs up-to-date
 is
  really hard and even if you do it most people won't find it.
 
  2. I'm not clear if the sample decay in the histogram is actually the
 same
  as for the other stats. It seems like it isn't but this would make
  interpretation quite difficult. In other words if I have N metrics
  including some Histograms some Meters, etc are all these measurements all
  taken over the same time window? I actually think they are not, it looks
  like there are different sampling methodologies across. So this means if
  you have a dashboard that plots these things side by side the measurement
  at a given point in time is not actually comparable across multiple
 stats.
  Am I confused about this?
 
  -Jay
 
 
  On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy jjkosh...@gmail.com wrote:
 
   For the samples: it will be at least double that estimate I think
   since the long array contains (eight byte) references to the actual
   longs, each of which also have some object overhead.
  
   Re: testing: actually, it looks like YM metrics does allow you to
   drop in your own clock:
  
  
 
 https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Clock.java
  
  
 
 https://github.com/dropwizard/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/Meter.java#L36
  
   Not sure if it was mentioned in this (or some recent) thread but a
   major motivation in the kafka-common metrics (KM) was absorbing API
   changes and even mbean naming conventions. For e.g., in the early
   stages of 0.8 we picked up YM metrics 3.x but collided with client
   apps at LinkedIn which were still on 2.x. We ended up changing our
   code to use 2.x in the end. Having our own metrics package makes us
   less vulnerable to these kinds of changes. The multiple version
   collision problem is obviously less of an issue with the broker but we
   are still exposed to possible metric changes in YM metrics.
  
   I'm wondering if we need to weigh too much toward the memory 

[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-30 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387828#comment-14387828
 ] 

Sriharsha Chintalapani commented on KAFKA-1646:
---

[~jkreps] Any guidance on the latest patch?

 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
Assignee: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
 KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch, 
 KAFKA-1646_20150312_200352.patch


 This patch is for the Windows platform only. On Windows, if there is more than 
 one replica writing to disk, the segment log files will not be contiguous on 
 disk, and consumer read performance drops greatly. This fix allocates more 
 disk space when rolling a new segment, which improves consumer read 
 performance on the NTFS file system.
 It doesn't affect file allocation on other filesystems, as it only adds 
 statements like 'if (Os.isWindows)' or methods used only on Windows.
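
 (Editorial sketch - ours, not the attached patch - of the preallocation idea:
 pre-sizing the segment file up front so NTFS can keep it contiguous while
 replicas append:)

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PreallocateSketch {
    // Open a segment file, reserving its full size up front on Windows only.
    static RandomAccessFile openSegment(File f, long segmentBytes) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        if (System.getProperty("os.name").startsWith("Windows")) {
            raf.setLength(segmentBytes);   // reserve the whole segment now
        }
        return raf;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".log");
        try (RandomAccessFile raf = openSegment(f, 64L * 1024 * 1024)) {
            System.out.println("allocated length: " + raf.length() + " bytes");
        }
        f.delete();
    }
}
{code}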



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-1910) Refactor KafkaConsumer

2015-03-30 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy reopened KAFKA-1910:
---

Reopening for the doc update and clarification on the change in error code.

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.3

 Attachments: KAFKA-1910.patch, KAFKA-1910.patch, 
 KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very large class file; it would be better to refactor it into multiple layers 
 on top of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-30 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387652#comment-14387652
 ] 

Joel Koshy commented on KAFKA-1910:
---

[~guozhang] this patch modified some of the behavior of OffsetFetchRequest 
which [~toddpalino] caught while using a python client against the latest 
broker (on trunk).

Previously, we would return NoError with an invalid offset if there was no 
committed offset. Now it returns the NoOffsetsCommittedCode. Was there a 
discussion on this? Either way, we should also update the protocol 
documentation if this change sticks.

 Refactor KafkaConsumer
 --

 Key: KAFKA-1910
 URL: https://issues.apache.org/jira/browse/KAFKA-1910
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.3

 Attachments: KAFKA-1910.patch, KAFKA-1910.patch, 
 KAFKA-1910_2015-03-05_14:55:33.patch


 KafkaConsumer now contains all the logic on the consumer side, making it a 
 very large class file; it would be better to refactor it into multiple layers 
 on top of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Metrics package discussion

2015-03-30 Thread Gwen Shapira
(1) It will be interesting to see what others use for monitoring
integration, to see what is already covered with existing JMX
integrations and what needs special support.

(2) I think the migration story is more important - this is an
incompatible change, right? So we can't do it in the 0.8.3 timeframe; it
has to be in 0.9? And we need to figure out how users will migrate -
do we just tell everyone "please reconfigure all your monitors from
scratch - don't worry, it is worth it"?
I know you keep saying we did it before and our users are used to it,
but I think there are a lot more users now, and some of them have
different compatibility expectations. We probably need to find:
* The least painful way to migrate - can we keep the names of at least
most of the metrics intact?
* Good explanation of what users gain from this painful migration
(i.e. more accurate statistics due to gazillion histograms)






On Mon, Mar 30, 2015 at 6:29 PM, Jun Rao j...@confluent.io wrote:
 If we are committed to migrating the broker side metrics to KM for the next
 release, we will need to (1) have a story on supporting common reporters
 (as listed in KAFKA-1930), and (2) see if the current histogram support is
 good enough for measuring things like request time.

 Thanks,

 Jun

 On Mon, Mar 30, 2015 at 3:03 PM, Aditya Auradkar 
 aaurad...@linkedin.com.invalid wrote:

 If we do plan to use the network code in client, I think that is a good
 reason in favor of migration. It will be unnecessary to have metrics from
 multiple libraries coexist since our users will have to start monitoring
 these new metrics anyway.

 I also agree with Jay that in multi-tenant clusters people care about
 detailed statistics for their own application over global numbers.

 Based on the arguments so far, I'm +1 for migrating to KM.

 Thanks,
 Aditya

 
 From: Jun Rao [j...@confluent.io]
 Sent: Sunday, March 29, 2015 9:44 AM
 To: dev@kafka.apache.org
 Subject: Re: Metrics package discussion

 There is another thing to consider. We plan to reuse the client components
 on the server side over time. For example, as part of the security work, we
 are looking into replacing the server side network code with the client
 network code (KAFKA-1928). However, the client network already has metrics
 based on KM.

 Thanks,

 Jun

 On Sat, Mar 28, 2015 at 1:34 PM, Jay Kreps jay.kr...@gmail.com wrote:

  I think Joel's summary is good.
 
  I'll add a few more points:
 
  As discussed, memory matters a lot if we want to be able to give
 percentiles
  at the client or topic level, in which case we will have thousands of
 them.
  If we just do histograms at the global level then it is not a concern.
 The
  argument for doing histograms at the client and topic level is that
  averages are often very misleading, especially for latency information or
  other asymmetric distributions. Most people who care about this kind of
  thing would say the same. If you are a user of a multi-tenant cluster
 then
  you probably care a lot more about stats for your application or your
 topic
  rather than the global, so it could be nice to have histograms for
 these. I
  don't feel super strongly about this.
 
  The ExponentiallyDecayingSample is internally
  a ConcurrentSkipListMap<Double, Long>. This seems to have an overhead of
  about 64 bytes per entry. So a 1000 element sample is 64KB. For global
  metrics this is fine, but for granular metrics not workable.
 
  Two other issues I'm not sure about:
 
  1. Is there a way to get metric descriptions into the coda hale JMX
 output?
  One of the really nicest practical things about the new client metrics is
  that if you look at them in jconsole each metric has an associated
  description that explains what it means. I think this is a nice usability
  thing--it is really hard to know what to make of the current metrics
  without this kind of documentation and keeping separate docs up-to-date
 is
  really hard and even if you do it most people won't find it.
 
  2. I'm not clear if the sample decay in the histogram is actually the
 same
  as for the other stats. It seems like it isn't but this would make
  interpretation quite difficult. In other words if I have N metrics
  including some Histograms some Meters, etc are all these measurements all
  taken over the same time window? I actually think they are not, it looks
  like there are different sampling methodologies across. So this means if
  you have a dashboard that plots these things side by side the measurement
  at a given point in time is not actually comparable across multiple
 stats.
  Am I confused about this?
 
  -Jay
 
 
  On Fri, Mar 27, 2015 at 6:27 PM, Joel Koshy jjkosh...@gmail.com wrote:
 
   For the samples: it will be at least double that estimate I think
   since the long array contains (eight byte) references to the actual
   longs, each of which also have some object overhead.
  
   Re: testing: actually, it looks like YM metrics 

Re: Review Request 32650: Patch for KAFKA-2000

2015-03-30 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32650/#review78337
---


This might be out of scope for this JIRA, but I think if the deleted topic gets 
recreated before compaction, the offsets corresponding to the older version of 
the topic won't be deleted.

This usually doesn't matter because auto.offset.reset will be triggered if the 
new version of the topic is smaller than the old version in terms of offsets. 
As with topic deletion under zookeeper-based offsets, there's the edge case of 
the consumer skipping messages from the new version of the topic if the old 
version's offsets still fit. This edge case was briefly discussed here: 
https://issues.apache.org/jira/browse/KAFKA-1787

- Onur Karaman


On March 30, 2015, 9:47 p.m., Sriharsha Chintalapani wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32650/
 ---
 
 (Updated March 30, 2015, 9:47 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2000
 https://issues.apache.org/jira/browse/KAFKA-2000
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-2000. Delete consumer offsets from kafka once the topic is deleted.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/server/OffsetManager.scala 
 395b1dbe43a5db47151e72a1b588d72f03cef963 
 
 Diff: https://reviews.apache.org/r/32650/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sriharsha Chintalapani
 




[jira] [Commented] (KAFKA-2059) ZookeeperConsumerConnectorTest.testBasic transient failure

2015-03-30 Thread Fangmin Lv (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388046#comment-14388046
 ] 

Fangmin Lv commented on KAFKA-2059:
---

Hi Guozhang,

Is this failure repeatable? How often does it fail in your environment - 
occasionally or always?

I've run this test case for an hour and haven't seen any failure (run based on 
SHA c5df2a8):
$ while ./gradlew --rerun-tasks -Dtest.single=ZookeeperConsumerConnectorTest 
core:test; do echo .; done

Thanks,
Fangmin

 ZookeeperConsumerConnectorTest.testBasic transient failure
 -

 Key: KAFKA-2059
 URL: https://issues.apache.org/jira/browse/KAFKA-2059
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
  Labels: newbie

 {code}
 kafka.javaapi.consumer.ZookeeperConsumerConnectorTest  testBasic FAILED
 kafka.common.InconsistentBrokerIdException: Configured brokerId 1 doesn't 
 match stored brokerId 0 in meta.properties
 at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:443)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:116)
 at kafka.utils.TestUtils$.createServer(TestUtils.scala:126)
 at 
 kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:57)
 at 
 kafka.integration.KafkaServerTestHarness$$anonfun$setUp$1.apply(KafkaServerTestHarness.scala:57)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
 at 
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 at 
 kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:57)
 at 
 kafka.javaapi.consumer.ZookeeperConsumerConnectorTest.setUp(ZookeeperConsumerConnectorTest.scala:41)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)