[jira] [Updated] (KAFKA-2002) It does not work when kafka_mx4jenable is false
[ https://issues.apache.org/jira/browse/KAFKA-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaguo Zhou updated KAFKA-2002: -- Attachment: kafka_mx4jenable.patch It does not work when kafka_mx4jenable is false --- Key: KAFKA-2002 URL: https://issues.apache.org/jira/browse/KAFKA-2002 Project: Kafka Issue Type: Bug Components: core Reporter: Yaguo Zhou Priority: Minor Attachments: kafka_mx4jenable.patch The MX4J loader should return false immediately when kafka_mx4jenable is false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
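For context, a minimal Scala sketch of the shape such a fix would take (hypothetical code, not the attached patch): check the flag first and bail out before any MX4J classes are touched.
{code}
// Hypothetical sketch: return early when the flag is off, so the MX4J
// classes are never loaded. The real change is in kafka_mx4jenable.patch.
def maybeLoad(props: java.util.Properties): Boolean = {
  val enabled = props.getProperty("kafka_mx4jenable", "false").toBoolean
  if (!enabled)
    return false // before the fix, loading proceeded regardless of the flag
  // ... reflectively load and start the MX4J HttpAdaptor here ...
  true
}
{code}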
[jira] [Commented] (KAFKA-1908) Split brain
[ https://issues.apache.org/jira/browse/KAFKA-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347063#comment-14347063 ] Alexey Ozeritskiy commented on KAFKA-1908: -- Hi all, the following is our scenario. We use a custom consumer which works on broker hosts and always consumes leader partitions from localhost. The consumer reads data and pushes it to a 3rd-party system. We send a metadata request to localhost and don't use the zk data. We use zk locks to guarantee that we read a single partition in one process. Sometimes we release the locks, and the consumer can then begin to consume data from the broken host and reset offsets. Split brain --- Key: KAFKA-1908 URL: https://issues.apache.org/jira/browse/KAFKA-1908 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2.0 Reporter: Alexey Ozeritskiy In some cases, there may be two leaders for one partition. Steps to reproduce:
1. We have 3 brokers, 1 partition with 3 replicas:
{code}
TopicAndPartition: [partition,0] Leader: 1 Replicas: [2,1,3] ISR: [1,2,3]
{code}
2. The controller runs on broker 3.
3. Let the kafka port be 9092. We execute on broker 1:
{code}
iptables -A INPUT -p tcp --dport 9092 -j REJECT
{code}
4. Initiate a replica election.
5. As a result, Broker 1 reports:
{code}
TopicAndPartition: [partition,0] Leader: 1 Replicas: [2,1,3] ISR: [1,2,3]
{code}
while Broker 2 reports:
{code}
TopicAndPartition: [partition,0] Leader: 2 Replicas: [2,1,3] ISR: [1,2,3]
{code}
6. Flush the iptables rules on broker 1.
Now we can produce messages to [partition,0]. Replica-1 will not receive new data. A consumer can read data from replica-1 or replica-2. When it reads from replica-1 it resets the offsets and then can read duplicates from replica-2. We saw this situation in our production cluster when it had network problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
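To observe the divergence from the outside, one can ask each broker directly (bypassing ZooKeeper, as the reporter's consumer does) and compare the leader ids they report. A rough Scala sketch against the 0.8 javaapi, assuming the default port and the topic name from the report:
{code}
import kafka.javaapi.TopicMetadataRequest
import kafka.javaapi.consumer.SimpleConsumer
import scala.collection.JavaConverters._

// Returns partitionId -> leader broker id as reported by one broker.
def leadersSeenBy(host: String): Map[Int, Int] = {
  val consumer = new SimpleConsumer(host, 9092, 10000, 64 * 1024, "leader-check")
  try {
    val resp = consumer.send(new TopicMetadataRequest(List("partition").asJava))
    (for {
      tm <- resp.topicsMetadata.asScala
      pm <- tm.partitionsMetadata.asScala
      if pm.leader != null
    } yield pm.partitionId -> pm.leader.id).toMap
  } finally consumer.close()
}

// In the scenario above, leadersSeenBy("broker1") and leadersSeenBy("broker2")
// disagree: broker 1 reports leader 1 while broker 2 reports leader 2.
{code}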
[jira] [Commented] (KAFKA-1809) Refactor brokers to allow listening on multiple ports and IPs
[ https://issues.apache.org/jira/browse/KAFKA-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347236#comment-14347236 ] Jun Rao commented on KAFKA-1809: Thanks for the patch. Some comments.
1. KafkaConfig:
1.1 val interBrokerSecurityProtocol vs. val intraBrokerSecurityProtocol: pick one name and use it consistently.
1.2 In getListeners() and getAdvertisedListeners(), let's use the same default port 9092 as before.
{code}
Utils.listenerListToEndPoints("PLAINTEXT://" + props.getString("advertised.host.name", props.getString("host.name", "")) + ":" + props.getInt("advertised.port", props.getInt("port", 6667)).toString)
{code}
1.3 In the comment, "lets use them" should be "let's use them".
1.4 In the following comment, those explicitly enumerated versions may get out of date. We can just point to ApiVersion for valid values.
{code}
* Valid values are: 0.8.2.0, 0.8.3.0
{code}
1.5 We probably should validate that all specified security protocols are valid (i.e., those defined in SecurityProtocol).
2. ApiVersion: Instead of 0.8.2.0, should we just use 0.8.2, since the protocol for 0.8.2.0 and 0.8.2.1 is the same?
3. EndPoint: For the following, could we give a plain-English description of the URI format and give a few examples?
{code}
val uriParseExp = """^(.*)://\[?([0-9a-z\-.:]*)\]?:([0-9]+)""".r
{code}
4. AdminUtils: In the following, there is no need to do id => id.toInt since id is already an Int.
{code}
replicaInfo = getBrokerInfoFromCache(zkClient, cachedBrokerInfo, replicas.map(id => id.toInt)).map(_.getBrokerEndPoint(protocol))
{code}
5. Broker: In the following, instead of representing the endpoints as a single string, should we represent them as an array of strings, one for each endpoint?
{code}
* endpoints: {PLAINTEXT://host1:9092,SSL://host1:9093}
{code}
6. ConsumerConfig: Do we need to expose the following config? Not sure how it's used in testing.
{code}
val securityProtocol = SecurityProtocol.withName(props.getString("security.protocol", "PLAINTEXT"))
{code}
7. Unused imports: FetchTest, KafkaConfigTest (under server), KafkaServerTestHarness, ProducerSendTest, RequestResponseSerializationTest, SocketServer.
8. KafkaConfigTest: Can isValidKafkaConfig be private?
9. KafkaServer: Why is the getBrokerId() call moved after the creation of SocketServer?
{code}
socketServer = new SocketServer(config.brokerId, config.listeners, config.numNetworkThreads, config.queuedMaxRequests, config.socketSendBufferBytes, config.socketReceiveBufferBytes, config.socketRequestMaxBytes, config.maxConnectionsPerIp, config.connectionsMaxIdleMs, config.maxConnectionsPerIpOverrides)
socketServer.startup()
/* generate brokerId */
config.brokerId = getBrokerId
this.logIdent = "[Kafka Server " + config.brokerId + "], "
{code}
10. SecurityProtocol: When will TRACE be used? It's not used in ProducerConfig.
11. Some of the files have extra empty lines (e.g. SocketServerTest).
12. testcase_1_properties.json: Could you explain why we need to change the log segment size?
13. UpdateMetadataRequest: sizeInBytes() needs to be versioned as well. Is there no unit test covering this? If so, we should add one.
14. Testing.
14.1 Did you run the system tests?
14.2 Did you try running existing tools?
14.3 How should we test the protocol upgrade from 0.8.2 to 0.8.3?
Refactor brokers to allow listening on multiple ports and IPs -- Key: KAFKA-1809 URL: https://issues.apache.org/jira/browse/KAFKA-1809 Project: Kafka Issue Type: Sub-task Components: security Reporter: Gwen Shapira Assignee: Gwen Shapira Attachments: KAFKA-1809.patch, KAFKA-1809.v1.patch, KAFKA-1809.v2.patch, KAFKA-1809.v3.patch, KAFKA-1809_2014-12-25_11:04:17.patch, KAFKA-1809_2014-12-27_12:03:35.patch, KAFKA-1809_2014-12-27_12:22:56.patch, KAFKA-1809_2014-12-29_12:11:36.patch, KAFKA-1809_2014-12-30_11:25:29.patch, KAFKA-1809_2014-12-30_18:48:46.patch, KAFKA-1809_2015-01-05_14:23:57.patch, KAFKA-1809_2015-01-05_20:25:15.patch, KAFKA-1809_2015-01-05_21:40:14.patch, KAFKA-1809_2015-01-06_11:46:22.patch, KAFKA-1809_2015-01-13_18:16:21.patch, KAFKA-1809_2015-01-28_10:26:22.patch, KAFKA-1809_2015-02-02_11:55:16.patch, KAFKA-1809_2015-02-03_10:45:31.patch, KAFKA-1809_2015-02-03_10:46:42.patch, KAFKA-1809_2015-02-03_10:52:36.patch The goal is to eventually support different security mechanisms on different ports. Currently brokers are defined as a host+port pair, and this definition exists throughout the codebase, therefore some refactoring is needed to support multiple listeners per broker.
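As a side note on item 3 of Jun's comments above, the quoted regex already handles the common listener forms; a short Scala sketch of how it decomposes a listener string (the IPv6 example is mine):
{code}
// The uriParseExp regex from the review comment in action.
val uriParseExp = """^(.*)://\[?([0-9a-z\-.:]*)\]?:([0-9]+)""".r

"PLAINTEXT://host1:9092" match {
  case uriParseExp(protocol, host, port) =>
    println(s"protocol=$protocol host=$host port=$port")
}
// Bracketed IPv6 also parses: "SSL://[::1]:9093" -> ("SSL", "::1", "9093")
{code}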
[jira] [Updated] (KAFKA-1863) Exception categories / hierarchy in clients
[ https://issues.apache.org/jira/browse/KAFKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1863: - Status: Patch Available (was: Open) Exception categories / hierarchy in clients --- Key: KAFKA-1863 URL: https://issues.apache.org/jira/browse/KAFKA-1863 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1863.patch In the new clients package we introduced a new set of exceptions, but the hierarchy is not very clear as of today:
{code}
RuntimeException
 - KafkaException
   - BufferExhaustedException
   - ConfigException
   - SerializationException
   - QuotaViolationException
   - SchemaException
   - ApiException

ApiException
 - InvalidTopicException
 - OffsetMetadataTooLarge (probably needs to be renamed)
 - RecordBatchTooLargeException
 - RecordTooLargeException
 - UnknownServerException
 - RetriableException

RetriableException
 - CorruptRecordException
 - InvalidMetadataException
 - NotEnoughReplicasAfterAppendException
 - NotEnoughReplicasException
 - OffsetOutOfRangeException
 - TimeoutException
 - UnknownTopicOrPartitionException
{code}
KafkaProducer.send() may throw KafkaExceptions that are not ApiExceptions; other exceptions will be set in the returned future metadata. We need to:
1. Re-examine the hierarchy. For example, for producers, only exceptions that are thrown directly from the caller thread before the record is appended to the batch buffer should be ApiExceptions; some exceptions could be renamed / merged.
2. Clearly document the exception category / hierarchy as part of the release.
[~criccomini] may have some more feedback for this issue from Samza's usage experience. [~jkreps] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
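Point 2 of the proposal is essentially about making code like the following sketch possible, where a caller branches on the exception category rather than on individual classes (class and package names as in the 0.8.2 clients jar; the handling logic is illustrative):
{code}
import org.apache.kafka.clients.producer.{Callback, RecordMetadata}
import org.apache.kafka.common.errors.RetriableException

val callback = new Callback {
  override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
    exception match {
      case null                  => // success
      case _: RetriableException => // transient: safe to retry the send
      case e                     => throw new RuntimeException("fatal send error", e)
    }
}
{code}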
Review Request 31735: Fix KAFKA-1863
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31735/ --- Review request for kafka.
Bugs: KAFKA-1863
https://issues.apache.org/jira/browse/KAFKA-1863
Repository: kafka
Description
---
Add docs for ApiExceptions in callbacks and send / constructors
Diffs
-
clients/src/main/java/org/apache/kafka/clients/producer/Callback.java b89aa582f64e8aa38e75cef1d153760400df335e
clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java ed9c63a6679e3aaf83d19fde19268553a4c107c2
clients/src/main/java/org/apache/kafka/common/protocol/types/Field.java 899195819159a3544e14ebfb09aff1b9152ff5dd
clients/src/main/java/org/apache/kafka/common/protocol/types/Schema.java 716470125866641210591df1d5c773183949819a
clients/src/main/java/org/apache/kafka/common/protocol/types/SchemaException.java ea4e46f4d768ef90a0d725ad117aab4645f6cb76
clients/src/main/java/org/apache/kafka/common/protocol/types/Struct.java ff89f0e37d5fa787b0218eff86d169aaeae2107b
clients/src/main/java/org/apache/kafka/common/protocol/types/Type.java f0d5a8286380d0138b7a815ea7492311603cc72f
Diff: https://reviews.apache.org/r/31735/diff/
Testing
---
Thanks,
Guozhang Wang
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Attachment: KAFKA-527.patch Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347441#comment-14347441 ] Yasuhiro Matsuda commented on KAFKA-527: Created reviewboard https://reviews.apache.org/r/31742/diff/ against branch origin/trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Assignee: Yasuhiro Matsuda Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1863) Exception categories / hierarchy in clients
[ https://issues.apache.org/jira/browse/KAFKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347342#comment-14347342 ] Guozhang Wang commented on KAFKA-1863: -- Added some more docs on the possible exception hierarchy; with this and Jay's modifications to the exceptions thrown in send(), I think this ticket can be covered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] KIP-2 - Refactor brokers to allow listening on multiple ports and IPs
+1 Thanks, Jun On Tue, Mar 3, 2015 at 12:14 PM, Gwen Shapira gshap...@cloudera.com wrote: Details are in the wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-2+-+Refactor+brokers+to+allow+listening+on+multiple+ports+and+IPs
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347478#comment-14347478 ] Yasuhiro Matsuda commented on KAFKA-527: This patch introduces BufferingOutputStream, an alternative to ByteArrayOutputStream. It is backed by a chain of byte arrays, so it does not copy bytes when increasing its capacity. Also, it has a method that writes the content to a ByteBuffer directly, so there is no need to create an intermediate array instance to transfer the content. Lastly, it supports deferred writes: you can reserve a number of bytes before knowing the value and fill it in later. In MessageWriter (a new class), this is used for writing the CRC value and the payload length. On my laptop, I tested the performance using TestLinearWriteSpeed with snappy:
Previously: 26.64786026813998 MB per sec
With the patch: 35.78401869390889 MB per sec
That is about 34% better throughput. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
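The core idea is easy to show in miniature. A simplified Scala sketch (hypothetical names, far less complete than the actual BufferingOutputStream/MessageWriter in the patch): capacity grows by chaining chunks instead of reallocating and copying.
{code}
import java.io.OutputStream
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

class ChunkedOutputStream(chunkSize: Int) extends OutputStream {
  private val chunks = ArrayBuffer(new Array[Byte](chunkSize))
  private var pos = 0 // write position within the last chunk

  override def write(b: Int): Unit = {
    if (pos == chunkSize) { chunks += new Array[Byte](chunkSize); pos = 0 }
    chunks.last(pos) = b.toByte
    pos += 1
  }

  // Transfer straight into a ByteBuffer -- no intermediate toByteArray copy.
  def writeTo(buffer: ByteBuffer): Unit = {
    chunks.init.foreach(c => buffer.put(c))
    buffer.put(chunks.last, 0, pos)
  }
}
{code}
The deferred write the comment describes would additionally record a (chunk, offset) position on reserve and fill it once the CRC or length is known; that bookkeeping is omitted here for brevity.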
[jira] [Updated] (KAFKA-1863) Exception categories / hierarchy in clients
[ https://issues.apache.org/jira/browse/KAFKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1863: - Attachment: KAFKA-1863.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1863) Exception categories / hierarchy in clients
[ https://issues.apache.org/jira/browse/KAFKA-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347338#comment-14347338 ] Guozhang Wang commented on KAFKA-1863: -- Created reviewboard https://reviews.apache.org/r/31735/diff/ against branch origin/trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 31742: Patch for KAFKA-527
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31742/ --- Review request for kafka.
Bugs: KAFKA-527
https://issues.apache.org/jira/browse/KAFKA-527
Repository: kafka
Description
---
Fewer byte copies.
Diffs
-
core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 9c694719dc9b515fb3c3ae96435a87b334044272
core/src/main/scala/kafka/message/MessageWriter.scala PRE-CREATION
core/src/test/scala/unit/kafka/message/MessageWriterTest.scala PRE-CREATION
Diff: https://reviews.apache.org/r/31742/diff/
Testing
---
Thanks,
Yasuhiro Matsuda
[jira] [Commented] (KAFKA-1998) Partitions Missing From MetadataResponse
[ https://issues.apache.org/jira/browse/KAFKA-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347645#comment-14347645 ] Mayuresh Gharat commented on KAFKA-1998: I suppose what we need to do is to expose the availablePartitionsForTopic() API in the producer. I discussed this with [~guozhang] and he said that it's going to happen as part of another patch that Jun is working on. [~guozhang], can you comment on this? We can then decide whether to close this ticket. Partitions Missing From MetadataResponse Key: KAFKA-1998 URL: https://issues.apache.org/jira/browse/KAFKA-1998 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2.0 Reporter: Evan Huus Assignee: Mayuresh Gharat It is known behaviour that when a partition is entirely offline (it has no leader because all of its replicas are down) then that partition will not be included in the metadata returned by other brokers. For example, if topic foo has 3 partitions, but all replicas of partition 3 are offline, then requesting metadata for foo will only return information about partitions 1 and 2. This means that there is no way to reliably determine the number of partitions for a topic via kafka's metadata API; if I receive information on partitions 1 and 2, I don't know if partition 3 is offline or if there are simply only two partitions total. (You can presumably still ask zookeeper directly, but that is a work-around.) This ambiguity, in turn, can lead to a consistency problem with the default partitioner, since that effectively implements `hash(key) mod #partitions`. If a partition goes offline and is removed from the metadata response, then the number of partitions the producer knows about will change (on its next metadata refresh) and the mapping from keys to partitions will also change. Instead of distributing messages among (for example) 3 partitions, and failing to produce to the offline partition, it will distribute *all* messages among the two online partitions. This results in messages being sent to the wrong partition. Since kafka already returns partitions with error messages in many cases (e.g. `LeaderNotAvailable`) I think it makes much more sense, and fixes the above partition problem, if it would simply return offline partitions as well with the appropriate error (whether that is `LeaderNotAvailable` or it would be better to add an additional error is up to you). CC [~guozhang] (This issue was originally described/discussed on the kafka-users mailing list, in the thread involving https://mail-archives.apache.org/mod_mbox/kafka-users/201503.mbox/%3CCAA4pprAZvp2XhdNmy0%2BqVZ1UVdVxmUfz3DDArhGbwP-iiH%2BGyg%40mail.gmail.com%3E) If there are any questions I am happy to clarify, I realize the scenario is somewhat complex. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
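The partitioner problem in the description is easy to see with concrete numbers (illustrative values):
{code}
// hash(key) mod #partitions: the same key maps to a different partition as
// soon as the visible partition count changes.
val hash = 1748      // stand-in for math.abs(key.hashCode)
println(hash % 3)    // 2 -> partition 2 while all 3 partitions are visible
println(hash % 2)    // 0 -> partition 0 once partition 3 drops out of metadata
{code}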
[jira] [Created] (KAFKA-2003) Add upgrade tests
Gwen Shapira created KAFKA-2003: --- Summary: Add upgrade tests Key: KAFKA-2003 URL: https://issues.apache.org/jira/browse/KAFKA-2003 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Ashish K Singh To test protocol changes, compatibility, and the upgrade process, we need a good way to test different versions of the product together and to test the end-to-end upgrade process. For example, for a 0.8.2 to 0.8.3 test we want to check:
* Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers?
* Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time?
* Can 0.8.2 clients run against a cluster of 0.8.3 brokers?
There are probably more questions, but an automated framework that can test these and report results will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [KIP-DISCUSSION] KIP-7 Security - IP Filtering
Thanks for confirming. My sample-set is obviously limited to the companies where I have worked - where operations folks do have root access on the services that they manage. That said, it seems fairly straightforward to add this, and it would help users who don't have the benefit of su privileges. Joel On Wed, Mar 04, 2015 at 12:01:59AM -0500, Jeff Holoman wrote: Hey Joel, good questions. As a first thought, my experience with customers in large corporate environments probably has me somewhat jaded :). You know it really shouldn't take 3 weeks to get ports opened on a load balancer, but that really does happen. Coordination across those teams also can and should / does happen, but I've noted that operators appreciate measures they can take that keep them out of more internal process.
1) Yes, probably. After all, we're really just checking what's returned from InetAddress and trusting that. The check is pretty lightweight. I think what you are getting at is that a security check that doesn't go all the way can be bad, as it can engender a false sense of security and end up leaving the system more vulnerable to attack than if other, more standard, approaches are taken. This is a fair point. I'm not deep enough into network security to comment all that intelligently, but I do think that reducing the exposure to, say, IP spoofing on internal traffic versus free-for-all data consumption is a step in the right direction.
2) Yes, they may have access to this, and it could be redundant. At customers that I interface with, operators typically get their root-level privileges through something like PowerBroker, so access to iptables is not a given, and even if it's available it typically does not fall within their realm of accepted responsibilities. Additionally, when I first got this request I suggested iptables and was told that due to the difficulties and complexities of configuration and management (from their perspective) it would not be an acceptable solution (also the "it's not in the corporate standard" line). I noted in the KIP that I look at this not only as a potential security measure, reducing the attack-vector size, but also as a guard against human error. Hardcoded configs sometimes make their way all the way to production, and this would help to limit that. You could argue that it might not be Kafka's responsibility to enforce this type of control, but there is precedent here with HDFS (dfs.hosts and dfs.hosts.exclude) and Flume (https://issues.apache.org/jira/browse/FLUME-2189). In short, I don't think this supplants more robust security functionality, but I do think it gives an additional (lightweight) control which would be useful. Security is about defense in depth, and this raises the bar a tad. Thanks Jeff On Tue, Mar 3, 2015 at 8:58 PM, Joel Koshy jjkosh...@gmail.com wrote: The proposal itself looks reasonable, but I have a couple of questions, as you made reference to "operators of the system" and "network team" in your wiki.
- Are spoofing attacks a concern even with this in place? If so, it would require some sort of internal ingress filtering, which presumably needs cooperation with network teams, right?
- Also, the operators of the (Kafka) system really should have access to iptables on the Kafka brokers, so wouldn't this feature be effectively redundant?
Thanks, Joel On Thu, Jan 22, 2015 at 01:50:41PM -0500, Joe Stein wrote: Hey Jeff, thanks for the patch and writing this up. I think the approach to explicitly deny and then set what is allowed, or explicitly allow and then deny specifics, makes sense. Supporting CIDR notation and both IPv4 and IPv6 is good too. Waiting for KAFKA-1845 to get committed before reworking this any more right now makes sense, yes. Andrii posted a patch yesterday for it, so hopefully in the next ~week(s). Not sure what other folks think of this approach, but whatever that is, it would be good to have it in prior to reworking for the config def changes. Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop On Wed, Jan 21, 2015 at 8:47 PM, Jeff Holoman jholo...@cloudera.com wrote: Posted a KIP for IP Filtering: https://cwiki.apache.org/confluence/display/KAFKA/KIP-7+-+Security+-+IP+Filtering Relevant JIRA: https://issues.apache.org/jira/browse/KAFKA-1810 Appreciate any feedback. Thanks Jeff -- Jeff Holoman Systems Engineer -- Joel
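For reference, the kind of check the KIP implies is small. A hedged Scala sketch (IPv4 only, not the KIP's actual code, and /0 prefixes are not handled):
{code}
import java.net.InetAddress

def ipv4ToInt(a: InetAddress): Int =
  a.getAddress.foldLeft(0)((acc, b) => (acc << 8) | (b & 0xff))

// True if ip falls inside the CIDR block, e.g. inCidr(addr, "10.0.0.0/8").
def inCidr(ip: InetAddress, cidr: String): Boolean = {
  val Array(base, bits) = cidr.split("/")
  val mask = -1 << (32 - bits.toInt)
  (ipv4ToInt(ip) & mask) == (ipv4ToInt(InetAddress.getByName(base)) & mask)
}
{code}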
[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1997: Attachment: KAFKA-1997_2015-03-04_15:42:45.patch Refactor Mirror Maker - Key: KAFKA-1997 URL: https://issues.apache.org/jira/browse/KAFKA-1997 Project: Kafka Issue Type: Improvement Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch Refactor mirror maker based on KIP-3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[KIP-DISCUSSION] KIP-13 Quotas
Posted a KIP for quotas in kafka. https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas Appreciate any feedback. Aditya
[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions
[ https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347795#comment-14347795 ] Geoffrey Anderson commented on KAFKA-1313: -- Quoting the Kafka docs: The first step is to hand craft the custom reassignment plan in a json file:
{code}
$ cat increase-replication-factor.json
{"version":1, "partitions":[{"topic":"foo","partition":0,"replicas":[5,6,7]}]}
{code}
Then, use the json file with the --execute option to start the reassignment process:
{code}
$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
{code}
As I understand it, the pain point is hand crafting the reassignment plan. From there on out, updating the replication factor is functionally identical to any other reassignment. So my proposal is to add another file type (--replication-factor-json-file) to kafka-reassign-partitions.sh --generate (and/or --rebalance per KIP-6) which allows users to specify the desired replication_factor per topic/partition. For example:
{code}
$ cat update-replication-factor.json
{"version":1, "partitions":[{"topic":"foo","partition":0,"replication_factor":3}]}
$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --replication-factor-json-file update-replication-factor.json --generate
{code}
The user would then feed the generated output into kafka-reassign-partitions.sh --execute as before. [~nehanarkhede], let me know if this seems reasonable / is in the ballpark of what you had in mind. Support adding replicas to existing topic partitions Key: KAFKA-1313 URL: https://issues.apache.org/jira/browse/KAFKA-1313 Project: Kafka Issue Type: New Feature Components: tools Affects Versions: 0.8.0 Reporter: Marc Labbe Assignee: Geoffrey Anderson Priority: Critical Labels: newbie++ Fix For: 0.9.0 There is currently no easy way to add replicas to existing topic partitions. For example, topic create-test has been created with ReplicationFactor=1:
{code}
Topic:create-test  PartitionCount:3  ReplicationFactor:1  Configs:
  Topic: create-test  Partition: 0  Leader: 1  Replicas: 1  Isr: 1
  Topic: create-test  Partition: 1  Leader: 2  Replicas: 2  Isr: 2
  Topic: create-test  Partition: 2  Leader: 3  Replicas: 3  Isr: 3
{code}
I would like to increase the ReplicationFactor to 2 (or more) so it shows up like this instead:
{code}
Topic:create-test  PartitionCount:3  ReplicationFactor:2  Configs:
  Topic: create-test  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
  Topic: create-test  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2,3
  Topic: create-test  Partition: 2  Leader: 3  Replicas: 3,1  Isr: 3,1
{code}
Use cases for this:
- adding brokers and thus increasing fault tolerance
- fixing human errors for topics created with wrong values
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347743#comment-14347743 ] Jiangjie Qin commented on KAFKA-1997: - Updated reviewboard https://reviews.apache.org/r/31706/diff/ against branch origin/trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker
[ https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiangjie Qin updated KAFKA-1997: Attachment: KAFKA-1997_2015-03-04_15:07:46.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2003) Add upgrade tests
[ https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347804#comment-14347804 ] Ashish K Singh commented on KAFKA-2003: --- [~gwenshap] does the following outline make sense?
$ some_tool from_version m_instances to_version n_instances
1. Compile jars into from_version_target
2. Compile jars into to_version_target
3. Bring up n to_version brokers
4. Run producer/consumer tests with clients from from_version_target
5. Bring up m from_version brokers
6. Run producer/consumer tests with clients from from_version_target
7. Run producer/consumer tests with clients from to_version_target
8.
{noformat}
foreach(broker: brokers)
  Bump up the interbroker.protocol.version
  Run producer/consumer tests with clients from from_version_target
  Run producer/consumer tests with clients from to_version_target
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2004) Write Kafka messages directly to HDFS
sutanu das created KAFKA-2004: - Summary: Write Kafka messages directly to HDFS Key: KAFKA-2004 URL: https://issues.apache.org/jira/browse/KAFKA-2004 Project: Kafka Issue Type: Bug Components: consumer, core, producer Affects Versions: 0.8.1.1 Reporter: sutanu das Assignee: Neha Narkhede Priority: Critical 1. Is there a way to write Kafka messages directly to HDFS without writing any consumer code? 2. Is there any way to integrate Kafka with Storm or Spark so messages go directly from Kafka consumers to an HDFS sink? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing
Apologize for the late comment on this... So fair assignment by count (taking into account the current partition count of each broker) is very good. However, it's worth noting that all partitions are not created equal. We have actually been performing more rebalance work based on the partition size on disk, as given equal retention of all topics, the size on disk is a better indicator of the amount of traffic a partition gets, both in terms of storage and network traffic. Overall, this seems to be a better balance. In addition to this, I think there is very much a need to have Kafka be rack-aware. That is, to be able to assure that for a given cluster, you never assign all replicas for a given partition in the same rack. This would allow us to guard against maintenances or power failures that affect a full rack of systems (or a given switch). I think it would make sense to implement the reassignment logic as a pluggable component. That way it would be easy to select a scheme when performing a reassignment (count, size, rack aware), as sketched after this thread. Configuring a default scheme for a cluster would allow for the brokers to create new topics and partitions in compliance with the requested policy. -Todd On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein joe.st...@stealth.ly wrote: I will go back through the ticket and code and write more up. Should be able to do that sometime next week. The intention was not to replace existing functionality but to issue a WARN on use. In the version following its release we could then deprecate it... I will fix the KIP for that too. On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede n...@confluent.io wrote: Hey Joe,
1. Could you add details to the Public Interface section of the KIP? This should include the proposed changes to the partition reassignment tool. Also, maybe the new option can be named --rebalance instead of --re-balance?
2. It makes sense to list --decommission-broker as part of this KIP. Similarly, shouldn't we also have an --add-broker option? The way I see this is that there are several events when a partition reassignment is required. Before this functionality is automated on the broker, the tool will generate an ideal replica placement for each such event. The users should merely have to specify the nature of the event, e.g. adding a broker or decommissioning an existing broker or merely rebalancing.
3. If I understand the KIP correctly, the upgrade plan for this feature includes removing the existing --generate option on the partition reassignment tool in 0.8.3 while adding all the new options in the same release. Is that correct?
Thanks, Neha On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps jay.kr...@gmail.com wrote: Ditto on this one. Can you give the algorithm we want to implement? Also, I think in terms of scope this is just proposing to change the logic in ReassignPartitionsCommand? I think we've had the discussion various times on the mailing list that what people really want is just for Kafka to do its best to balance data in an online fashion (for some definition of balance), i.e. if you add a new node, partitions would slowly migrate to it, and if a node dies, partitions slowly migrate off it. This could potentially be more work, but I'm not sure how much more. Has anyone thought about how to do it? -Jay On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein joe.st...@stealth.ly wrote: Posted a KIP for --re-balance for partition assignment in the reassignment tool: https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing JIRA: https://issues.apache.org/jira/browse/KAFKA-1792 While going through the KIP I thought of one thing from the JIRA that we should change. We should preserve --generate to be existing functionality for the next release it is in. If folks want to use --re-balance then great, it just won't break any upgrade paths, yet. Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC, http://www.stealth.ly, Twitter: @allthingshadoop -- Thanks, Neha
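Todd's pluggable suggestion could look something like the Scala sketch below (an entirely hypothetical interface, not from the KIP): each scheme - count, size, rack-aware - becomes one implementation behind a common trait.
{code}
case class TopicPartition(topic: String, partition: Int)

trait ReplicaPlacementPolicy {
  def place(partitions: Seq[TopicPartition], brokers: Seq[Int],
            replicationFactor: Int): Map[TopicPartition, Seq[Int]]
}

// Count-based default: spread replicas round-robin over the broker list.
// Assumes replicationFactor <= brokers.size so replicas stay distinct.
object CountBasedPlacement extends ReplicaPlacementPolicy {
  def place(partitions: Seq[TopicPartition], brokers: Seq[Int],
            replicationFactor: Int): Map[TopicPartition, Seq[Int]] =
    partitions.zipWithIndex.map { case (tp, i) =>
      tp -> (0 until replicationFactor).map(r => brokers((i + r) % brokers.size))
    }.toMap
}
{code}
A size-based or rack-aware scheme would implement the same trait, which is what makes selecting one per reassignment (or as a cluster default) straightforward.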
Error when submit a patch set which contains a file rename.
Hi, I was trying to submit a patch set for review. In this patch set, I removed one file and created another to replace it. However, when I ran kafka-patch-review.py I got an error:
python kafka-patch-review.py -b origin/trunk -j KAFKA-1926
Configuring reviewboard url to https://reviews.apache.org
Updating your remote branches to pull the latest changes
Verifying JIRA connection configurations
ERROR: Error validating diff
core/src/main/scala/kafka/utils/Utils.scala: The file was not found in the repository. (HTTP 400, API Error 207)
ERROR: reviewboard update failed. Exiting.
I wonder if it is because I do not have enough permission? The file to be renamed is clearly there. Any help will be appreciated. Thanks. Tong Li OpenStack Kafka Community Development Building 501/B205 liton...@us.ibm.com
[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils
[ https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348038#comment-14348038 ] Tong Li commented on KAFKA-1926: Jay, I finally put together a new patch set just for this issue. (The SystemTime issue we can address separately, since this patch set is already very big.) This patch set is an attempt to refactor the core utils class and remove the namespace overlap: a Utils class was defined by both the clients and core projects, and many methods are duplicated. The patch does the following:
1. Renames the core kafka.utils.Utils class to kafka.utils.CoreUtils.
2. Removes the abs method in kafka.utils.CoreUtils; anywhere using the abs method is changed to use o.a.k.c.u.Utils.abs.
3. Removes the Crc32 class in the kafka.utils package and uses the one defined in o.a.k.c.u instead.
4. Removes the asString method in CoreUtils since it was not used.
5. Removes readUnsignedInt (two methods) and writeUnsignedInt (two methods) since they are complete duplicates of the client methods; both signatures and implementations are identical.
I will submit the patch set for you to review. Since this patch set touches so many files, it is very time consuming; if we can get this in faster, I won't need to do so many rounds of rebase. Thanks so much. Replace kafka.utils.Utils with o.a.k.common.utils.Utils --- Key: KAFKA-1926 URL: https://issues.apache.org/jira/browse/KAFKA-1926 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2.0 Reporter: Jay Kreps Labels: newbie, patch Attachments: KAFKA-1926.patch, KAFKA-1926.patch There is currently a lot of duplication between the Utils class in common and the one in core. Our plan has been to deprecate duplicate code in the server and replace it with the new common code. As such we should evaluate each method in the scala Utils and do one of the following:
1. Migrate it to o.a.k.common.utils.Utils if it is a sensible general purpose utility in active use that is not Kafka-specific. If we migrate it we should really think about the API and make sure there is some test coverage. A few things in there are kind of funky and we shouldn't just blindly copy them over.
2. Create a new class ServerUtils or ScalaUtils in kafka.utils that will hold any utilities that really need to make use of Scala features to be convenient.
3. Delete it if it is not used, or has a bad api.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
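On point 2 above, the common-utils abs that replaces the core one has behavior worth noting: for partitioning, Kafka's abs masks the sign bit rather than negating, so Int.MinValue maps to a non-negative value instead of overflowing. A Scala sketch of the behavior (not the exact source):
{code}
def abs(n: Int): Int = n & 0x7fffffff

abs(-1)           // 2147483647
abs(Int.MinValue) // 0, whereas math.abs(Int.MinValue) overflows to -2147483648
{code}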
Re: Review Request 31706: Patch for KAFKA-1997
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ --- (Updated March 4, 2015, 11:42 p.m.) Review request for kafka.
Bugs: KAFKA-1997
https://issues.apache.org/jira/browse/KAFKA-1997
Repository: kafka
Description (updated)
---
Addressed Guozhang's comments. Changed the exit behavior on send failure because close(0) is not ready yet. Will submit followup patch after KAFKA-1660 is checked in. Expanded imports from _ and * to full class paths.
Diffs (updated)
-
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f
core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad
core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4
core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e
core/src/main/scala/kafka/tools/MirrorMaker.scala 5374280dc97dc8e01e9b3ba61fd036dc13ae48cb
core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58
core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala a17e8532c44aadf84b8da3a57bcc797a848b5020
Diff: https://reviews.apache.org/r/31706/diff/
Testing
---
Thanks,
Jiangjie Qin
[jira] [Commented] (KAFKA-1998) Partitions Missing From MetadataResponse
[ https://issues.apache.org/jira/browse/KAFKA-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347740#comment-14347740 ] Jay Kreps commented on KAFKA-1998: -- This seems like a bug, shouldn't we always return all the partitions and mark them as having no leader? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30126: Patch for KAFKA-1845
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30126/ --- (Updated March 4, 2015, 11:12 p.m.) Review request for kafka.
Bugs: KAFKA-1845
https://issues.apache.org/jira/browse/KAFKA-1845
Repository: kafka
Description (updated)
---
KAFKA-1845 - Fixed merge conflicts, ported added configs to KafkaConfig
KAFKA-1845 - KafkaConfig to ConfigDef: moved validateValues so it's called on instantiating KafkaConfig
KAFKA-1845 - KafkaConfig to ConfigDef: MaxConnectionsPerIpOverrides refactored
KAFKA-1845 - code review fixes, merge conflicts after rebase
KAFKA-1845 - rebase to trunk
Diffs (updated)
-
clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java 852a9b39a2f9cd71d176943be86531a43ede
core/src/main/scala/kafka/Kafka.scala 77a49e12af6f869e63230162e9f87a7b0b12b610
core/src/main/scala/kafka/controller/KafkaController.scala e9b4dc62df3f139de8d4dc688e48ccc0a5513123
core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 4a31c7271c2d0a4b9e8b28be729340ecfa0696e5
core/src/main/scala/kafka/server/KafkaConfig.scala 14bf3216bae030331bdf76b3266ed0e73526c3de
core/src/main/scala/kafka/server/KafkaServer.scala 8e3def9e9edaf49c0443a2e08cf37277d0f25306
core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 6879e730282185bda3d6bc3659cb15af0672cecf
core/src/test/scala/integration/kafka/api/IntegrationTestHarness.scala 5650b4a7b950b48af3e272947bfb5e271c4238c9
core/src/test/scala/integration/kafka/api/ProducerCompressionTest.scala e63558889272bc76551accdfd554bdafde2e0dd6
core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala d34ee3a40dcc8475e183435ad6842cd3d0a13ade
core/src/test/scala/integration/kafka/api/ProducerSendTest.scala 8154a4210dc8dcceb26549c98bbbc4a1282a1c67
core/src/test/scala/unit/kafka/admin/AddPartitionsTest.scala 1bf2667f47853585bc33ffb3e28256ec5f24ae84
core/src/test/scala/unit/kafka/admin/AdminTest.scala e28979827110dfbbb92fe5b152e7f1cc973de400
core/src/test/scala/unit/kafka/admin/DeleteConsumerGroupTest.scala d530338728be41282925b3a62030d1f316b4f9c5
core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala c8f336aa034ab5702c7644a669fd32c746512d29
core/src/test/scala/unit/kafka/consumer/ConsumerIteratorTest.scala c0355cc0135c6af2e346b4715659353a31723b86
core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala a17e8532c44aadf84b8da3a57bcc797a848b5020
core/src/test/scala/unit/kafka/integration/AutoOffsetResetTest.scala 95303e098d40cd790fb370e9b5a47d20860a6da3
core/src/test/scala/unit/kafka/integration/FetcherTest.scala 25845abbcad2e79f56f729e59239b738d3ddbc9d
core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala aeb7a19acaefabcc161c2ee6144a56d9a8999a81
core/src/test/scala/unit/kafka/integration/RollingBounceTest.scala eab4b5f619015af42e4554660eafb5208e72ea33
core/src/test/scala/unit/kafka/integration/TopicMetadataTest.scala 35dc071b1056e775326981573c9618d8046e601d
core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala ba3bcdcd1de9843e75e5395dff2fc31b39a5a9d5
core/src/test/scala/unit/kafka/javaapi/consumer/ZookeeperConsumerConnectorTest.scala d6248b09bb0f86ee7d3bd0ebce5b99135491453b
core/src/test/scala/unit/kafka/log/LogTest.scala 1a4be70a21fe799d4d71dd13e84968e40fb8ad92
core/src/test/scala/unit/kafka/log4j/KafkaLog4jAppenderTest.scala 4ea0489c9fd36983fe190491a086b39413f3a9cd
core/src/test/scala/unit/kafka/metrics/MetricsTest.scala 111e4a26c1efb6f7c151ca9217dbe107c27ab617
core/src/test/scala/unit/kafka/producer/AsyncProducerTest.scala 1db6ac329f7b54e600802c8a623f80d159d4e69b
core/src/test/scala/unit/kafka/producer/ProducerTest.scala ce65dab4910d9182e6774f6ef1a7f45561ec0c23
core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala d60d8e0f49443f4dc8bc2cad6e2f951eda28f5cb
core/src/test/scala/unit/kafka/server/AdvertiseBrokerTest.scala f0c4a56b61b4f081cf4bee799c6e9c523ff45e19
core/src/test/scala/unit/kafka/server/DynamicConfigChangeTest.scala ad121169a5e80ebe1d311b95b219841ed69388e2
core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 8913fc1d59f717c6b3ed12c8362080fb5698986b
core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala a703d2715048c5602635127451593903f8d20576
core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala PRE-CREATION
core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 82dce80d553957d8b5776a9e140c346d4e07f766
core/src/test/scala/unit/kafka/server/LeaderElectionTest.scala c2ba07c5fdbaf0e65ca033b2e4d88f45a8a15b2e
core/src/test/scala/unit/kafka/server/LogOffsetTest.scala c06ee756bf0fe07e5d3c92823a476c960b37afd6
core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala
[jira] [Updated] (KAFKA-1845) KafkaConfig should use ConfigDef
[ https://issues.apache.org/jira/browse/KAFKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrii Biletskyi updated KAFKA-1845: Attachment: KAFKA-1845_2015-03-05_01:12:22.patch KafkaConfig should use ConfigDef - Key: KAFKA-1845 URL: https://issues.apache.org/jira/browse/KAFKA-1845 Project: Kafka Issue Type: Sub-task Reporter: Gwen Shapira Assignee: Andrii Biletskyi Labels: newbie Fix For: 0.8.3 Attachments: KAFKA-1845.patch, KAFKA-1845_2015-02-08_17:05:22.patch, KAFKA-1845_2015-03-05_01:12:22.patch ConfigDef is already used for the new producer and for TopicConfig. Will be nice to standardize and use one configuration and validation library across the board. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
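For readers unfamiliar with ConfigDef, the style this ticket moves KafkaConfig toward looks roughly like the Scala sketch below (property names and defaults here are illustrative):
{code}
import org.apache.kafka.common.config.ConfigDef
import org.apache.kafka.common.config.ConfigDef.{Importance, Type}

// Each property is declared once with its type, default, and doc string;
// parsing and validation then come for free from ConfigDef.
val defs = new ConfigDef()
  .define("port", Type.INT, 9092, Importance.HIGH, "the port the broker listens on")
  .define("host.name", Type.STRING, "", Importance.HIGH, "the broker host name")
{code}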
[jira] [Commented] (KAFKA-1845) KafkaConfig should use ConfigDef
[ https://issues.apache.org/jira/browse/KAFKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347751#comment-14347751 ] Andrii Biletskyi commented on KAFKA-1845: - Updated reviewboard https://reviews.apache.org/r/30126/diff/ against branch origin/trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing
Todd, I think plugable design is good with solid default. The only issue I feel is when you use one and switch to another, will we end up with some unread messages hanging around and no one thinks or knows it is their responsibility to take care of them? Thanks. Tong Sent from my iPhone On Mar 5, 2015, at 10:46 AM, Todd Palino tpal...@gmail.com wrote: Apologize for the late comment on this... So fair assignment by count (taking into account the current partition count of each broker) is very good. However, it's worth noting that all partitions are not created equal. We have actually been performing more rebalance work based on the partition size on disk, as given equal retention of all topics, the size on disk is a better indicator of the amount of traffic a partition gets, both in terms of storage and network traffic. Overall, this seems to be a better balance. In addition to this, I think there is very much a need to have Kafka be rack-aware. That is, to be able to assure that for a given cluster, you never assign all replicas for a given partition in the same rack. This would allow us to guard against maintenances or power failures that affect a full rack of systems (or a given switch). I think it would make sense to implement the reassignment logic as a pluggable component. That way it would be easy to select a scheme when performing a reassignment (count, size, rack aware). Configuring a default scheme for a cluster would allow for the brokers to create new topics and partitions in compliance with the requested policy. -Todd On Thu, Jan 22, 2015 at 10:13 PM, Joe Stein joe.st...@stealth.ly wrote: I will go back through the ticket and code and write more up. Should be able to-do that sometime next week. The intention was to not replace existing functionality by issue a WARN on use. The following version it is released we could then deprecate it... I will fix the KIP for that too. On Fri, Jan 23, 2015 at 12:34 AM, Neha Narkhede n...@confluent.io wrote: Hey Joe, 1. Could you add details to the Public Interface section of the KIP? This should include the proposed changes to the partition reassignment tool. Also, maybe the new option can be named --rebalance instead of --re-balance? 2. It makes sense to list --decommission-broker as part of this KIP. Similarly, shouldn't we also have an --add-broker option? The way I see this is that there are several events when a partition reassignment is required. Before this functionality is automated on the broker, the tool will generate an ideal replica placement for each such event. The users should merely have to specify the nature of the event e.g. adding a broker or decommissioning an existing broker or merely rebalancing. 3. If I understand the KIP correctly, the upgrade plan for this feature includes removing the existing --generate option on the partition reassignment tool in 0.8.3 while adding all the new options in the same release. Is that correct? Thanks, Neha On Thu, Jan 22, 2015 at 9:23 PM, Jay Kreps jay.kr...@gmail.com wrote: Ditto on this one. Can you give the algorithm we want to implement? Also I think in terms of scope this is just proposing to change the logic in ReassignPartitionsCommand? I think we've had the discussion various times on the mailing list that what people really want is just for Kafka to do it's best to balance data in an online fashion (for some definition of balance). i.e. if you add a new node partitions would slowly migrate to it, and if a node dies, partitions slowly migrate off it. 
This could potentially be more work, but I'm not sure how much more. Has anyone thought about how to do it? -Jay On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein joe.st...@stealth.ly wrote: Posted a KIP for --re-balance for partition assignment in the reassignment tool. https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing JIRA https://issues.apache.org/jira/browse/KAFKA-1792 While going through the KIP I thought of one thing from the JIRA that we should change. We should preserve --generate as existing functionality for the next release it is in. If folks want to use --re-balance then great; it just won't break any upgrade paths, yet. Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop -- Thanks, Neha
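To make the rack-aware scheme discussed in this thread concrete, here is a minimal sketch of replica placement that prefers distinct racks per partition. It is purely illustrative and is not the KIP-6 proposal or Kafka's actual assignment code; the broker list, rack map, and every name in it are invented for the example.
{code}
// Hypothetical sketch only, not the KIP-6 algorithm.
object RackAwareAssignmentSketch {
  // Assign replicationFactor replicas per partition, preferring one rack per replica.
  def assignReplicas(brokers: Seq[Int],
                     rackOf: Map[Int, String],
                     numPartitions: Int,
                     replicationFactor: Int): Map[Int, Seq[Int]] = {
    require(replicationFactor <= brokers.size)
    (0 until numPartitions).map { p =>
      // Rotate the broker list per partition so leadership is spread around.
      val shift = p % brokers.size
      val ordered = brokers.drop(shift) ++ brokers.take(shift)
      val replicas = scala.collection.mutable.ArrayBuffer[Int]()
      val usedRacks = scala.collection.mutable.Set[String]()
      // First pass: at most one replica per rack.
      for (b <- ordered if replicas.size < replicationFactor) {
        val rack = rackOf.getOrElse(b, "unknown")
        if (!usedRacks(rack)) { replicas += b; usedRacks += rack }
      }
      // Second pass: fewer racks than replicas, so fill with remaining brokers.
      for (b <- ordered if replicas.size < replicationFactor && !replicas.contains(b))
        replicas += b
      p -> replicas.toSeq
    }.toMap
  }
}
{code}
For example, with six brokers spread over racks A, B, and C and replicationFactor = 3, each partition ends up with one replica in every rack, and the per-partition rotation spreads leaders across brokers.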
Re: Review Request 31706: Patch for KAFKA-1997
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31706/ --- (Updated March 4, 2015, 11:07 p.m.) Review request for kafka. Bugs: KAFKA-1997 https://issues.apache.org/jira/browse/KAFKA-1997 Repository: kafka Description (updated) --- Addressed Guozhang's comments. Changed the exit behavior on send failure because close(0) is not ready yet. Will submit followup patch after KAFKA-1660 is checked in. Diffs (updated) - clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f core/src/main/scala/kafka/consumer/PartitionAssignor.scala e6ff7683a0df4a7d221e949767e57c34703d5aad core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 5487259751ebe19f137948249aa1fd2637d2deb4 core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 7f45a90ba6676290172b7da54c15ee5dc1a42a2e core/src/main/scala/kafka/tools/MirrorMaker.scala 5374280dc97dc8e01e9b3ba61fd036dc13ae48cb core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 543070f4fd3e96f3183cae9ee2ccbe843409ee58 core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala a17e8532c44aadf84b8da3a57bcc797a848b5020 Diff: https://reviews.apache.org/r/31706/diff/ Testing --- Thanks, Jiangjie Qin
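The "exit behavior on send failure" mentioned in the description could look roughly like the following producer callback. This is a sketch under the assumption that the new (org.apache.kafka.clients) producer is used; it is not the actual KAFKA-1997 patch.
{code}
import org.apache.kafka.clients.producer.{Callback, RecordMetadata}

// Sketch only, not the actual MirrorMaker change: fail fast when a send
// fails, since a forceful producer.close(0) was not yet available (KAFKA-1660).
val exitOnSendFailure: Callback = new Callback {
  override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
    if (exception != null) {
      // Exiting beats silently dropping the message being mirrored.
      System.exit(-1)
    }
}
{code}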
[jira] [Commented] (KAFKA-2004) Write Kafka messages directly to HDFS
[ https://issues.apache.org/jira/browse/KAFKA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347828#comment-14347828 ] Jiangjie Qin commented on KAFKA-2004: - Is this just a question, or is it a ticket to be followed up... Write Kafka messages directly to HDFS - Key: KAFKA-2004 URL: https://issues.apache.org/jira/browse/KAFKA-2004 Project: Kafka Issue Type: Bug Components: consumer, core, producer Affects Versions: 0.8.1.1 Reporter: sutanu das Assignee: Neha Narkhede Priority: Critical 1. Is there a way to write Kafka messages directly to HDFS without writing any consumer code? 2. Is there any way to integrate Kafka with Storm or Spark so messages go directly from Kafka consumers to an HDFS sink? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
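On question 1 above: dedicated tools (e.g. LinkedIn's Camus, or Flume's Kafka source) already move Kafka data to HDFS without custom consumer code. For a sense of what a hand-rolled bridge involves, here is a rough sketch using the 0.8.x high-level consumer and the Hadoop FileSystem API; the topic, ZooKeeper address, and HDFS path are placeholders, and offset management, file rolling, and error handling are deliberately omitted.
{code}
import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: assumes kafka core and hadoop-client on the classpath.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("zookeeper.connect", "localhost:2181") // placeholder
    props.put("group.id", "hdfs-bridge")             // placeholder
    val connector = Consumer.create(new ConsumerConfig(props))

    // One stream for the topic; payloads arrive as raw bytes.
    val stream = connector.createMessageStreams(Map("my-topic" -> 1))("my-topic").head

    val fs = FileSystem.get(new Configuration())
    val out = fs.create(new Path("/data/kafka/my-topic.log")) // placeholder path
    try {
      for (msg <- stream) {        // blocks until shutdown
        out.write(msg.message())   // write the raw payload
        out.write('\n')
      }
    } finally {
      out.close()
      connector.shutdown()
    }
  }
}
{code}
In practice you would roll files, survive consumer rebalances, and commit offsets only after a successful HDFS flush, which is exactly the bookkeeping the dedicated tools handle for you.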
Re: [VOTE] KIP-2 - Refactor brokers to allow listening on multiple ports and IPs
+1 binding On Tue, Mar 03, 2015 at 12:14:33PM -0800, Gwen Shapira wrote: Details are in the wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-2+-+Refactor+brokers+to+allow+listening+on+multiple+ports+and+IPs
[jira] [Comment Edited] (KAFKA-2003) Add upgrade tests
[ https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347804#comment-14347804 ] Ashish K Singh edited comment on KAFKA-2003 at 3/4/15 11:57 PM: [~gwenshap] does the following outline make sense? $ some_tool from_version m_instances to_version n_instances 1. Compile jars into from_version_target 2. Compile jars into to_version_target 3. Bring up n to_version brokers 4. Run producer/consumer tests with clients from from_version_target 5. Bring up m from_version brokers 6. Run producer/consumer tests with clients from from_version_target 7. Run producer/consumer tests with clients from to_version_target 8. {noformat} foreach(broker: brokers) Bump up the interbroker.protocol.version Run producer/consumer tests with clients from from_version_target Run producer/consumer tests with clients from to_version_target {noformat} 9. Cleanup Add upgrade tests - Key: KAFKA-2003 URL: https://issues.apache.org/jira/browse/KAFKA-2003 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Ashish K Singh To test protocol changes, compatibility, and the upgrade process, we need a good way to test different versions of the product together and to test the end-to-end upgrade process. For example, for the 0.8.2 to 0.8.3 test we want to check: * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers? * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time? * Can 0.8.2 clients run against a cluster of 0.8.3 brokers? There are probably more questions, but an automated framework that can test those and report results will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
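A rough sketch of how step 8's rolling bump could be driven follows. Everything here (broker IDs, the helper scripts, the test hook) is hypothetical and does not exist in Kafka itself; it is a placeholder for whatever the eventual test framework provides.
{code}
import scala.sys.process._

// Hypothetical driver for the rolling-bump step above; not Kafka tooling.
object RollingBumpSketch {
  val brokerIds = Seq(0, 1, 2)

  def runClientTests(targetDir: String): Int =
    Seq(s"$targetDir/bin/run-tests.sh").!       // placeholder test hook

  def main(args: Array[String]): Unit =
    for (id <- brokerIds) {
      // Raise the inter-broker protocol version in this broker's config,
      // then bounce only this broker (both scripts are hypothetical).
      Seq("./set-protocol-version.sh", id.toString, "0.8.3").!
      Seq("./restart-broker.sh", id.toString).!
      require(runClientTests("from_version_target") == 0, s"old clients failed after bumping broker $id")
      require(runClientTests("to_version_target") == 0, s"new clients failed after bumping broker $id")
    }
}
{code}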
Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2
+1. Verified quick start, unit tests. On Tue, Mar 3, 2015 at 12:09 PM, Joe Stein joe.st...@stealth.ly wrote: Ok, let's fix the transient test failure on trunk; agreed, it's not a blocker. +1 quick start passed, verified artifacts, updates in scala https://github.com/stealthly/scala-kafka/tree/0.8.2.1 and go https://github.com/stealthly/go_kafka_client/tree/0.8.2.1 look good ~ Joe Stein, http://www.stealth.ly On Tue, Mar 3, 2015 at 12:30 PM, Jun Rao j...@confluent.io wrote: Hi, Joe, Yes, that unit test does have transient failures from time to time. The issue seems to be with the unit test itself and not the actual code. So, this is not a blocker for the 0.8.2.1 release. I think we can just fix it in trunk. Thanks, Jun On Tue, Mar 3, 2015 at 9:08 AM, Joe Stein joe.st...@stealth.ly wrote: Jun, most everything looks good, except I keep getting test failures from: wget https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/kafka-0.8.2.1-src.tgz tar -xvf kafka-0.8.2.1-src.tgz cd kafka-0.8.2.1-src gradle ./gradlew test kafka.api.ProducerFailureHandlingTest testNotEnoughReplicasAfterBrokerShutdown FAILED org.scalatest.junit.JUnitTestFailedError: Expected NotEnoughReplicasException when producing to topic with fewer brokers than min.insync.replicas at org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101) at org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149) at org.scalatest.Assertions$class.fail(Assertions.scala:711) at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149) at kafka.api.ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown(ProducerFailureHandlingTest.scala:355) This happens to me all the time on a few different machines. ~ Joe Stein, http://www.stealth.ly On Mon, Mar 2, 2015 at 7:36 PM, Jun Rao j...@confluent.io wrote: +1 from me. Verified quickstart and unit tests. Thanks, Jun On Thu, Feb 26, 2015 at 2:59 PM, Jun Rao j...@confluent.io wrote: This is the second candidate for release of Apache Kafka 0.8.2.1. This fixes 4 critical issues in 0.8.2.0. Release Notes for the 0.8.2.1 release: https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/RELEASE_NOTES.html *** Please download, test and vote by Monday, Mar 2, 3pm PT Kafka's KEYS file containing PGP keys we use to sign the release: http://kafka.apache.org/KEYS in addition to the md5, sha1 and sha2 (SHA256) checksums. * Release artifacts to be voted upon (source and binary): https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/ * Maven artifacts to be voted upon prior to release: https://repository.apache.org/content/groups/staging/ * scala-doc https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/scaladoc/ * java-doc https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/javadoc/ * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1 tag https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=bd1bfb63ec73c10d08432ac893a23f28281ea021 (git commit ee1267b127f3081db491fa1bf9a287084c324e36) Thanks, Jun
-- Thanks, Neha
Re: [VOTE] KIP-2 - Refactor brokers to allow listening on multiple ports and IPs
Thanks guys! We have 4 binding +1s, 2 non-binding +1s and no -1s. I'll update the wiki to note that this proposal is accepted. If you want to comment on the implementation, feel free to take a look at KAFKA-1809 :) Gwen On Wed, Mar 4, 2015 at 6:31 PM, Joel Koshy jjkosh...@gmail.com wrote: +1 binding On Tue, Mar 03, 2015 at 12:14:33PM -0800, Gwen Shapira wrote: Details are in the wiki: https://cwiki.apache.org/confluence/display/KAFKA/KIP-2+-+Refactor+brokers+to+allow+listening+on+multiple+ports+and+IPs
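For readers who have not followed the KIP: the accepted design lets one broker expose several endpoints, each tagged with a security protocol. A representative configuration fragment (hosts and ports below are examples; the key names are those proposed in the KIP wiki) would be:
{code}
# One PLAINTEXT and one SSL endpoint on the same broker (example values).
listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093
# Endpoints advertised to clients may differ from the bind addresses.
advertised.listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093
{code}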
[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346728#comment-14346728 ] Aldrin Seychell commented on KAFKA-1451: I just encountered this issue on version 0.8.2.0 after a period of slow PC performance, perhaps because zookeeper and kafka were slow to communicate with each other (possibly the same issue highlighted by [~sinewy]). This resulted in an infinite loop attempting to create the ephemeral node. The logs that were continuously being written are as follows: [2015-03-03 13:48:19,831] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1425386833617,host:MTDKP119.ix.com,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1425380575230,host:MTDKP119.ix.com,version:1,port:9092} (kafka.utils.ZkUtils$) [2015-03-03 13:48:19,832] INFO I wrote this conflicted ephemeral node [{jmx_port:-1,timestamp:1425386833617,host:MTDKP119.ix.com,version:1,port:9092}] at /brokers/ids/0 a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) [2015-03-03 13:48:25,844] INFO conflict in /brokers/ids/0 data: {jmx_port:-1,timestamp:1425386833617,host:MTDKP119.ix.com,version:1,port:9092} stored data: {jmx_port:-1,timestamp:1425380575230,host:MTDKP119.ix.com,version:1,port:9092} (kafka.utils.ZkUtils$) Broker stuck due to leader election race - Key: KAFKA-1451 URL: https://issues.apache.org/jira/browse/KAFKA-1451 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Maciek Makowski Assignee: Manikumar Reddy Priority: Minor Labels: newbie Fix For: 0.8.2.0 Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch, KAFKA-1451_2014-07-29_10:13:23.patch h3. Symptoms The broker does not become available due to being stuck in an infinite loop while electing a leader. This can be recognised by the following line being repeatedly written to server.log: {code} [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node [{version:1,brokerid:1,timestamp:1400060079108}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) {code} h3. Steps to Reproduce In a single kafka 0.8.1.1 node, single zookeeper 3.4.6 node setup (it will likely behave the same with the ZK version included in the Kafka distribution): # start both zookeeper and kafka (in any order) # stop zookeeper # stop kafka # start kafka # start zookeeper h3. Likely Cause {{ZookeeperLeaderElector}} subscribes to data changes on startup, and then triggers an election.
If the deletion of the ephemeral {{/controller}} node associated with the broker's previous zookeeper session happens after the subscription to changes in the new session, the election will be invoked twice, once from {{startup}} and once from {{handleDataDeleted}}: * {{startup}}: acquire {{controllerLock}} * {{startup}}: subscribe to data changes * zookeeper: delete {{/controller}} since the session that created it timed out * {{handleDataDeleted}}: {{/controller}} was deleted * {{handleDataDeleted}}: wait on {{controllerLock}} * {{startup}}: elect, which writes {{/controller}} * {{startup}}: release {{controllerLock}} * {{handleDataDeleted}}: acquire {{controllerLock}} * {{handleDataDeleted}}: elect, which attempts to write {{/controller}} and then gets into an infinite loop as a result of the conflict. {{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing znode was written from a different session, which is not true in this case; it was written from the same session. That adds to the confusion. h3. Suggested Fix In {{ZookeeperLeaderElector.startup}} first run {{elect}} and then subscribe to data changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
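The suggested fix is just a reordering inside {{startup}}; in sketch form (names follow the description above and may not match the actual ZookeeperLeaderElector source):
{code}
// Sketch only: elect before subscribing, so a late deletion of the stale
// /controller znode cannot trigger a second, conflicting election from
// the listener while startup still holds the lock.
def startup() {
  inLock(controllerLock) {
    elect                                                // may write /controller
    zkClient.subscribeDataChanges(electionPath, leaderChangeListener)
  }
}
{code}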
[jira] [Created] (KAFKA-2002) It does not work when kafka_mx4jenable is false
Yaguo Zhou created KAFKA-2002: - Summary: It does not work when kafka_mx4jenable is false Key: KAFKA-2002 URL: https://issues.apache.org/jira/browse/KAFKA-2002 Project: Kafka Issue Type: Bug Components: core Reporter: Yaguo Zhou Priority: Minor Should return false immediately when kafka_mx4jenable is false -- This message was sent by Atlassian JIRA (v6.3.4#6332)
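The requested behavior is a simple early return. As a sketch (the real kafka.utils.Mx4jLoader differs in detail; the property handling below is invented for illustration):
{code}
// Sketch only: bail out before loading MX4J when the kafka_mx4jenable
// system property is absent or set to false. The property name comes from
// the report; the surrounding code is illustrative.
def maybeLoad(): Boolean = {
  val props = System.getProperties
  if (!java.lang.Boolean.parseBoolean(props.getProperty("kafka_mx4jenable", "false")))
    return false
  // ... reflectively load the MX4J HTTP adaptor and register MBeans ...
  true
}
{code}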