Re: Review Request 26373: Patch for KAFKA-1647

2014-10-22 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26373/
---

(Updated Oct. 22, 2014, 6:08 a.m.)


Review request for kafka.


Bugs: KAFKA-1647
https://issues.apache.org/jira/browse/KAFKA-1647


Repository: kafka


Description (updated)
---

Addressed Joel's comments.


The version 2 code seems to have been submitted by mistake. This should be the 
code for review that addresses Joel's comments.


Addressed Jun's comments. Will run tests to verify that it works.


Diffs (updated)
-

  core/src/main/scala/kafka/server/ReplicaManager.scala 
78b7514cc109547c562e635824684fad581af653 

Diff: https://reviews.apache.org/r/26373/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-22 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179622#comment-14179622
 ] 

Jiangjie Qin commented on KAFKA-1647:
-

Updated reviewboard https://reviews.apache.org/r/26373/diff/
 against branch origin/trunk

 Replication offset checkpoints (high water marks) can be lost on hard kills 
 and restarts
 

 Key: KAFKA-1647
 URL: https://issues.apache.org/jira/browse/KAFKA-1647
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Jiangjie Qin
Priority: Critical
  Labels: newbie++
 Fix For: 0.8.2

 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, 
 KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch


 We ran into this scenario recently in a production environment. This can 
 happen when enough brokers in a cluster are taken down. i.e., a rolling 
 bounce done properly should not cause this issue. It can occur if all 
 replicas for any partition are taken down.
 Here is a sample scenario:
 * Cluster of three brokers: b0, b1, b2
 * Two partitions (of some topic) with replication factor two: p0, p1
 * Initial state:
 p0: leader = b0, ISR = {b0, b1}
 p1: leader = b1, ISR = {b0, b1}
 * Do a parallel hard-kill of all brokers
 * Bring up b2, so it is the new controller
 * b2 initializes its controller context and populates its leader/ISR cache 
 (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
 known leaders are b0 (for p0) and b1 (for p1)
 * Bring up b1
 * The controller's onBrokerStartup procedure initiates a replica state change 
 for all replicas on b1 to become online. As part of this replica state change 
 it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
 (for p0 and p1). This LeaderAndIsr request contains: {{p0: leader=b0; p1: 
 leader=b1;} leaders=b1}. b0 is indicated as the leader of p0 but it is not 
 included in the leaders field because b0 is down.
 * On receiving the LeaderAndIsrRequest, b1's replica manager will 
 successfully make itself (b1) the leader for p1 (and create the local replica 
 object corresponding to p1). It will however abort the become follower 
 transition for p0 because the designated leader b0 is offline. So it will not 
 create the local replica object for p0.
 * It will then start the high water mark checkpoint thread. Since only p1 has 
 a local replica object, only p1's high water mark will be checkpointed to 
 disk. p0's previously written checkpoint, if any, will be lost.
 So in summary it seems we should always create the local replica object even 
 if the online transition does not happen.
 Possible symptoms of the above bug could be one or more of the following (we 
 saw 2 and 3):
 # Data loss; yes on a hard-kill data loss is expected, but this can actually 
 cause loss of nearly all data if the broker becomes follower, truncates, and 
 soon after happens to become leader.
 # High IO on brokers that lose their high water mark then subsequently (on a 
 successful become follower transition) truncate their log to zero and start 
 catching up from the beginning.
 # If the offsets topic is affected, then offsets can get reset. This is 
 because during an offset load we don't read past the high water mark. So if a 
 water mark is missing then we don't load anything (even if the offsets are 
 there in the log).
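The loss mechanism in the last two bullets can be sketched as a toy model (the names and shapes below are hypothetical, not the actual ReplicaManager code): because the checkpoint file is rewritten wholesale from the partitions that currently have local replica objects, a partition that never got one silently loses its old entry.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the checkpoint-loss failure mode (hypothetical names,
// not the actual ReplicaManager code): the checkpoint thread rewrites the
// whole checkpoint file from the set of partitions that currently have a
// local replica object, so a partition without one loses its previous entry.
public class CheckpointLossDemo {

    // Rewrite the checkpoint "file" from scratch using only local replicas.
    static Map<String, Long> writeCheckpoint(Map<String, Long> localHighWatermarks) {
        return new HashMap<>(localHighWatermarks);
    }

    public static void main(String[] args) {
        // On-disk state before the hard kill: both partitions checkpointed.
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("p0", 100L);
        onDisk.put("p1", 200L);

        // After restart, b1 only created a local replica for p1; the
        // become-follower transition for p0 aborted because leader b0 is down.
        Map<String, Long> localReplicas = new HashMap<>();
        localReplicas.put("p1", 200L);

        onDisk = writeCheckpoint(localReplicas);
        // p0's old high water mark is gone from the rewritten file.
        System.out.println("p0 after rewrite: " + onDisk.get("p0"));
    }
}
```

Always creating the local replica object, as the summary above suggests, would keep p0's entry in the rewritten map.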



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-22 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1647:

Attachment: KAFKA-1647_2014-10-21_23:08:43.patch

 Replication offset checkpoints (high water marks) can be lost on hard kills 
 and restarts
 

 Key: KAFKA-1647
 URL: https://issues.apache.org/jira/browse/KAFKA-1647
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Jiangjie Qin
Priority: Critical
  Labels: newbie++
 Fix For: 0.8.2

 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, 
 KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch


 We ran into this scenario recently in a production environment. This can 
 happen when enough brokers in a cluster are taken down. i.e., a rolling 
 bounce done properly should not cause this issue. It can occur if all 
 replicas for any partition are taken down.
 Here is a sample scenario:
 * Cluster of three brokers: b0, b1, b2
 * Two partitions (of some topic) with replication factor two: p0, p1
 * Initial state:
 p0: leader = b0, ISR = {b0, b1}
 p1: leader = b1, ISR = {b0, b1}
 * Do a parallel hard-kill of all brokers
 * Bring up b2, so it is the new controller
 * b2 initializes its controller context and populates its leader/ISR cache 
 (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
 known leaders are b0 (for p0) and b1 (for p1)
 * Bring up b1
 * The controller's onBrokerStartup procedure initiates a replica state change 
 for all replicas on b1 to become online. As part of this replica state change 
 it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
 (for p0 and p1). This LeaderAndIsr request contains: {{p0: leader=b0; p1: 
 leader=b1;} leaders=b1}. b0 is indicated as the leader of p0 but it is not 
 included in the leaders field because b0 is down.
 * On receiving the LeaderAndIsrRequest, b1's replica manager will 
 successfully make itself (b1) the leader for p1 (and create the local replica 
 object corresponding to p1). It will however abort the become follower 
 transition for p0 because the designated leader b0 is offline. So it will not 
 create the local replica object for p0.
 * It will then start the high water mark checkpoint thread. Since only p1 has 
 a local replica object, only p1's high water mark will be checkpointed to 
 disk. p0's previously written checkpoint, if any, will be lost.
 So in summary it seems we should always create the local replica object even 
 if the online transition does not happen.
 Possible symptoms of the above bug could be one or more of the following (we 
 saw 2 and 3):
 # Data loss; yes on a hard-kill data loss is expected, but this can actually 
 cause loss of nearly all data if the broker becomes follower, truncates, and 
 soon after happens to become leader.
 # High IO on brokers that lose their high water mark then subsequently (on a 
 successful become follower transition) truncate their log to zero and start 
 catching up from the beginning.
 # If the offsets topic is affected, then offsets can get reset. This is 
 because during an offset load we don't read past the high water mark. So if a 
 water mark is missing then we don't load anything (even if the offsets are 
 there in the log).





[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2014-10-22 Thread xueqiang wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179731#comment-14179731
 ] 

xueqiang wang commented on KAFKA-1646:
--

Hey Jay, sorry for the late reply. I have done a test and found there is no 
pause when rolling a new log segment. The time for creating a 1G file and a 
1K file is almost identical: about 1 ms each.
Here is the test code, which creates 10 files of 1G:
public static void main(String[] args) throws Exception {
    String filePre = "d:\\temp\\file";
    long startTime, elapsedTime;
    startTime = System.currentTimeMillis();
    try {
        long initFileSize = 102400l;
        for (int i = 0; i < 10; i++) {
            RandomAccessFile randomAccessFile =
                    new RandomAccessFile(filePre + i, "rw");
            randomAccessFile.setLength(initFileSize);
            randomAccessFile.getChannel();
        }
        elapsedTime = System.currentTimeMillis() - startTime;
        System.out.format("elapsedTime: %2d ms", elapsedTime);
    } catch (Exception exception) { }
}

The result is: elapsedTime: 14 ms
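One caveat on this measurement: on NTFS (and most filesystems), RandomAccessFile.setLength typically only updates file metadata, so no data blocks are physically written, which would explain why a "1G" and a "1K" file cost the same ~1 ms. A minimal sketch contrasting metadata-only extension with an explicit zero-fill (the file names and the 64 MB size are arbitrary choices for the demo, not anything from the patch):

```java
import java.io.File;
import java.io.RandomAccessFile;

// Contrast metadata-only extension (setLength) with physically writing
// zeros. On most filesystems only the latter actually touches the disk,
// so only the latter's timing scales with file size.
public class PreallocateDemo {

    // Extend the file via metadata only; returns elapsed nanoseconds.
    static long timeSetLength(File f, long bytes) throws Exception {
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(bytes);
        }
        return System.nanoTime() - start;
    }

    // Physically write zeros in 64 KB chunks; returns elapsed nanoseconds.
    static long timeZeroFill(File f, long bytes) throws Exception {
        byte[] chunk = new byte[64 * 1024];
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            for (long written = 0; written < bytes; written += chunk.length) {
                raf.write(chunk);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        File a = File.createTempFile("prealloc", ".log");
        File b = File.createTempFile("zerofill", ".log");
        a.deleteOnExit();
        b.deleteOnExit();
        long size = 64L * 1024 * 1024; // 64 MB keeps the demo quick
        System.out.printf("setLength: %d ms%n", timeSetLength(a, size) / 1_000_000);
        System.out.printf("zero-fill: %d ms%n", timeZeroFill(b, size) / 1_000_000);
    }
}
```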


 Improve consumer read performance for Windows
 -

 Key: KAFKA-1646
 URL: https://issues.apache.org/jira/browse/KAFKA-1646
 Project: Kafka
  Issue Type: Improvement
  Components: log
Affects Versions: 0.8.1.1
 Environment: Windows
Reporter: xueqiang wang
  Labels: newbie, patch
 Attachments: Improve consumer read performance for Windows.patch, 
 KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch


 This patch is for the Windows platform only. On Windows, if there is more 
 than one replica writing to disk, the segment log files will not be laid 
 out contiguously on disk, and consumer read performance drops greatly. 
 This fix allocates more disk space when rolling a new segment, which 
 improves consumer read performance on the NTFS file system.
 This patch doesn't affect file allocation on other filesystems, as it only 
 adds statements like 'if(Os.iswindow)' or adds methods used only on Windows.





[jira] [Assigned] (KAFKA-1678) add new options for reassign partition to better manage dead brokers

2014-10-22 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein reassigned KAFKA-1678:


Assignee: Dmitry Pekar  (was: Gwen Shapira)

 add new options for reassign partition to better manage dead brokers
 -

 Key: KAFKA-1678
 URL: https://issues.apache.org/jira/browse/KAFKA-1678
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Dmitry Pekar
  Labels: operations
 Fix For: 0.8.3


 this is in two forms
 --replace-replica 
 which is from broker.id to broker.id
 and 
 --remove-replica
 which is just a single broker.id





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manage dead brokers

2014-10-22 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Component/s: tools

 add new options for reassign partition to better manage dead brokers
 -

 Key: KAFKA-1678
 URL: https://issues.apache.org/jira/browse/KAFKA-1678
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Joe Stein
Assignee: Dmitry Pekar
  Labels: operations
 Fix For: 0.8.3


 this is in two forms
 --replace-replica 
 which is from broker.id to broker.id
 and 
 --remove-replica
 which is just a single broker.id





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manage dead brokers

2014-10-22 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Description: 
Four changes here each requiring system test, unit test, code to-do the actual 
work and we should run it on dexter too.

 --replace-broker
 --decommission-broker

fix two bugs
//don't allow topic creation if no topic exists

  was:
this is in two forms

--replace-replica 

which is from broker.id to broker.id

and 

--remove-replica

which is just a single broker.id


 add new options for reassign partition to better manage dead brokers
 -

 Key: KAFKA-1678
 URL: https://issues.apache.org/jira/browse/KAFKA-1678
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Joe Stein
Assignee: Dmitry Pekar
  Labels: operations
 Fix For: 0.8.3


 Four changes here each requiring system test, unit test, code to-do the 
 actual work and we should run it on dexter too.
  --replace-broker
  --decommission-broker
 fix two bugs
 //don't allow topic creation if no topic exists





[jira] [Updated] (KAFKA-1678) add new options for reassign partition to better manage dead brokers

2014-10-22 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1678:
-
Description: 
Four changes here, each requiring a) system tests, b) unit tests, c) code to do 
the actual work, and d) running it on dexter too (we should post the patch 
before running in the test lab so others can do the same at that time).

 --replace-broker
 --decommission-broker

fix two bugs
1) do not allow the user to start reassignment for a topic that doesn't exist
2) do not allow reassign to brokers that don't exist.

There could be other reassignment-related issues that come up from others as 
well. My initial preference is one patch, depending on what the issues/changes 
are and where in the code we are.

  was:
Four changes here each requiring system test, unit test, code to-do the actual 
work and we should run it on dexter too.

 --replace-broker
 --decommission-broker

fix two bugs
//don't allow topic creation if no topic exists


 add new options for reassign partition to better manage dead brokers
 -

 Key: KAFKA-1678
 URL: https://issues.apache.org/jira/browse/KAFKA-1678
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Joe Stein
Assignee: Dmitry Pekar
  Labels: operations
 Fix For: 0.8.3


 Four changes here, each requiring a) system tests, b) unit tests, c) code to 
 do the actual work, and d) running it on dexter too (we should post the 
 patch before running in the test lab so others can do the same at that time).
  --replace-broker
  --decommission-broker
 fix two bugs
 1) do not allow the user to start reassignment for a topic that doesn't exist
 2) do not allow reassign to brokers that don't exist.
 There could be other reassignment-related issues that come up from others as 
 well. My initial preference is one patch, depending on what the issues/changes 
 are and where in the code we are.





[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy

2014-10-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180033#comment-14180033
 ] 

Guozhang Wang commented on KAFKA-1718:
--

Since the reason we added the max.message.size restriction on the broker side 
is the consumer's fetch size, if we can change the behavior in the new consumer 
such that, when it gets a partial message from the broker, it dynamically 
increases its fetch size, then we can remove this config in both the broker and 
the new producer. [~junrao] are there any blockers to doing that?
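The grow-on-partial-message idea can be sketched as follows (a hypothetical helper, not the real consumer API): on receiving a partial message, double the fetch size and re-fetch, up to a hard cap, instead of failing.

```java
// Sketch of dynamic fetch-size growth (hypothetical helper, not the real
// consumer): when a fetch returns only a partial message, the consumer
// doubles its fetch size and retries until the message fits or a cap is hit.
public class DynamicFetchDemo {

    // Returns the fetch size that finally fits the message, doubling on
    // each "partial message" outcome, bounded by maxFetchSize.
    static int fetchSizeFor(int messageSize, int initialFetchSize, int maxFetchSize) {
        int fetchSize = initialFetchSize;
        while (fetchSize < messageSize && fetchSize < maxFetchSize) {
            fetchSize *= 2; // partial message received: grow and re-fetch
        }
        return fetchSize;
    }

    public static void main(String[] args) {
        // A ~1 MB message starting from a 64 KB fetch size.
        System.out.println(fetchSizeFor(1_070_127, 64 * 1024, 64 * 1024 * 1024));
    }
}
```

With this behavior the client-side cap, rather than a broker-side message-size config, becomes the effective limit.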

 Message Size Too Large error when only small messages produced with Snappy
 

 Key: KAFKA-1718
 URL: https://issues.apache.org/jira/browse/KAFKA-1718
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Evan Huus
Priority: Critical

 I'm the primary author of the Go bindings, and while I originally received 
 this as a bug against my bindings, I'm coming to the conclusion that it's a 
 bug in the broker somehow.
 Specifically, take a look at the last two kafka packets in the following 
 packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you 
 will need a trunk build of Wireshark to fully decode the kafka part of the 
 packets).
 The produce request contains two partitions on one topic. Each partition has 
 one message set (sizes 977205 bytes and 967362 bytes respectively). Each 
 message set is a sequential collection of snappy-compressed messages, each 
 message of size 46899. When uncompressed, each message contains a message set 
 of 999600 bytes, containing a sequence of uncompressed 1024-byte messages.
 However, the broker responds to this with a MessageSizeTooLarge error, full 
 stacktrace from the broker logs being:
 kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes 
 which exceeds the maximum configured message size of 112.
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267)
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
   at kafka.log.Log.append(Log.scala:265)
   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366)
   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:185)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
   at java.lang.Thread.run(Thread.java:695)
 Since as far as I can tell none of the sizes in the actual produced packet 
 exceed the defined maximum, I can only assume that the broker is 
 miscalculating something somewhere and throwing the exception improperly.
 ---
 This issue can be reliably reproduced using an out-of-the-box binary download 
 of 0.8.1.1 and the following gist: 
 https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use 
 the `producer-ng` branch of the Sarama library).
 ---
 I am happy to provide any more information you might need, or to do relevant 
 experiments etc.
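One way such an error could arise can be shown with a toy calculation (the 26-byte per-message overhead and the 1,000,000-byte limit below are assumptions for illustration, not Kafka's actual wire-format constants): if the broker applies the size limit to the whole recompressed message set as a single wrapper message, a batch of tiny messages can exceed it even though each inner message is far below the limit.

```java
// Toy model (assumed constants, not Kafka's real wire format): the limit
// is applied to one wrapper message holding the whole compressed batch,
// so many small messages batched together can trip it.
public class WrapperSizeDemo {

    // Assumed fixed per-message framing overhead, for illustration only.
    static final int OVERHEAD_PER_MESSAGE = 26;

    // Size of the wrapper message for a batch of equal-sized inner messages,
    // given a compression ratio (1.0 = incompressible payload).
    static long wrapperSize(int messageCount, int messageBytes, double compressionRatio) {
        long uncompressed = (long) messageCount * (messageBytes + OVERHEAD_PER_MESSAGE);
        return (long) (uncompressed * compressionRatio);
    }

    public static void main(String[] args) {
        // ~1 MB of 1024-byte messages with incompressible payload.
        long size = wrapperSize(976, 1024, 1.0);
        System.out.println(size + " exceeds 1000000? " + (size > 1_000_000));
    }
}
```

Each inner message is only ~1 KB, yet the single wrapper comfortably exceeds a ~1 MB limit, which matches the shape of the reported error.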





[jira] [Commented] (KAFKA-1583) Kafka API Refactoring

2014-10-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180037#comment-14180037
 ] 

Guozhang Wang commented on KAFKA-1583:
--

Yes, the system tests pass, excluding the offset management test.

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, 
 KAFKA-1583_2014-10-17_09:56:33.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring





[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-22 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180056#comment-14180056
 ] 

Jun Rao commented on KAFKA-1481:


Otis,

I expect that we will have about another 4 weeks before 0.8.2 final is 
released. That should give us enough time to iterate on this, right? Since the 
patch touches many files, I'd prefer that we get a clean version upfront.

 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Priority: Critical
  Labels: patch
 Fix For: 0.8.3

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
 KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, 
 KAFKA-1481_2014-10-15_10-23-35.patch, KAFKA-1481_2014-10-20_23-14-35.patch, 
 KAFKA-1481_2014-10-21_09-14-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-15_10-23-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-20_20-14-35.patch, 
 KAFKA-1481_IDEA_IDE_2014-10-20_23-14-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBeans names making it impossible to parse out 
 individual bits from MBeans.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are doing it for themselves AND do not use 
 dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW
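The ambiguity is easy to demonstrate (toy names, not real Kafka MBean names): when '-' both separates fields and can occur inside a hostname or topic, a split cannot recover the original fields, whereas a character disallowed in those values, such as '|', parses cleanly.

```java
// Toy illustration of the separator problem: '-' appears inside hostnames
// and topic names, so splitting on it over-fragments the MBean name, while
// a character forbidden in those values (here '|') is unambiguous.
public class MBeanNameDemo {

    static String[] parse(String name, String separator) {
        // Pattern.quote so '|' is treated literally, not as regex alternation.
        return name.split(java.util.regex.Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // host "kafka-prod-01" + topic "click-events": 5 fragments, not 2 fields.
        System.out.println(parse("kafka-prod-01-click-events", "-").length);
        // With a pipe separator, the two fields come back intact.
        System.out.println(parse("kafka-prod-01|click-events", "|").length);
    }
}
```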





[jira] [Comment Edited] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy

2014-10-22 Thread Evan Huus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180126#comment-14180126
 ] 

Evan Huus edited comment on KAFKA-1718 at 10/22/14 4:22 PM:


??when it gets a partial message from the broker it will dynamically increase 
its fetch size??

like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253 ?


was (Author: eapache):
??when it gets a partial message from the broker it will dynamically increase 
its fetch size??

like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253?

 Message Size Too Large error when only small messages produced with Snappy
 

 Key: KAFKA-1718
 URL: https://issues.apache.org/jira/browse/KAFKA-1718
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Evan Huus
Priority: Critical

 I'm the primary author of the Go bindings, and while I originally received 
 this as a bug against my bindings, I'm coming to the conclusion that it's a 
 bug in the broker somehow.
 Specifically, take a look at the last two kafka packets in the following 
 packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you 
 will need a trunk build of Wireshark to fully decode the kafka part of the 
 packets).
 The produce request contains two partitions on one topic. Each partition has 
 one message set (sizes 977205 bytes and 967362 bytes respectively). Each 
 message set is a sequential collection of snappy-compressed messages, each 
 message of size 46899. When uncompressed, each message contains a message set 
 of 999600 bytes, containing a sequence of uncompressed 1024-byte messages.
 However, the broker responds to this with a MessageSizeTooLarge error, full 
 stacktrace from the broker logs being:
 kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes 
 which exceeds the maximum configured message size of 112.
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267)
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
   at kafka.log.Log.append(Log.scala:265)
   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366)
   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:185)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
   at java.lang.Thread.run(Thread.java:695)
 Since as far as I can tell none of the sizes in the actual produced packet 
 exceed the defined maximum, I can only assume that the broker is 
 miscalculating something somewhere and throwing the exception improperly.
 ---
 This issue can be reliably reproduced using an out-of-the-box binary download 
 of 0.8.1.1 and the following gist: 
 https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use 
 the `producer-ng` branch of the Sarama library).
 ---
 I am happy to provide any more information you might need, or to do relevant 
 experiments etc.





[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy

2014-10-22 Thread Evan Huus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180126#comment-14180126
 ] 

Evan Huus commented on KAFKA-1718:
--

??when it gets a partial message from the broker it will dynamically increase 
its fetch size??

like https://github.com/Shopify/sarama/blob/master/consumer.go#L236-L253?

 Message Size Too Large error when only small messages produced with Snappy
 

 Key: KAFKA-1718
 URL: https://issues.apache.org/jira/browse/KAFKA-1718
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Evan Huus
Priority: Critical

 I'm the primary author of the Go bindings, and while I originally received 
 this as a bug against my bindings, I'm coming to the conclusion that it's a 
 bug in the broker somehow.
 Specifically, take a look at the last two kafka packets in the following 
 packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you 
 will need a trunk build of Wireshark to fully decode the kafka part of the 
 packets).
 The produce request contains two partitions on one topic. Each partition has 
 one message set (sizes 977205 bytes and 967362 bytes respectively). Each 
 message set is a sequential collection of snappy-compressed messages, each 
 message of size 46899. When uncompressed, each message contains a message set 
 of 999600 bytes, containing a sequence of uncompressed 1024-byte messages.
 However, the broker responds to this with a MessageSizeTooLarge error, full 
 stacktrace from the broker logs being:
 kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes 
 which exceeds the maximum configured message size of 112.
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267)
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
   at kafka.log.Log.append(Log.scala:265)
   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366)
   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:185)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
   at java.lang.Thread.run(Thread.java:695)
 Since as far as I can tell none of the sizes in the actual produced packet 
 exceed the defined maximum, I can only assume that the broker is 
 miscalculating something somewhere and throwing the exception improperly.
 ---
 This issue can be reliably reproduced using an out-of-the-box binary download 
 of 0.8.1.1 and the following gist: 
 https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use 
 the `producer-ng` branch of the Sarama library).
 ---
 I am happy to provide any more information you might need, or to do relevant 
 experiments etc.





[jira] [Assigned] (KAFKA-1686) Implement SASL/Kerberos

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-1686:
-

Assignee: Sriharsha Chintalapani

 Implement SASL/Kerberos
 ---

 Key: KAFKA-1686
 URL: https://issues.apache.org/jira/browse/KAFKA-1686
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani
 Fix For: 0.9.0


 Implement SASL/Kerberos authentication.
 To do this we will need to introduce a new SASLRequest and SASLResponse pair 
 to the client protocol. This request and response will each have only a 
 single byte[] field and will be used to handle the SASL challenge/response 
 cycle. Doing this will initialize the SaslServer instance and associate it 
 with the session in a manner similar to KAFKA-1684.
 When using integrity or encryption mechanisms with SASL we will need to wrap 
 and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
 SSLEngine will need to also cover the SaslServer instance.
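The challenge/response cycle described here maps naturally onto the JDK's standard javax.security.sasl API. The loopback sketch below substitutes CRAM-MD5 for GSSAPI so it runs without a Kerberos environment; the protocol name "kafka", server name "broker1", and the credentials are placeholders, and the SASLRequest/SASLResponse framing is only indicated in comments.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslServer;

// Loopback SASL handshake using the standard JDK API. Each byte[] handed
// between client and server below is what a SASLRequest/SASLResponse body
// would carry; CRAM-MD5 stands in for GSSAPI to avoid Kerberos setup.
public class SaslHandshakeSketch {

    static final char[] SECRET = "s3cret".toCharArray();

    // One handler serves both sides in this demo: supply name/password,
    // and authorize the authenticated user.
    static CallbackHandler handler(String name) {
        return (Callback[] cbs) -> {
            for (Callback cb : cbs) {
                if (cb instanceof NameCallback) ((NameCallback) cb).setName(name);
                else if (cb instanceof PasswordCallback) ((PasswordCallback) cb).setPassword(SECRET);
                else if (cb instanceof AuthorizeCallback) ((AuthorizeCallback) cb).setAuthorized(true);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        SaslServer server = Sasl.createSaslServer(
                "CRAM-MD5", "kafka", "broker1", null, handler("alice"));
        SaslClient client = Sasl.createSaslClient(
                new String[]{"CRAM-MD5"}, null, "kafka", "broker1", null, handler("alice"));

        byte[] challenge = server.evaluateResponse(new byte[0]); // SASLResponse: server challenge
        byte[] response = client.evaluateChallenge(challenge);   // SASLRequest: client proof
        server.evaluateResponse(response);                       // server verifies the proof

        // At this point the SaslServer would be associated with the session
        // (as in KAFKA-1684); wrap/unwrap then handles integrity/encryption.
        System.out.println("complete=" + server.isComplete()
                + " user=" + server.getAuthorizationID());
    }
}
```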





[jira] [Assigned] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-1696:
-

Assignee: Sriharsha Chintalapani

 Kafka should be able to generate Hadoop delegation tokens
 -

 Key: KAFKA-1696
 URL: https://issues.apache.org/jira/browse/KAFKA-1696
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani

 For access from MapReduce/etc jobs run on behalf of a user.





[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180140#comment-14180140
 ] 

Sriharsha Chintalapani commented on KAFKA-1696:
---

[~gwenshap] In case you have not started working on it, I am interested in 
taking it up. Thanks.






[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-22 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180148#comment-14180148
 ] 

Neha Narkhede commented on KAFKA-1647:
--

[~noslowerdna] This is a pretty rare corner case, and we are trying to get it 
into 0.8.2 if it can go through some testing. 
[~becket_qin] I think my comment may have been lost in the reviewboard, so 
reposting it here - In order to accept this patch, I'd like us to repeat the 
kind of testing that was done to find this bug. Did you get a chance to do that 
on your latest patch?

 Replication offset checkpoints (high water marks) can be lost on hard kills 
 and restarts
 

 Key: KAFKA-1647
 URL: https://issues.apache.org/jira/browse/KAFKA-1647
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Jiangjie Qin
Priority: Critical
  Labels: newbie++
 Fix For: 0.8.2

 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, 
 KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch


 We ran into this scenario recently in a production environment. This can 
 happen when enough brokers in a cluster are taken down. i.e., a rolling 
 bounce done properly should not cause this issue. It can occur if all 
 replicas for any partition are taken down.
 Here is a sample scenario:
 * Cluster of three brokers: b0, b1, b2
 * Two partitions (of some topic) with replication factor two: p0, p1
 * Initial state:
 p0: leader = b0, ISR = {b0, b1}
 p1: leader = b1, ISR = {b0, b1}
 * Do a parallel hard-kill of all brokers
 * Bring up b2, so it is the new controller
 * b2 initializes its controller context and populates its leader/ISR cache 
 (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
 known leaders are b0 (for p0) and b1 (for p1)
 * Bring up b1
 * The controller's onBrokerStartup procedure initiates a replica state change 
 for all replicas on b1 to become online. As part of this replica state change 
 it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
 (for p0 and p1). This LeaderAndIsr request contains: {p0: leader=b0; p1: 
 leader=b1}, with leaders={b1}. b0 is indicated as the leader of p0 but it is not 
 included in the leaders field because b0 is down.
 * On receiving the LeaderAndIsrRequest, b1's replica manager will 
 successfully make itself (b1) the leader for p1 (and create the local replica 
 object corresponding to p1). It will however abort the become follower 
 transition for p0 because the designated leader b0 is offline. So it will not 
 create the local replica object for p0.
 * It will then start the high water mark checkpoint thread. Since only p1 has 
 a local replica object, only p1's high water mark will be checkpointed to 
 disk. p0's previously written checkpoint, if any, will be lost.
 So in summary it seems we should always create the local replica object even 
 if the online transition does not happen.
 Possible symptoms of the above bug could be one or more of the following (we 
 saw 2 and 3):
 # Data loss; yes on a hard-kill data loss is expected, but this can actually 
 cause loss of nearly all data if the broker becomes follower, truncates, and 
 soon after happens to become leader.
 # High IO on brokers that lose their high water mark then subsequently (on a 
 successful become follower transition) truncate their log to zero and start 
 catching up from the beginning.
 # If the offsets topic is affected, then offsets can get reset. This is 
 because during an offset load we don't read past the high water mark. So if a 
 water mark is missing then we don't load anything (even if the offsets are 
 there in the log).
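The checkpoint-loss mechanism described above can be sketched as a small simulation: the checkpoint writer only emits entries for partitions that have a local replica object, so a partition whose become-follower transition aborted silently drops out of the file. All names here are hypothetical; this is not the actual ReplicaManager code.

```python
# Illustrative simulation of the checkpoint-loss scenario (hypothetical
# names). The checkpoint maps partition -> high watermark, and the writer
# overwrites it with entries for local replicas only.

def write_checkpoint(local_replicas):
    """Rewrite the checkpoint from the high watermarks of local replicas."""
    return {partition: hw for partition, hw in local_replicas.items()}

on_disk = {"p0": 500, "p1": 700}   # checkpoints written before the hard kill

# After restart, the become-follower transition for p0 aborts (its leader
# b0 is down), so no local replica object is created for p0.
local_replicas = {"p1": 700}
on_disk = write_checkpoint(local_replicas)
print(on_disk)                      # p0's previously written checkpoint is gone

# The fix suggested in the summary: create the local replica object even
# when the online transition does not happen, so p0's HW survives.
local_replicas = {"p0": 500, "p1": 700}
print(write_checkpoint(local_replicas))
```

The simulation makes the failure mode concrete: the loss happens not when the broker dies, but when the restarted broker's checkpoint thread rewrites the file without the missing partition.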





Documentation with patches

2014-10-22 Thread Joe Stein
This comes up a lot, but in reality not often enough.  We don't have a great way
for folks to modify the code and change (or add to) the documentation. I
think the documentation is awesome, and as we grow the code contributors,
the documentation should continue to grow with them too.

One thought I had that would work is that we copy the SVN files into a
/docs folder in git.  We can then take patches in git and then apply them
to SVN when appropriate (like during a release or for immediate fixes).
This way code changes in that patch can have documentation changes.  The
committers can manage what is changed where, as appropriate, either prior to
a release or as live updates to the website. Yes/No?

Thanks!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens

2014-10-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180160#comment-14180160
 ] 

Gwen Shapira commented on KAFKA-1696:
-

I started on the design doc, but I'll admit that I was not in a hurry since 
this is blocked on the Kerberos support anyway.

If you have your own thoughts and want to collaborate on the design, I'll be 
happy to work together. We can figure out who is doing the coding when it 
becomes more relevant.






Re: Review Request 26666: Patch for KAFKA-1653

2014-10-22 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26666/#review57817
---

Ship it!


Ship It!

- Neha Narkhede


On Oct. 21, 2014, 6:58 p.m., Ewen Cheslack-Postava wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
  https://reviews.apache.org/r/26666/
 ---
 
 (Updated Oct. 21, 2014, 6:58 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1653
 https://issues.apache.org/jira/browse/KAFKA-1653
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Generate error for duplicates in PreferredLeaderElectionCommand instead of 
 just swallowing duplicates.
 
 
 Report which entries are duplicated for ReassignPartitionCommand since they 
 may be difficult to find in large reassignments.
 
 
 Report duplicate topics and duplicate topic partitions in 
  ReassignPartitionsCommand. Make all duplication error messages include 
 details about what item was duplicated.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/PreferredReplicaLeaderElectionCommand.scala 
 c7918483c02040a7cc18d6e9edbd20a3025a3a55 
   core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala 
 691d69a49a240f38883d2025afaec26fd61281b5 
   core/src/main/scala/kafka/admin/TopicCommand.scala 
 7672c5aab4fba8c23b1bb5cd4785c332d300a3fa 
   core/src/main/scala/kafka/tools/StateChangeLogMerger.scala 
 d298e7e81acc7427c6cf4796b445966267ca54eb 
   core/src/main/scala/kafka/utils/Utils.scala 
 29d5a17d4a03cfd3f3cdd2994cbd783a4be2732e 
   core/src/main/scala/kafka/utils/ZkUtils.scala 
 a7b1fdcb50d5cf930352d37e39cb4fc9a080cb12 
 
  Diff: https://reviews.apache.org/r/26666/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ewen Cheslack-Postava
 




[jira] [Commented] (KAFKA-1696) Kafka should be able to generate Hadoop delegation tokens

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180165#comment-14180165
 ] 

Sriharsha Chintalapani commented on KAFKA-1696:
---

[~gwenshap] sounds good to me. Since it was unassigned I assigned it to myself. 
Feel free to reassign. I'll work on putting together my thoughts on this.






[jira] [Updated] (KAFKA-1653) Duplicate broker ids allowed in replica assignment

2014-10-22 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1653:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patches. Pushed to trunk and 0.8.2

 Duplicate broker ids allowed in replica assignment
 --

 Key: KAFKA-1653
 URL: https://issues.apache.org/jira/browse/KAFKA-1653
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Attachments: KAFKA-1653.patch, KAFKA-1653_2014-10-16_14:54:07.patch, 
 KAFKA-1653_2014-10-21_11:57:50.patch


 The reassign partitions command and the controller do not ensure that all 
 replicas for a partition are on different brokers. For example, you could set 
 1,2,2 as the list of brokers for the replicas.
 kafka-topics.sh --describe --under-replicated will list these partitions as 
 under-replicated, but I can't see a reason why the controller should allow 
 this state.
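The missing validation could look something like the following sketch, which rejects a replica list such as 1,2,2. This is an illustrative check with hypothetical names, not the patch under review.

```python
def validate_replica_assignment(replicas):
    """Reject replica lists that name the same broker more than once
    (a sketch of the check this ticket asks for)."""
    duplicates = sorted({b for b in replicas if replicas.count(b) > 1})
    if duplicates:
        raise ValueError("duplicate broker id(s) in replica list: %s" % duplicates)
    return replicas

print(validate_replica_assignment([1, 2, 3]))   # a valid assignment passes through
try:
    validate_replica_assignment([1, 2, 2])      # the 1,2,2 case from the report
except ValueError as e:
    print(e)
```

Performing this check both in the reassignment tooling and in the controller would prevent the permanently under-replicated state the report describes.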





[jira] [Updated] (KAFKA-1653) Duplicate broker ids allowed in replica assignment

2014-10-22 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1653:
-
Fix Version/s: 0.8.2






[jira] [Closed] (KAFKA-1653) Duplicate broker ids allowed in replica assignment

2014-10-22 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede closed KAFKA-1653.







[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-22 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180181#comment-14180181
 ] 

Jiangjie Qin commented on KAFKA-1647:
-

[~nehanarkhede] I haven't had a chance to run the tests yet... I'll do the 
testing later this week and verify that the fix works.






Re: Documentation with patches

2014-10-22 Thread Jay Kreps
Hey Joe,

I'd love to encourage documentation contributions.

I think we do have a way to contribute to docs. The current workflow for
contributing is
1. Checkout the docs
2. Change docs
3. Submit patch in normal way
4. Committer reviews and applies

For committers we have traditionally made the review step optional for docs.

In reality this skips step 1.5, which is fiddling with Apache for an hour to
figure out how to get Apache includes to work so you can see the
docs. I actually think this is the bigger barrier to doc changes.

One thing we could do is move the docs to one of the static site generators
to do the includes (e.g. Jekyll) this might make setup slightly easier
(although then you need to install Jekyll...).

-Jay




Re: Documentation with patches

2014-10-22 Thread Jarek Jarcec Cecho
I would strongly support this idea. We have a similar model in all other projects 
where I'm involved:

The docs are part of the usual code base and we do require contributors to 
update them when they are adding a new feature. And then during release time we 
simply take snapshot of the docs and upload them to our public webpages. This 
enables us to have simple versioned docs on the website, so that users can 
easily find docs for their version, and the public site does not contain docs 
of unreleased features :) There are many ways to achieve that - in Sqoop 
1 we used asciidoc to build the site, in Sqoop 2/Flume we're using Sphinx, and 
Oozie is using markdown wiki...

Jarcec




[jira] [Assigned] (KAFKA-1688) Add authorization interface and naive implementation

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-1688:
-

Assignee: Sriharsha Chintalapani

 Add authorization interface and naive implementation
 

 Key: KAFKA-1688
 URL: https://issues.apache.org/jira/browse/KAFKA-1688
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Sriharsha Chintalapani

 Add a PermissionManager interface as described here:
 https://cwiki.apache.org/confluence/display/KAFKA/Security
 (possibly there is a better name?)
 Implement calls to the PermissionsManager in KafkaApis for the main requests 
 (FetchRequest, ProduceRequest, etc). We will need to add a new error code and 
 exception to the protocol to indicate permission denied.
 Add a server configuration to give the class you want to instantiate that 
 implements that interface. That class can define its own configuration 
 properties from the main config file.
 Provide a simple implementation of this interface which just takes a user and 
 ip whitelist and permits those in either of the whitelists to do anything, 
 and denies all others.
 Rather than writing an integration test for this class we can probably just 
 use this class for the TLS and SASL authentication testing.
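A minimal sketch of the naive whitelist implementation described above, in Python rather than Kafka's Scala, with hypothetical class and method names: a principal is permitted to do anything if either its user name or its IP appears in a whitelist, and denied otherwise.

```python
# Hedged sketch of the naive authorizer: permit everything for whitelisted
# users or IPs, deny all others. Names are illustrative, not Kafka's API.

class WhitelistPermissionManager:
    def __init__(self, allowed_users, allowed_ips):
        self.allowed_users = set(allowed_users)
        self.allowed_ips = set(allowed_ips)

    def authorize(self, user, ip, operation):
        """Permit any operation if the user OR the ip is whitelisted;
        the operation name itself is ignored by this naive policy."""
        return user in self.allowed_users or ip in self.allowed_ips

pm = WhitelistPermissionManager(allowed_users=["admin"], allowed_ips=["10.0.0.5"])
print(pm.authorize("admin", "192.168.1.9", "ProduceRequest"))   # True
print(pm.authorize("guest", "10.0.0.5", "FetchRequest"))        # True
print(pm.authorize("guest", "192.168.1.9", "FetchRequest"))     # False
```

In the real interface, a denied call would map to the proposed permission-denied error code rather than a boolean, but the decision logic would be this simple.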





[jira] [Commented] (KAFKA-1688) Add authorization interface and naive implementation

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180227#comment-14180227
 ] 

Sriharsha Chintalapani commented on KAFKA-1688:
---

[~jkreps] Can you please assign this to [~bosco]? I am not able to assign JIRAs. 
Thanks.






Re: Documentation with patches

2014-10-22 Thread Jay Kreps
Currently we are handling the versioning problem by explicitly versioning
docs that change over time (configuration, quickstart, design, etc). This
is done by just creating a copy of these pages for each release in a
subdirectory. So we can commit documentation changes at any time for the
future release; we just don't link up that release until it is out
(theoretically you could get there by guessing the URL, but that is okay).
Although having multiple copies of certain pages, one for each release,
seems odd, I think it is actually better because in practice we often end
up editing old releases when we find problems in the older docs.

-Jay





Re: Review Request 26373: Patch for KAFKA-1647

2014-10-22 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26373/#review57836
---



core/src/main/scala/kafka/server/ReplicaManager.scala
https://reviews.apache.org/r/26373/#comment98711

This and the following change can be reverted. I will post some steps to 
reproduce locally on the ticket.


- Joel Koshy






[jira] [Created] (KAFKA-1724) Errors after reboot in single node setup

2014-10-22 Thread Ciprian Hacman (JIRA)
Ciprian Hacman created KAFKA-1724:
-

 Summary: Errors after reboot in single node setup
 Key: KAFKA-1724
 URL: https://issues.apache.org/jira/browse/KAFKA-1724
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Ciprian Hacman


In a single node setup, after reboot, Kafka logs show the following:
{code}
[2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up 
(kafka.controller.KafkaController)
[2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete 
(kafka.controller.KafkaController)
[2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1413995842465","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
 stored data: 
{"jmx_port":-1,"timestamp":"1413994171579","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
 (kafka.utils.ZkUtils$)
[2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node 
[{"jmx_port":-1,"timestamp":"1413995842465","host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}]
 at /brokers/ids/0 a while back in a different session, hence I will backoff 
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of 
/controller changed sent to 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] 
(org.I0Itec.zkclient.ZkEventThread)
java.lang.IllegalStateException: Kafka scheduler has not been started
at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
at 
kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
at 
kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
at kafka.utils.Utils$.inLock(Utils.scala:535)
at 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
[2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 with 
address ip-10-91-142-54.eu-west-1.compute.internal:9092. (kafka.utils.ZkUtils$)
[2014-10-22 16:37:28,849] INFO [Kafka Server 0], started 
(kafka.server.KafkaServer)
[2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. 
(kafka.network.Processor)
[2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. 
(kafka.network.Processor)
[2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. 
(kafka.network.Processor)
{code}
The last log line repeats forever and is correlated with errors on the app side.
Restarting Kafka fixes the errors.

Steps to reproduce (with help from the mailing list):
# start zookeeper
# start kafka-broker
# create topic or start a producer writing to a topic
# stop zookeeper
# stop kafka-broker (the broker shutdown goes into "WARN Session
0x14938d9dc010001 for server null, unexpected error, closing socket connection 
and attempting reconnect (org.apache.zookeeper.ClientCnxn) 
java.net.ConnectException: Connection refused")
# kill -9 kafka-broker
# restart zookeeper and then kafka-broker, which leads into the error above
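The IllegalStateException in the log above comes from the controller's resignation callback calling KafkaScheduler.shutdown before the scheduler was ever started. A minimal Python sketch of that guard and the ordering race (illustrative names, not Kafka's actual implementation):

```python
import threading

class Scheduler:
    """Toy model of a scheduler whose shutdown() requires a prior startup(),
    mirroring the ensureStarted check in kafka.utils.KafkaScheduler."""
    def __init__(self):
        self._lock = threading.Lock()
        self._started = False

    def startup(self):
        with self._lock:
            self._started = True

    def shutdown(self):
        with self._lock:
            if not self._started:
                # raises before any cleanup runs, like the ZkEvent handler above
                raise RuntimeError("scheduler has not been started")
            self._started = False

def on_controller_resignation(scheduler):
    # A resignation event fired before this broker ever became controller
    # hits the guard, matching the stack trace in the log.
    try:
        scheduler.shutdown()
        return "clean"
    except RuntimeError as e:
        return "error: " + str(e)

print(on_controller_resignation(Scheduler()))  # never started -> error path
started = Scheduler()
started.startup()
print(on_controller_resignation(started))      # started -> clean path
```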




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1724) Errors after reboot in single node setup

2014-10-22 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani reassigned KAFKA-1724:
-

Assignee: Sriharsha Chintalapani

 Errors after reboot in single node setup
 

 Key: KAFKA-1724
 URL: https://issues.apache.org/jira/browse/KAFKA-1724
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Ciprian Hacman
Assignee: Sriharsha Chintalapani

 In a single node setup, after reboot, Kafka logs show the following:
 {code}
 [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up 
 (kafka.controller.KafkaController)
 [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete 
 (kafka.controller.KafkaController)
 [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: 
 {"jmx_port":-1,"timestamp":1413995842465,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
  stored data: 
 {"jmx_port":-1,"timestamp":1413994171579,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
  (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node 
 [{"jmx_port":-1,"timestamp":1413995842465,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}]
  at /brokers/ids/0 a while back in a different session, hence I will backoff 
 for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of 
 /controller changed sent to 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] 
 (org.I0Itec.zkclient.ZkEventThread)
 java.lang.IllegalStateException: Kafka scheduler has not been started
 at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
 at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
 at 
 kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
 at 
 kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
 at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 
 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. 
 (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started 
 (kafka.server.KafkaServer)
 [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 {code}
 The last log line repeats forever and is correlated with errors on the app 
 side.
 Restarting Kafka fixes the errors.
 Steps to reproduce (with help from the mailing list):
 # start zookeeper
 # start kafka-broker
 # create topic or start a producer writing to a topic
 # stop zookeeper
 # stop kafka-broker (the broker shutdown goes into "WARN Session
 0x14938d9dc010001 for server null, unexpected error, closing socket 
 connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) 
 java.net.ConnectException: Connection refused")
 # kill -9 kafka-broker
 # restart zookeeper and then kafka-broker, which leads into the error above



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1724) Errors after reboot in single node setup

2014-10-22 Thread Ciprian Hacman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ciprian Hacman updated KAFKA-1724:
--
Fix Version/s: 0.8.2

 Errors after reboot in single node setup
 

 Key: KAFKA-1724
 URL: https://issues.apache.org/jira/browse/KAFKA-1724
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Ciprian Hacman
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.2


 In a single node setup, after reboot, Kafka logs show the following:
 {code}
 [2014-10-22 16:37:22,206] INFO [Controller 0]: Controller starting up 
 (kafka.controller.KafkaController)
 [2014-10-22 16:37:22,419] INFO [Controller 0]: Controller startup complete 
 (kafka.controller.KafkaController)
 [2014-10-22 16:37:22,554] INFO conflict in /brokers/ids/0 data: 
 {"jmx_port":-1,"timestamp":1413995842465,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
  stored data: 
 {"jmx_port":-1,"timestamp":1413994171579,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}
  (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:22,736] INFO I wrote this conflicted ephemeral node 
 [{"jmx_port":-1,"timestamp":1413995842465,"host":"ip-10-91-142-54.eu-west-1.compute.internal","version":1,"port":9092}]
  at /brokers/ids/0 a while back in a different session, hence I will backoff 
 for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:25,010] ERROR Error handling event ZkEvent[Data of 
 /controller changed sent to 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener@a6af882] 
 (org.I0Itec.zkclient.ZkEventThread)
 java.lang.IllegalStateException: Kafka scheduler has not been started
 at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
 at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
 at 
 kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
 at 
 kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
 at kafka.utils.Utils$.inLock(Utils.scala:535)
 at 
 kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
 at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 [2014-10-22 16:37:28,757] INFO Registered broker 0 at path /brokers/ids/0 
 with address ip-10-91-142-54.eu-west-1.compute.internal:9092. 
 (kafka.utils.ZkUtils$)
 [2014-10-22 16:37:28,849] INFO [Kafka Server 0], started 
 (kafka.server.KafkaServer)
 [2014-10-22 16:38:56,718] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 [2014-10-22 16:38:56,850] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 [2014-10-22 16:38:56,985] INFO Closing socket connection to /127.0.0.1. 
 (kafka.network.Processor)
 {code}
 The last log line repeats forever and is correlated with errors on the app 
 side.
 Restarting Kafka fixes the errors.
 Steps to reproduce (with help from the mailing list):
 # start zookeeper
 # start kafka-broker
 # create topic or start a producer writing to a topic
 # stop zookeeper
 # stop kafka-broker (the broker shutdown goes into "WARN Session
 0x14938d9dc010001 for server null, unexpected error, closing socket 
 connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) 
 java.net.ConnectException: Connection refused")
 # kill -9 kafka-broker
 # restart zookeeper and then kafka-broker, which leads into the error above



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1647) Replication offset checkpoints (high water marks) can be lost on hard kills and restarts

2014-10-22 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180328#comment-14180328
 ] 

Joel Koshy commented on KAFKA-1647:
---

[~becket_qin] here are some steps to reproduce locally. There are probably 
simpler steps, but I ran into it while debugging something else, so here you go:

* Set up three brokers. Sample config: 
https://gist.github.com/jjkoshy/1ec36e5cef41ac4bd8fb (You will need to edit the 
logs directory and port)
* Create 50 topics, each with 4 partitions and replication factor 2: {code}for i 
in {1..50}; do ./bin/kafka-topics.sh --create --topic test$i --zookeeper 
localhost:2181 --partitions 4 --replication-factor 2; done{code}
* Run producer performance: {code}./bin/kafka-producer-perf-test.sh --threads 4 
--broker-list localhost:9092,localhost:9093 --vary-message-size --messages 
922337203685477580 --topics 
test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12,test13,test14,test15,test16,test17,test18,test19,test20,test21,test22,test23,test24,test25,test26,test27,test28,test29,test30,test31,test32,test33,test34,test35,test36,test37,test38,test39,test40,test41,test42,test43,test44,test45,test46,test47,test48,test49,test50
 --message-size 500{code}
* Parallel hard kill of all brokers: {{pkill -9 -f Kafka}}
* Kill producer performance
* Restart brokers
* You should see "WARN No checkpointed highwatermark is found for partition..."


 Replication offset checkpoints (high water marks) can be lost on hard kills 
 and restarts
 

 Key: KAFKA-1647
 URL: https://issues.apache.org/jira/browse/KAFKA-1647
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Jiangjie Qin
Priority: Critical
  Labels: newbie++
 Fix For: 0.8.2

 Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch, 
 KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch


 We ran into this scenario recently in a production environment. This can 
 happen when enough brokers in a cluster are taken down. i.e., a rolling 
 bounce done properly should not cause this issue. It can occur if all 
 replicas for any partition are taken down.
 Here is a sample scenario:
 * Cluster of three brokers: b0, b1, b2
 * Two partitions (of some topic) with replication factor two: p0, p1
 * Initial state:
 p0: leader = b0, ISR = {b0, b1}
 p1: leader = b1, ISR = {b0, b1}
 * Do a parallel hard-kill of all brokers
 * Bring up b2, so it is the new controller
 * b2 initializes its controller context and populates its leader/ISR cache 
 (i.e., controllerContext.partitionLeadershipInfo) from zookeeper. The last 
 known leaders are b0 (for p0) and b1 (for p2)
 * Bring up b1
 * The controller's onBrokerStartup procedure initiates a replica state change 
 for all replicas on b1 to become online. As part of this replica state change 
 it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1 
 (for p1 and p2). This LeaderAndIsr request contains: {{p0: leader=b0; p1: 
 leader=b1;} leaders=b1}. b0 is indicated as the leader of p0 but it is not 
 included in the leaders field because b0 is down.
 * On receiving the LeaderAndIsrRequest, b1's replica manager will 
 successfully make itself (b1) the leader for p1 (and create the local replica 
 object corresponding to p1). It will however abort the become follower 
 transition for p0 because the designated leader b0 is offline. So it will not 
 create the local replica object for p0.
 * It will then start the high water mark checkpoint thread. Since only p1 has 
 a local replica object, only p1's high water mark will be checkpointed to 
 disk. p0's previously written checkpoint, if any, will be lost.
 So in summary it seems we should always create the local replica object even 
 if the online transition does not happen.
 Possible symptoms of the above bug could be one or more of the following (we 
 saw 2 and 3):
 # Data loss; yes on a hard-kill data loss is expected, but this can actually 
 cause loss of nearly all data if the broker becomes follower, truncates, and 
 soon after happens to become leader.
 # High IO on brokers that lose their high water mark then subsequently (on a 
 successful become follower transition) truncate their log to zero and start 
 catching up from the beginning.
 # If the offsets topic is affected, then offsets can get reset. This is 
 because during an offset load we don't read past the high water mark. So if a 
 water mark is missing then we don't load anything (even if the offsets are 
 there in the log).
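The checkpoint-loss mechanism described above can be sketched in a few lines. This is a toy model, not Kafka's code: the point is that the checkpoint thread rewrites the whole file from the partitions that currently have local replica objects, so a partition whose become-follower transition was aborted silently drops out of the file.

```python
def rewrite_checkpoint(local_replicas):
    """Toy model of the high-watermark checkpoint thread: it rewrites the
    entire checkpoint file from the map of local replica objects, so any
    partition absent from that map loses its previously written entry."""
    return dict(local_replicas)

# Before the hard kill, both partitions had checkpointed high watermarks.
on_disk = {"p0": 4000, "p1": 5200}

# After restart, the become-follower transition for p0 was aborted (its
# designated leader b0 is offline), so only p1 has a local replica object.
local_replicas = {"p1": 5200}

on_disk = rewrite_checkpoint(local_replicas)
print(on_disk)  # {'p1': 5200} -- p0's old high watermark is gone
```

Always creating the local replica object, as the summary suggests, keeps p0 in the map and so in the rewritten file.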



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Mirror Maker upgraded to use Java Producer

2014-10-22 Thread Bhavesh Mistry
Hi Kafka Dev Team,

In the trunk branch, has mirror maker been upgraded to use the new producer code 
base? I just wanted to know whether this is planned or already done.


Thanks,

Bhavesh


[jira] [Commented] (KAFKA-1695) Authenticate connection to Zookeeper

2014-10-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180538#comment-14180538
 ] 

Gwen Shapira commented on KAFKA-1695:
-

The good news is that Kafka works out of the box with secure ZooKeeper. The 
default ACL for ZK nodes is world:anyone:cdrwa.

I think we want to give users an option to secure their Kafka information in ZK 
to make sure that only a Kafka broker (and perhaps Kafka consumer) can read and 
write them. Especially important if we choose to store the broker part of the 
delegation token secret in ZK.

It looks like ZKClient has a PR for support of ACLs 
(https://github.com/sgroschupf/zkclient/pull/18), however it's 3 years old...
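The world:anyone:cdrwa shorthand above packs a scheme, an id, and ZooKeeper's five permission bits (create, delete, read, write, admin). A quick illustrative parser of that notation (this is not ZKClient's API; the function name is made up):

```python
# ZooKeeper permission letters and the permissions they stand for.
PERMS = {"c": "create", "d": "delete", "r": "read", "w": "write", "a": "admin"}

def parse_acl(acl):
    """Parse a scheme:id:perms ACL string into (scheme, id, permission set).
    The id itself may contain ':' (e.g. digest:user:hash), so split the
    scheme off the front and the permission letters off the back."""
    scheme, rest = acl.split(":", 1)
    ident, perm_letters = rest.rsplit(":", 1)
    return scheme, ident, {PERMS[ch] for ch in perm_letters}

scheme, ident, perms = parse_acl("world:anyone:cdrwa")
print(scheme, ident, sorted(perms))
# world anyone ['admin', 'create', 'delete', 'read', 'write']
```

The default open ACL grants all five bits to anyone, which is why Kafka works out of the box against a secured ZooKeeper but its znodes remain world-writable.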


 Authenticate connection to Zookeeper
 

 Key: KAFKA-1695
 URL: https://issues.apache.org/jira/browse/KAFKA-1695
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Gwen Shapira

 We need to make it possible to secure the Zookeeper cluster Kafka is using. 
 This would make use of the normal authentication ZooKeeper provides. 
 ZooKeeper supports a variety of authentication mechanisms so we will need to 
 figure out what has to be passed in to the zookeeper client.
 The intention is that when the current round of client work is done it should 
 be possible to run without clients needing access to Zookeeper so all we need 
 here is to make it so that only the Kafka cluster is able to read and write 
 to the Kafka znodes  (we shouldn't need to set any kind of acl on a per-znode 
 basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26994: Patch for KAFKA-1719

2014-10-22 Thread Jiangjie Qin


 On Oct. 21, 2014, 10:21 p.m., Guozhang Wang wrote:
  core/src/main/scala/kafka/tools/MirrorMaker.scala, line 323
  https://reviews.apache.org/r/26994/diff/1/?file=727975#file727975line323
 
  Is this change intended?

Yes, it is intended, so that we can make sure each data channel queue will 
receive a shutdown message. Otherwise, two messages could go to the same data 
channel queue.
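The point above can be sketched: with N queues and N consumers, enqueuing one sentinel per queue guarantees every consumer sees one, whereas pushing N sentinels into whichever queue happens to be chosen can put two in the same queue and leave another consumer blocked. A toy model (not MirrorMaker's actual data channel code):

```python
import queue

SHUTDOWN = object()  # sentinel message

def shutdown_all(queues):
    # One sentinel per queue: every drainer is guaranteed to see exactly one.
    for q in queues:
        q.put(SHUTDOWN)

def drain(q):
    """Consume messages from one queue until the shutdown sentinel arrives."""
    out = []
    while True:
        msg = q.get()
        if msg is SHUTDOWN:
            return out
        out.append(msg)

queues = [queue.Queue() for _ in range(2)]
queues[0].put("m1")
queues[1].put("m2")
shutdown_all(queues)
print([drain(q) for q in queues])  # [['m1'], ['m2']] -- both drainers stop
```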


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26994/#review57680
---


On Oct. 21, 2014, 8:37 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26994/
 ---
 
 (Updated Oct. 21, 2014, 8:37 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1719
 https://issues.apache.org/jira/browse/KAFKA-1719
 
 
 Repository: kafka
 
 
 Description
 ---
 
 make mirror maker exit when one consumer/producer thread exits.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 b8698ee1469c8fbc92ccc176d916eb3e28b87867 
 
 Diff: https://reviews.apache.org/r/26994/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




Re: Review Request 26994: Patch for KAFKA-1719

2014-10-22 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26994/#review57907
---



core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/26994/#comment98797

Is there any value in setting this to true? It seems that just checking 
whether it is false and exiting the process suffices. Setting something called 
cleanShutdown to true when, in fact, it isn't a clean shutdown is confusing to 
read.

Also good to add a FATAL log entry as suggested by Guozhang as well.



core/src/main/scala/kafka/tools/MirrorMaker.scala
https://reviews.apache.org/r/26994/#comment98799

Ditto here.


- Neha Narkhede


On Oct. 21, 2014, 8:37 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26994/
 ---
 
 (Updated Oct. 21, 2014, 8:37 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1719
 https://issues.apache.org/jira/browse/KAFKA-1719
 
 
 Repository: kafka
 
 
 Description
 ---
 
 make mirror maker exit when one consumer/producer thread exits.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 b8698ee1469c8fbc92ccc176d916eb3e28b87867 
 
 Diff: https://reviews.apache.org/r/26994/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




Re: Mirror Maker upgraded to use Java Producer

2014-10-22 Thread Neha Narkhede
It is available through the --new.producer option.


On Wed, Oct 22, 2014 at 2:05 PM, Bhavesh Mistry mistry.p.bhav...@gmail.com
wrote:

 Hi Kafka Dev Team,

 In trunk branch, is MM maker upgraded to use new Producer Code Base ?  I
 just wanted to know if this is in plan or already done ?


 Thanks,

 Bhavesh



[jira] [Commented] (KAFKA-1695) Authenticate connection to Zookeeper

2014-10-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180572#comment-14180572
 ] 

Gwen Shapira commented on KAFKA-1695:
-

I left a note on the ZKClient pull request.

If Datameer doesn't merge this patch, I can see two options:
1. Fork ZKClient, add the ACL features to our fork and start building with our 
own ZKClient.
2. Replace ZKClient with a library that does support ACLs (Curator, for example)

Any thoughts?

 Authenticate connection to Zookeeper
 

 Key: KAFKA-1695
 URL: https://issues.apache.org/jira/browse/KAFKA-1695
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Gwen Shapira

 We need to make it possible to secure the Zookeeper cluster Kafka is using. 
 This would make use of the normal authentication ZooKeeper provides. 
 ZooKeeper supports a variety of authentication mechanisms so we will need to 
 figure out what has to be passed in to the zookeeper client.
 The intention is that when the current round of client work is done it should 
 be possible to run without clients needing access to Zookeeper so all we need 
 here is to make it so that only the Kafka cluster is able to read and write 
 to the Kafka znodes  (we shouldn't need to set any kind of acl on a per-znode 
 basis).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26994: Patch for KAFKA-1719

2014-10-22 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26994/
---

(Updated Oct. 22, 2014, 10:04 p.m.)


Review request for kafka.


Bugs: KAFKA-1719
https://issues.apache.org/jira/browse/KAFKA-1719


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Diffs (updated)
-

  core/src/main/scala/kafka/tools/MirrorMaker.scala 
b8698ee1469c8fbc92ccc176d916eb3e28b87867 

Diff: https://reviews.apache.org/r/26994/diff/


Testing
---


Thanks,

Jiangjie Qin



[jira] [Commented] (KAFKA-1719) Make mirror maker exit when one consumer/producer thread exits.

2014-10-22 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180616#comment-14180616
 ] 

Jiangjie Qin commented on KAFKA-1719:
-

Updated reviewboard https://reviews.apache.org/r/26994/diff/
 against branch origin/trunk

 Make mirror maker exit when one consumer/producer thread exits.
 ---

 Key: KAFKA-1719
 URL: https://issues.apache.org/jira/browse/KAFKA-1719
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1719.patch, KAFKA-1719_2014-10-22_15:04:32.patch


 When one of the consumer/producer thread exits, the entire mirror maker will 
 be blocked. In this case, it is better to make it exit. It seems a single 
 ZookeeperConsumerConnector is sufficient for mirror maker, probably we don't 
 need a list for the connectors.
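The fail-fast behavior the ticket asks for can be sketched: each worker flips a shared event when it exits (cleanly or not), and the main thread waits on that event instead of joining every thread, so the first dead worker brings the whole process down rather than leaving the pipeline silently blocked. A toy model under those assumptions, not MirrorMaker's implementation:

```python
import threading

class Mirror:
    """Toy model: any worker exiting triggers whole-process shutdown."""
    def __init__(self, jobs):
        self.dead = threading.Event()
        self.threads = [threading.Thread(target=self._run, args=(j,))
                        for j in jobs]

    def _run(self, job):
        try:
            job()
        except Exception as e:
            print("FATAL: worker died: %s" % e)  # per the review feedback
        finally:
            self.dead.set()  # the first exit flips the switch

    def start_and_wait(self):
        for t in self.threads:
            t.start()
        self.dead.wait()  # wait for ANY thread to exit, not for all to join
        return "exiting: a consumer/producer thread died"

def crashing_worker():
    raise RuntimeError("producer send failed")

stop = threading.Event()
mirror = Mirror([crashing_worker, stop.wait])  # one bad worker, one healthy
print(mirror.start_and_wait())
stop.set()  # release the healthy worker so the process can exit
```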



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1719) Make mirror maker exit when one consumer/producer thread exits.

2014-10-22 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1719:

Attachment: KAFKA-1719_2014-10-22_15:04:32.patch

 Make mirror maker exit when one consumer/producer thread exits.
 ---

 Key: KAFKA-1719
 URL: https://issues.apache.org/jira/browse/KAFKA-1719
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1719.patch, KAFKA-1719_2014-10-22_15:04:32.patch


 When one of the consumer/producer thread exits, the entire mirror maker will 
 be blocked. In this case, it is better to make it exit. It seems a single 
 ZookeeperConsumerConnector is sufficient for mirror maker, probably we don't 
 need a list for the connectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Mirror Maker upgraded to use Java Producer

2014-10-22 Thread Bhavesh Mistry
Hi Neha,

Thank you for this information.

Thanks,

Bhavesh

On Wed, Oct 22, 2014 at 2:34 PM, Neha Narkhede neha.narkh...@gmail.com
wrote:

 It is available through the --new.producer option.


 On Wed, Oct 22, 2014 at 2:05 PM, Bhavesh Mistry 
 mistry.p.bhav...@gmail.com
 wrote:

  Hi Kafka Dev Team,
 
  In trunk branch, is MM maker upgraded to use new Producer Code Base ?  I
  just wanted to know if this is in plan or already done ?
 
 
  Thanks,
 
  Bhavesh
 



[jira] [Created] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests

2014-10-22 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-1725:


 Summary: Configuration file bugs in system tests add noise to 
output and break a few tests
 Key: KAFKA-1725
 URL: https://issues.apache.org/jira/browse/KAFKA-1725
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor


There are some broken and misnamed system test configuration files 
(testcase_*_properties.json) that are causing a bunch of exceptions when 
running system tests and make it a lot harder to parse the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 27060: Patch for KAFKA-1725

2014-10-22 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27060/
---

Review request for kafka.


Bugs: KAFKA-1725
https://issues.apache.org/jira/browse/KAFKA-1725


Repository: kafka


Description
---

KAFKA-1725: Clean up system test output: fix typo in system test case file, 
incorrectly named system test configuration files, and skip trying to generate 
metrics graphs when no data is available.


Diffs
-

  
system_test/mirror_maker_testsuite/testcase_15001/testcase_5001_properties.json 
 
  
system_test/mirror_maker_testsuite/testcase_15002/testcase_5002_properties.json 
 
  
system_test/mirror_maker_testsuite/testcase_15003/testcase_5003_properties.json 
 
  
system_test/mirror_maker_testsuite/testcase_15004/testcase_5004_properties.json 
 
  
system_test/mirror_maker_testsuite/testcase_15005/testcase_5005_properties.json 
 
  
system_test/mirror_maker_testsuite/testcase_15006/testcase_5006_properties.json 
 
  system_test/replication_testsuite/testcase_0001/testcase_0001_properties.json 
308f1937bbdc0fdcebdb8e9bc39e643c3f0c18be 
  
system_test/replication_testsuite/testcase_10101/testcase_0101_properties.json  
  
system_test/replication_testsuite/testcase_10102/testcase_0102_properties.json  
  
system_test/replication_testsuite/testcase_10103/testcase_0103_properties.json  
  
system_test/replication_testsuite/testcase_10104/testcase_0104_properties.json  
  
system_test/replication_testsuite/testcase_10105/testcase_0105_properties.json  
  
system_test/replication_testsuite/testcase_10106/testcase_0106_properties.json  
  
system_test/replication_testsuite/testcase_10107/testcase_0107_properties.json  
  
system_test/replication_testsuite/testcase_10108/testcase_0108_properties.json  
  
system_test/replication_testsuite/testcase_10109/testcase_0109_properties.json  
  
system_test/replication_testsuite/testcase_10110/testcase_0110_properties.json  
  
system_test/replication_testsuite/testcase_10131/testcase_0131_properties.json  
  
system_test/replication_testsuite/testcase_10132/testcase_0132_properties.json  
  
system_test/replication_testsuite/testcase_10133/testcase_0133_properties.json  
  
system_test/replication_testsuite/testcase_10134/testcase_0134_properties.json  
  system_test/utils/metrics.py d98d3cdeab00be9ddf4b7032a68da3886e4850c7 

Diff: https://reviews.apache.org/r/27060/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Updated] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests

2014-10-22 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1725:
-
Attachment: KAFKA-1725.patch

 Configuration file bugs in system tests add noise to output and break a few 
 tests
 -

 Key: KAFKA-1725
 URL: https://issues.apache.org/jira/browse/KAFKA-1725
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-1725.patch


 There are some broken and misnamed system test configuration files 
 (testcase_*_properties.json) that are causing a bunch of exceptions when 
 running system tests and make it a lot harder to parse the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests

2014-10-22 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1725:
-
Status: Patch Available  (was: Open)

 Configuration file bugs in system tests add noise to output and break a few 
 tests
 -

 Key: KAFKA-1725
 URL: https://issues.apache.org/jira/browse/KAFKA-1725
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-1725.patch


 There are some broken and misnamed system test configuration files 
 (testcase_*_properties.json) that are causing a bunch of exceptions when 
 running system tests and make it a lot harder to parse the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1725) Configuration file bugs in system tests add noise to output and break a few tests

2014-10-22 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180753#comment-14180753
 ] 

Ewen Cheslack-Postava commented on KAFKA-1725:
--

Created reviewboard https://reviews.apache.org/r/27060/diff/
 against branch origin/trunk

 Configuration file bugs in system tests add noise to output and break a few 
 tests
 -

 Key: KAFKA-1725
 URL: https://issues.apache.org/jira/browse/KAFKA-1725
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Minor
 Attachments: KAFKA-1725.patch


 There are some broken and misnamed system test configuration files 
 (testcase_*_properties.json) that are causing a bunch of exceptions when 
 running system tests and make it a lot harder to parse the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy

2014-10-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180763#comment-14180763
 ] 

Guozhang Wang commented on KAFKA-1718:
--

Yes, that is what I was thinking about.

 Message Size Too Large error when only small messages produced with Snappy
 

 Key: KAFKA-1718
 URL: https://issues.apache.org/jira/browse/KAFKA-1718
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Evan Huus
Priority: Critical

 I'm the primary author of the Go bindings, and while I originally received 
 this as a bug against my bindings, I'm coming to the conclusion that it's a 
 bug in the broker somehow.
 Specifically, take a look at the last two kafka packets in the following 
 packet capture: https://dl.dropboxusercontent.com/u/171647/kafka.pcapng (you 
 will need a trunk build of Wireshark to fully decode the kafka part of the 
 packets).
 The produce request contains two partitions on one topic. Each partition has 
 one message set (sizes 977205 bytes and 967362 bytes respectively). Each 
 message set is a sequential collection of snappy-compressed messages, each 
 message of size 46899. When uncompressed, each message contains a message set 
 of 999600 bytes, containing a sequence of uncompressed 1024-byte messages.
 However, the broker responds to this with a MessageSizeTooLarge error, full 
 stacktrace from the broker logs being:
 kafka.common.MessageSizeTooLargeException: Message size is 1070127 bytes 
 which exceeds the maximum configured message size of 1000012.
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:267)
   at kafka.log.Log$$anonfun$append$1.apply(Log.scala:265)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
   at kafka.log.Log.append(Log.scala:265)
   at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:354)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:376)
   at 
 kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:366)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
   at 
 scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:366)
   at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:292)
   at kafka.server.KafkaApis.handle(KafkaApis.scala:185)
   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
   at java.lang.Thread.run(Thread.java:695)
 Since as far as I can tell none of the sizes in the actual produced packet 
 exceed the defined maximum, I can only assume that the broker is 
 miscalculating something somewhere and throwing the exception improperly.
 ---
 This issue can be reliably reproduced using an out-of-the-box binary download 
 of 0.8.1.1 and the following gist: 
 https://gist.github.com/eapache/ce0f15311c605a165ce7 (you will need to use 
 the `producer-ng` branch of the Sarama library).
 ---
 I am happy to provide any more information you might need, or to do relevant 
 experiments etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26373: Patch for KAFKA-1647

2014-10-22 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26373/#review57947
---



core/src/main/scala/kafka/server/ReplicaManager.scala
https://reviews.apache.org/r/26373/#comment98856

I am not sure that I understand how Option.flatten works. Would it be 
clearer if we first filter out partitions with no live leader and then generate 
the TopicAndPartition -> BrokerAndInitialOffset map?
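
Jun's filter-then-map suggestion might look roughly like this sketch (all types 
below are simplified stand-ins for the ones in ReplicaManager.scala, not the 
actual patch):

```scala
// Simplified stand-ins for the types used in ReplicaManager.scala.
case class TopicAndPartition(topic: String, partition: Int)
case class BrokerAndInitialOffset(brokerId: Int, initOffset: Long)
case class PartitionState(topic: String, partition: Int,
                          leaderIdOpt: Option[Int], logEndOffset: Long)

// Step 1: drop partitions with no live leader; step 2: build the map.
def partitionsToFetch(
    partitions: Seq[PartitionState]): Map[TopicAndPartition, BrokerAndInitialOffset] =
  partitions
    .filter(_.leaderIdOpt.isDefined)
    .map(p => TopicAndPartition(p.topic, p.partition) ->
              BrokerAndInitialOffset(p.leaderIdOpt.get, p.logEndOffset))
    .toMap
```

This avoids the Option.flatten step entirely: partitions without a leader simply 
never contribute an entry to the map.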


- Jun Rao


On Oct. 22, 2014, 6:08 a.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26373/
 ---
 
 (Updated Oct. 22, 2014, 6:08 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1647
 https://issues.apache.org/jira/browse/KAFKA-1647
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Joel's comments.
 
 
 the version 2 code seems to be submitted by mistake... This should be the 
 code for review that addressed Joel's comments.
 
 
 Addressed Jun's comments. Will do tests to verify if it works.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/server/ReplicaManager.scala 
 78b7514cc109547c562e635824684fad581af653 
 
 Diff: https://reviews.apache.org/r/26373/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




[jira] [Commented] (KAFKA-1718) Message Size Too Large error when only small messages produced with Snappy

2014-10-22 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180856#comment-14180856
 ] 

Jun Rao commented on KAFKA-1718:


[~guozhang], removing the max message size may be a bigger change. We not only 
have to patch both the regular and follower consumer, but probably also log 
compaction and the tools that read the logs directly. Also, having a max message 
size can be a good thing since it limits the memory consumption in the reader.

As for this issue, we can change the behavior on the broker. However, it's a bit 
tricky since currently, we don't have the api to create a ByteBufferMessageSet 
with more than 1 already compressed message. So, for now, we can probably just 
document the behavior in the wiki. 
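
The behavior at issue — the broker's size check applies to the single shallow 
wrapper message that holds an entire re-compressed message set, so many small 
inner messages batched together can still trip the limit — can be sketched with 
a toy model (all names and numbers below are illustrative, not broker internals):

```scala
// Toy model, not the broker's actual code path: the size check sees only
// the *wrapper* of a compressed message set, never the inner messages.
case class Message(payloadSize: Int)

// Estimate the shallow wrapper size for a batch of inner messages at a
// given compression ratio (both parameters are made up for illustration).
def shallowWrapperSize(inner: Seq[Message], compressionRatio: Double): Int =
  (inner.map(_.payloadSize).sum * compressionRatio).toInt

def exceedsMaxMessageSize(inner: Seq[Message],
                          compressionRatio: Double,
                          maxMessageSize: Int): Boolean =
  shallowWrapperSize(inner, compressionRatio) > maxMessageSize
```

Under this model, each inner 1024-byte message is far below the limit, yet a 
large enough batch of them compressed into one wrapper can exceed it — which 
matches the symptom Evan reports.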

Evan,

If you want to help make the wiki change, I can give you permission. Just let 
me know your wiki id. Thanks,



Re: Review Request 24676: Fix KAFKA-1583

2014-10-22 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24676/
---

(Updated Oct. 23, 2014, 1:53 a.m.)


Review request for kafka.


Summary (updated)
-

Fix KAFKA-1583


Bugs: KAFKA-1583
https://issues.apache.org/jira/browse/KAFKA-1583


Repository: kafka


Description (updated)
---

Incorporate Joel's comments after rebase


Diffs (updated)
-

  core/src/main/scala/kafka/api/FetchRequest.scala 
59c09155dd25fad7bed07d3d00039e3dc66db95c 
  core/src/main/scala/kafka/api/FetchResponse.scala 
8d085a1f18f803b3cebae4739ad8f58f95a6c600 
  core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
861a6cf11dc6b6431fcbbe9de00c74a122f204bd 
  core/src/main/scala/kafka/api/ProducerRequest.scala 
b2366e7eedcac17f657271d5293ff0bef6f3cbe6 
  core/src/main/scala/kafka/api/ProducerResponse.scala 
a286272c834b6f40164999ff8b7f8998875f2cfe 
  core/src/main/scala/kafka/cluster/Partition.scala 
e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 
  core/src/main/scala/kafka/common/ErrorMapping.scala 
880ab4a004f078e5d84446ea6e4454ecc06c95f2 
  core/src/main/scala/kafka/log/Log.scala 
157d67369baabd2206a2356b2aa421e848adab17 
  core/src/main/scala/kafka/network/BoundedByteBufferSend.scala 
a624359fb2059340bb8dc1619c5b5f226e26eb9b 
  core/src/main/scala/kafka/server/DelayedFetch.scala 
e0f14e25af03e6d4344386dcabc1457ee784d345 
  core/src/main/scala/kafka/server/DelayedProduce.scala 
9481508fc2d6140b36829840c337e557f3d090da 
  core/src/main/scala/kafka/server/FetchRequestPurgatory.scala 
ed1318891253556cdf4d908033b704495acd5724 
  core/src/main/scala/kafka/server/KafkaApis.scala 
85498b4a1368d3506f19c4cfc64934e4d0ac4c90 
  core/src/main/scala/kafka/server/OffsetManager.scala 
43eb2a35bb54d32c66cdb94772df657b3a104d1a 
  core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala 
d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
78b7514cc109547c562e635824684fad581af653 
  core/src/main/scala/kafka/server/RequestPurgatory.scala 
9d76234bc2c810ec08621dc92bb4061b8e7cd993 
  core/src/main/scala/kafka/utils/DelayedItem.scala 
d7276494072f14f1cdf7d23f755ac32678c5675c 
  core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
209a409cb47eb24f83cee79f4e064dbc5f5e9d62 
  core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala 
fb61d552f2320fedec547400fbbe402a0b2f5d87 
  core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 
03a424d45215e1e7780567d9559dae4d0ae6fc29 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
cd302aa51eb8377d88b752d48274e403926439f2 
  core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala 
a9c4ddc78df0b3695a77a12cf8cf25521a203122 
  core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
a577f4a8bf420a5bc1e62fad6d507a240a42bcaa 
  core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 
3804a114e97c849cae48308997037786614173fc 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 

Diff: https://reviews.apache.org/r/24676/diff/


Testing
---

Unit tests


Thanks,

Guozhang Wang



[jira] [Updated] (KAFKA-1583) Kafka API Refactoring

2014-10-22 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1583:
-
Attachment: KAFKA-1583_2014-10-22_18:52:52.patch

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, 
 KAFKA-1583_2014-10-17_09:56:33.patch, KAFKA-1583_2014-10-22_18:52:52.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring





[jira] [Commented] (KAFKA-1583) Kafka API Refactoring

2014-10-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180899#comment-14180899
 ] 

Guozhang Wang commented on KAFKA-1583:
--

Updated reviewboard https://reviews.apache.org/r/24676/diff/
 against branch origin/trunk



Re: Review Request 24676: Fix KAFKA-1583

2014-10-22 Thread Guozhang Wang


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/cluster/Partition.scala, line 245
  https://reviews.apache.org/r/24676/diff/11/?file=724366#file724366line245
 
  Maybe use this:
  Recorded replica %d log end offset (LEO)...
  
  Also, instead of an explicit [%s,%d] format specifier I think we should 
  start doing the following:
  
  "%s".format(TopicAndPartition(topic, partition))
  
  That way we ensure a canonical toString for topic/partition pairs and 
  can change it in one place in the future.
  
  There are some places where we don't log with this agreed-upon format 
  and it is a bit annoying, so going forward I think we should use the above. 
  Can we add it to the logging improvements wiki?

Updated the logging wiki. We can refer people to it when we make logging format 
comments moving forward.
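
The convention Joel proposes could look roughly like this (a sketch: kafka core 
already has a TopicAndPartition case class, but the one below is a simplified 
stand-in, and the log line is abbreviated from the review):

```scala
// One canonical toString for topic/partition pairs: change the format
// here and every log line that routes through it changes with it.
case class TopicAndPartition(topic: String, partition: Int) {
  override def toString: String = "[%s,%d]".format(topic, partition)
}

// Example log line built via the canonical form rather than an
// explicit [%s,%d] specifier at each call site.
def logLine(topic: String, partition: Int, replicaId: Int, leo: Long): String =
  "Recorded replica %d log end offset (LEO) for partition %s as %d"
    .format(replicaId, TopicAndPartition(topic, partition), leo)
```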


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/cluster/Partition.scala, line 259
  https://reviews.apache.org/r/24676/diff/11/?file=724366#file724366line259
 
  Since we still may update the HW shall we rename this to 
  maybeUpdateHWAndExpandIsr

The reason I changed its name is that the original name is a bit misleading, 
suggesting that only this function can possibly update the HW. Instead, I added 
comments for each function, such as expandISR and updateHW, about which logic may 
trigger it.


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/DelayedFetch.scala, line 99
  https://reviews.apache.org/r/24676/diff/11/?file=724370#file724370line99
 
  I'm a bit confused by case C. It can also happen if the delayed fetch 
  happens to straddle a segment roll event; the comment seems a bit 
  misleading/incomplete without that.
  
  In fact, if it is lagging shouldn't it have been satisfied immediately 
  without having to create a DelayedFetch in the first place?

It could be the case that it is lagging on one partition, but that alone cannot 
provide enough data to satisfy fetch.min.bytes since the other partitions are all 
caught up. I reworded the comments a bit.
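
The scenario Guozhang describes — a delayed fetch is satisfied only when the 
bytes accumulated across *all* requested partitions reach fetch.min.bytes — can 
be sketched like this (simplified stand-ins for the DelayedFetch bookkeeping, 
with a made-up bytes-per-offset factor):

```scala
// Per-partition progress: where the fetcher is vs. where the log ends.
case class PartitionFetchState(fetchOffset: Long, logEndOffset: Long)

// Total bytes available across partitions (bytesPerOffset is illustrative).
def accumulatedBytes(ps: Seq[PartitionFetchState], bytesPerOffset: Long): Long =
  ps.map(p => math.max(p.logEndOffset - p.fetchOffset, 0L) * bytesPerOffset).sum

// A fetch lagging on one partition may still not be satisfiable if the
// other partitions are caught up and contribute nothing.
def fetchSatisfied(ps: Seq[PartitionFetchState],
                   bytesPerOffset: Long, fetchMinBytes: Long): Boolean =
  accumulatedBytes(ps, bytesPerOffset) >= fetchMinBytes
```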


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/KafkaApis.scala, line 187
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line187
 
  Why is this additional logging necessary?
  
  KafkaApis currently has catch-all for unhandled exceptions.
  
  Error codes can be inspected via public access logs if required right?

The exception is already caught in the replica manager, which does not re-throw 
but only sets the error code. Hence the request log will not record this as a 
failed request.


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/KafkaApis.scala, line 423
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line423
 
  Are these changes intentional?

Yes. According to our logging wiki this should be debug level since they are 
not server side errors.


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/ReplicaManager.scala, line 46
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line46
 
  Should we rename ReplicaManager to ReplicatedLogManager?

I am going to do all the renaming in a follow-up JIRA.


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/KafkaApis.scala, line 261
  https://reviews.apache.org/r/24676/diff/11/?file=724373#file724373line261
 
  I'm not sure how scala treats this under the hood, but it _has_ to hold 
  a reference to request until the callback is executed. i.e., we probably 
  still want to empty the request data.

Good point!


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/ReplicaManager.scala, line 120
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line120
 
  (for regular consumer fetch)

Actually this is for both consumer / follower fetch


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/server/ReplicaManager.scala, line 265
  https://reviews.apache.org/r/24676/diff/11/?file=724376#file724376line265
 
  This is old code and we don't need to address it in this patch, but I 
  was wondering if it makes sense to respond sooner if there is at least one 
  error in the local append. What do you think? i.e., I don't remember a good 
  reason for holding on to the request if there are i < numPartitions errors 
  in local append.

I think today we are already responding immediately after a failure in local 
append, right?


 On Oct. 22, 2014, 1:46 a.m., Joel Koshy wrote:
  core/src/main/scala/kafka/utils/DelayedItem.scala, line 23
  https://reviews.apache.org/r/24676/diff/11/?file=724378#file724378line23
 
  We don't really need this class anymore and it can be folded into 
  DelayedRequest right?

I am going to do this in a follow-up JIRA.


- Guozhang



Re: Review Request 26994: Patch for KAFKA-1719

2014-10-22 Thread Guozhang Wang


 On Oct. 22, 2014, 9:32 p.m., Neha Narkhede wrote:
  core/src/main/scala/kafka/tools/MirrorMaker.scala, line 271
  https://reviews.apache.org/r/26994/diff/1/?file=727975#file727975line271
 
  Is there any value in setting this to true? It seems that just checking 
  if it is false and exiting the process suffices. Setting to true something 
  that is called cleanShutdown, when in fact, it isn't a clean shutdown is 
  confusing to read.
  
  Also good to add a FATAL log entry as suggested by Guozhang as well.

The boolean is used when the internal threads try to exit: when it is not set, 
the thread knows it is closing abnormally and hence goes on to handle it. I agree 
its name is a bit misleading, and we can probably just rename it to 
isShuttingDown.
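
The flag pattern under discussion, with the rename Guozhang suggests, might look 
like this minimal sketch (Worker and sawAbnormalExit are illustrative names, not 
MirrorMaker's actual members):

```scala
import java.util.concurrent.atomic.AtomicBoolean

class Worker {
  // Set by the shutdown hook before worker threads are asked to stop.
  val isShuttingDown = new AtomicBoolean(false)
  @volatile var sawAbnormalExit = false

  def shutdown(): Unit = isShuttingDown.set(true)

  // Called whenever an internal thread unwinds, for any reason. If the
  // flag was never set, the exit is abnormal: this is where MirrorMaker
  // would log FATAL and halt the process.
  def onThreadExit(): Unit =
    if (!isShuttingDown.get) {
      sawAbnormalExit = true
    }
}
```

Because the flag is only ever read to distinguish clean from abnormal exits, 
there is indeed no need to set it to true on the abnormal path, which is Neha's 
point above.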


- Guozhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26994/#review57907
---


On Oct. 22, 2014, 10:04 p.m., Jiangjie Qin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26994/
 ---
 
 (Updated Oct. 22, 2014, 10:04 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1719
 https://issues.apache.org/jira/browse/KAFKA-1719
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Addressed Guozhang's comments.
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/tools/MirrorMaker.scala 
 b8698ee1469c8fbc92ccc176d916eb3e28b87867 
 
 Diff: https://reviews.apache.org/r/26994/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jiangjie Qin
 




[jira] [Commented] (KAFKA-1683) Implement a session concept in the socket server

2014-10-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180983#comment-14180983
 ] 

Gwen Shapira commented on KAFKA-1683:
-

We need to map SelectorKeys (representing connections) to a Session object 
(containing user and other identifiers).
This can be done in a map that will get populated when an AUTH request appears 
and used by the rest of the APIs.

Since the Request that appears in the API contains the SelectorKeys (RequestKey 
field, currently unused, but I'm glad someone had the foresight to add it), I 
think it makes much more sense to manage this mapping in KafkaApis, rather than 
in SocketServer. This can be internal to the API layer - create the Session 
object and add the mapping (Request.requestKey.hashcode() -> Session) when an 
AuthRequest happens and have the various handlers use it to map the requests 
with the keys to sessions. 

An alternative design would be to maintain this mapping in the SocketServer and 
have the processor add this information to Request when it creates the request. 
I don't like the idea of moving this responsibility to the network layer when 
it can be done on a higher level, which it can - because the request object 
already has a session identifier. 

So unless someone objects - I'm going with the modification to KafkaApis.
[~jkreps][~joestein] - any issues with this? 

As a side note, other systems (pretty much anything HTTP-based, REST, Thrift, 
etc) send the Kerberos session ticket in every request and use it to 
re-authenticate and provide the identity rather than maintain a stable session. 
This can be an alternative design, although I'd think its one with higher 
overhead.
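
A minimal sketch of the requestKey -> Session mapping described above (Session, 
SessionStore, and the authenticate call driven by a hypothetical AuthRequest are 
all illustrative; in the proposed design KafkaApis would own something like this):

```scala
import scala.collection.concurrent.TrieMap

// Illustrative session payload; would carry the authenticated principal
// and other identifiers.
case class Session(principal: String)

class SessionStore {
  // Concurrent map keyed by the connection's request key hash.
  private val sessions = TrieMap.empty[Int, Session]

  // Populated when an AuthRequest is handled for the connection.
  def authenticate(requestKey: Int, principal: String): Unit =
    sessions.put(requestKey, Session(principal))

  // Looked up by the other API handlers on every subsequent request.
  def sessionFor(requestKey: Int): Option[Session] = sessions.get(requestKey)

  // Removed when the connection closes.
  def close(requestKey: Int): Unit = sessions.remove(requestKey)
}
```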

 Implement a session concept in the socket server
 --

 Key: KAFKA-1683
 URL: https://issues.apache.org/jira/browse/KAFKA-1683
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.9.0
Reporter: Jay Kreps
Assignee: Gwen Shapira

 To implement authentication we need a way to keep track of some things 
 between requests. The initial use for this would be remembering the 
 authenticated user/principle info, but likely more uses would come up (for 
 example we will also need to remember whether and which encryption or 
 integrity measures are in place on the socket so we can wrap and unwrap 
 writes and reads).
 I was thinking we could just add a Session object that might have a user 
 field. The session object would need to get added to RequestChannel.Request 
 so it is passed down to the API layer with each request.





[jira] [Assigned] (KAFKA-1687) SASL tests

2014-10-22 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira reassigned KAFKA-1687:
---

Assignee: Gwen Shapira

 SASL tests
 --

 Key: KAFKA-1687
 URL: https://issues.apache.org/jira/browse/KAFKA-1687
 Project: Kafka
  Issue Type: Sub-task
  Components: security
Reporter: Jay Kreps
Assignee: Gwen Shapira

 We need tests for our SASL/Kerberos setup. This is not that easy to do with 
 Kerberos because of the dependency on the KDC. However possibly we can test 
 with another SASL mechanism that doesn't have that dependency?





[jira] [Commented] (KAFKA-1687) SASL tests

2014-10-22 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181037#comment-14181037
 ] 

Gwen Shapira commented on KAFKA-1687:
-

Picked it up so this will have an owner. This is blocked until KAFKA-1686 is 
in, so don't expect to see work here right away :)

