Re: Build failed in Jenkins: KafkaPreCommit #167

2015-07-29 Thread Ismael Juma
On Wed, Jul 29, 2015 at 4:52 PM, Guozhang Wang wangg...@gmail.com wrote:

 It is likely that we made two commits at roughly the same time, which
 triggered two builds. How could this issue be resolved in this case?


It should not happen in a typical configuration, but it is hard to tell what
is wrong without access to the job configuration. Two possible options: a
committer with Jenkins access investigates, or a BUILDS issue is filed asking
for help from them.

Best,
Ismael


[jira] [Resolved] (KAFKA-2382) KafkaConsumer seek methods should throw an exception when called for partitions not assigned to this consumer instance

2015-07-29 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-2382.
--
Resolution: Invalid

Ah, good call. Somehow I missed that, guess I didn't dig quite deep enough 
through the SubscriptionState code.

 KafkaConsumer seek methods should throw an exception when called for 
 partitions not assigned to this consumer instance
 --

 Key: KAFKA-2382
 URL: https://issues.apache.org/jira/browse/KAFKA-2382
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Ewen Cheslack-Postava
Assignee: Neha Narkhede
 Fix For: 0.8.3


 It looks like the current code for the consumer will blindly accept any seek 
 calls, adding the offsets to SubscriptionState. If the consumer is being used 
 in simple consumer mode, this makes sense, but when using subscriptions and the 
 consumer coordinator, this should be an error.
 If a user accidentally invokes the seek at the wrong time, it will just get 
 lost. As a simple example, if you start the consumer, subscribe, and then 
 immediately seek, that seek will just get lost as soon as you call poll() and 
 the initial join group + rebalance occurs. That sequence of calls simply 
 shouldn't be valid since it doesn't make sense to seek() on a partition you 
 haven't been assigned.
 Relatedly, I think the current effect of doing this can result in incorrect 
 behavior because SubscriptionState.hasAllFetchedPositions() only checks the 
 size of the fetched map and assignedPartitions map. Since this bug allows 
 adding arbitrary topic partitions to the fetched map, that check is not 
 accurate.
 This is probably related to KAFKA-2343, but that one is just a doc fix on how 
 seek is supposed to behave wrt poll and rebalance.
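To make the sequence concrete, here is a minimal sketch of the calls described above (broker address, group, topic, and offset are hypothetical, and the consumer API shown is the later stabilized one; as later discussion on this ticket notes, the client may already reject the seek with an exception rather than silently dropping it):

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekBeforeAssignment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("group.id", "my-group");                // hypothetical group
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));

        // No partition is assigned until poll() completes the group join, so
        // this seek targets a partition the consumer does not yet own.
        consumer.seek(new TopicPartition("my-topic", 0), 42L);

        // The initial join group + rebalance happens here, resetting fetch
        // positions and discarding the seek above.
        consumer.poll(100);
        consumer.close();
    }
}
{code}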



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Build failed in Jenkins: KafkaPreCommit #167

2015-07-29 Thread Guozhang Wang
It is likely that we made two commits at roughly the same time, which
triggered two builds. How could this issue be resolved in this case?

On Wed, Jul 29, 2015 at 2:23 AM, Ismael Juma ism...@juma.me.uk wrote:

 It looks like two builds are running in the same workspace at the same time
 somehow.

 Ismael

 On Wed, Jul 29, 2015 at 1:44 AM, Guozhang Wang wangg...@gmail.com wrote:

  Does anyone know how this 'Could not open buildscript class cache' issue
  could happen, and can we fix it on our side, or is it a general Jenkins
  issue?
 
  Guozhang
 
  On Tue, Jul 28, 2015 at 5:39 PM, Apache Jenkins Server 
  jenk...@builds.apache.org wrote:
 
   See https://builds.apache.org/job/KafkaPreCommit/167/changes
  
   Changes:
  
   [wangguoz] KAFKA-2276; KIP-25 initial patch
  
   --
   Started by an SCM change
   Building remotely on ubuntu3 (Ubuntu ubuntu legacy-ubuntu) in
 workspace 
   https://builds.apache.org/job/KafkaPreCommit/ws/
 git rev-parse --is-inside-work-tree # timeout=10
   Fetching changes from the remote Git repository
 git config remote.origin.url
   https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
   Fetching upstream changes from
   https://git-wip-us.apache.org/repos/asf/kafka.git
 git --version # timeout=10
 git fetch --tags --progress
   https://git-wip-us.apache.org/repos/asf/kafka.git
   +refs/heads/*:refs/remotes/origin/*
 git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
   Checking out Revision e43c9aff92c57da6abb0c1d0af3431a550110a89
   (refs/remotes/origin/trunk)
 git config core.sparsecheckout # timeout=10
 git checkout -f e43c9aff92c57da6abb0c1d0af3431a550110a89
 git rev-list f4101ab3fcf7ec65f6541b157f1894ffdc8d861d # timeout=10
   Setting
  
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
   [KafkaPreCommit] $ /bin/bash -xe /tmp/hudson7546159244531236383.sh
   +
  
 
 /home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1/bin/gradle
   To honour the JVM settings for this build a new JVM will be forked.
  Please
   consider using the daemon:
   http://gradle.org/docs/2.1/userguide/gradle_daemon.html.
   Building project 'core' with Scala version 2.10.5
   :downloadWrapper UP-TO-DATE
  
   BUILD SUCCESSFUL
  
   Total time: 16.059 secs
   Setting
  
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
   [KafkaPreCommit] $ /bin/bash -xe /tmp/hudson4914211160587558932.sh
   + ./gradlew -PscalaVersion=2.10.1 test
   To honour the JVM settings for this build a new JVM will be forked.
  Please
   consider using the daemon:
   http://gradle.org/docs/2.4/userguide/gradle_daemon.html.
  
   FAILURE: Build failed with an exception.
  
   * What went wrong:
   Could not open buildscript class cache for settings file
   '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle'
  
 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
Timeout waiting to lock buildscript class cache for settings file
   '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle'
  
 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
   It is currently in use by another Gradle instance.
 Owner PID: unknown
 Our PID: 23737
 Owner Operation: unknown
 Our operation: Initialize cache
 Lock file:
  
 
 /home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript/cache.properties.lock
  
   * Try:
   Run with --stacktrace option to get the stack trace. Run with --info or
   --debug option to get more log output.
  
   BUILD FAILED
  
   Total time: 1 mins 3.207 secs
   Build step 'Execute shell' marked build as failure
   Setting
  
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
  
 
 
 
  --
  -- Guozhang
 




-- 
-- Guozhang


Re: Connection to zk shell on Kafka

2015-07-29 Thread Prabhjot Bharaj
Sure. It would be great if you could also explain why the absence of the
jar creates this problem.

Also, I'm surprised that the ZooKeeper that comes bundled with Kafka 0.8.2
does not include the jline jar.

Regards,
prabcs

On Wed, Jul 29, 2015 at 10:45 PM, Chris Barlock barl...@us.ibm.com wrote:

 You need the jline JAR file that ships with ZooKeeper.

 Chris

 IBM Tivoli Systems
 Research Triangle Park, NC
 (919) 224-2240
 Internet:  barl...@us.ibm.com



 From:   Prabhjot Bharaj prabhbha...@gmail.com
 To: us...@kafka.apache.org, u...@zookeeper.apache.org
 Date:   07/29/2015 01:13 PM
 Subject:Connection to zk shell on Kafka



 Hi folks,

 /kafka/bin# ./zookeeper-shell.sh localhost:2182/

 Connecting to localhost:2182/

 Welcome to ZooKeeper!

 JLine support is disabled


 WATCHER::


 WatchedEvent state:SyncConnected type:None path:null


 The shell never says connected

 I'm running a 5-node ZooKeeper cluster on a 5-node Kafka cluster (each Kafka
 broker has one ZooKeeper server running)

 When I try connecting to the shell, the shell never says 'Connected'



 However, if I try connecting to another standalone ZooKeeper which has no
 links to Kafka, I'm able to connect:


 /kafka/bin# /zookeeper/scripts/zkCli.sh -server 127.0.0.1:2181

 Connecting to 127.0.0.1:2181

 Welcome to ZooKeeper!

 JLine support is enabled


 WATCHER::


 WatchedEvent state:SyncConnected type:None path:null

 [zk: 127.0.0.1:2181(CONNECTED) 0]

 Am I missing something?


 Thanks,

 prabcs




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't


[jira] [Commented] (KAFKA-2382) KafkaConsumer seek methods should throw an exception when called for partitions not assigned to this consumer instance

2015-07-29 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646338#comment-14646338
 ] 

Jason Gustafson commented on KAFKA-2382:


[~ewencp] I could be wrong, but I think the code currently throws 
IllegalArgumentException when you call seek for unassigned partitions. It's a 
little obscure since the exceptions aren't thrown from seek() directly, but 
from fetched() and consumed(). I think this is clarified a little in my PR for 
KAFKA-2350, if you want to have a look.

 KafkaConsumer seek methods should throw an exception when called for 
 partitions not assigned to this consumer instance
 --

 Key: KAFKA-2382
 URL: https://issues.apache.org/jira/browse/KAFKA-2382
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Ewen Cheslack-Postava
Assignee: Neha Narkhede
 Fix For: 0.8.3


 It looks like the current code for the consumer will blindly accept any seek 
 calls, adding the offsets to SubscriptionState. If the consumer is being used 
 in simple consumer mode, this makes sense, but when using subscriptions and the 
 consumer coordinator, this should be an error.
 If a user accidentally invokes the seek at the wrong time, it will just get 
 lost. As a simple example, if you start the consumer, subscribe, and then 
 immediately seek, that seek will just get lost as soon as you call poll() and 
 the initial join group + rebalance occurs. That sequence of calls simply 
 shouldn't be valid since it doesn't make sense to seek() on a partition you 
 haven't been assigned.
 Relatedly, I think the current effect of doing this can result in incorrect 
 behavior because SubscriptionState.hasAllFetchedPositions() only checks the 
 size of the fetched map and assignedPartitions map. Since this bug allows 
 adding arbitrary topic partitions to the fetched map, that check is not 
 accurate.
 This is probably related to KAFKA-2343, but that one is just a doc fix on how 
 seek is supposed to behave wrt poll and rebalance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1913) App hungs when calls producer.send to wrong IP of Kafka broker

2015-07-29 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646373#comment-14646373
 ] 

Ewen Cheslack-Postava commented on KAFKA-1913:
--

[~igor.quickblox] I think this is a difference in how localhost and 192.168.9.3 
are handling invalid connection requests. On localhost, the TCP connection is 
probably being rejected immediately. However, the other system may be configured 
to simply ignore connection requests sent to ports that nothing is listening on. 
This means that the client just has to wait until the TCP connection times out, 
which may be quite long.

I think KAFKA-2120 is going to fix this for you by adding a client-side 
timeout. This timeout should apply regardless of whether you're able to connect 
to the broker in your broker list.
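To see why the two cases behave so differently, here is a small self-contained sketch (host, port, and timeout are hypothetical) using a plain socket: a refused connection fails immediately, while a silently dropped one blocks until a timeout fires:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConnectProbe {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket();
        try {
            // Without the explicit 5000 ms timeout, a host that silently
            // drops SYN packets can leave connect() blocked for minutes
            // (the OS default TCP connect timeout).
            socket.connect(new InetSocketAddress("192.168.9.3", 9092), 5000);
            System.out.println("connected");
        } catch (SocketTimeoutException e) {
            System.out.println("silently dropped: " + e); // the remote-host case
        } catch (IOException e) {
            System.out.println("refused immediately: " + e); // the localhost case
        } finally {
            socket.close();
        }
    }
}
{code}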

 App hungs when calls producer.send to wrong IP of Kafka broker
 --

 Key: KAFKA-1913
 URL: https://issues.apache.org/jira/browse/KAFKA-1913
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.1.1
 Environment: OS X 10.10.1, Java 7, AWS Linux
Reporter: Igor Khomenko
Assignee: Jun Rao
 Fix For: 0.8.3


 I have the following test code to check the Kafka functionality:
 {code}
 package com.company;
 import kafka.common.FailedToSendMessageException;
 import kafka.javaapi.producer.Producer;
 import kafka.producer.KeyedMessage;
 import kafka.producer.ProducerConfig;
 import java.util.Date;
 import java.util.Properties;
 public class Main {
     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("metadata.broker.list", "192.168.9.3:9092");
         props.put("serializer.class", "com.company.KafkaMessageSerializer");
         props.put("request.required.acks", "1");
         ProducerConfig config = new ProducerConfig(props);
         // The first is the type of the partition key, the second the type of the message.
         Producer<String, String> messagesProducer = new Producer<String, String>(config);
         // Send
         String topicName = "my_messages";
         String message = "hello world";
         KeyedMessage<String, String> data = new KeyedMessage<String, String>(topicName, message);
         try {
             System.out.println(new Date() + ": sending...");
             messagesProducer.send(data);
             System.out.println(new Date() + " : sent");
         } catch (FailedToSendMessageException e) {
             System.out.println("e: " + e);
             e.printStackTrace();
         } catch (Exception exc) {
             System.out.println("e: " + exc);
             exc.printStackTrace();
         }
     }
 }
 {code}
 {code}
 package com.company;
 import kafka.serializer.Encoder;
 import kafka.utils.VerifiableProperties;
 /**
  * Created by igorkhomenko on 2/2/15.
  */
 public class KafkaMessageSerializer implements Encoder<String> {
     public KafkaMessageSerializer(VerifiableProperties verifiableProperties) {
         /* This constructor must be present for successful compile. */
     }
     @Override
     public byte[] toBytes(String entity) {
         byte[] serializedMessage = doCustomSerialization(entity);
         return serializedMessage;
     }
     private byte[] doCustomSerialization(String entity) {
         return entity.getBytes();
     }
 }
 {code}
 Here is also the GitHub version: https://github.com/soulfly/Kafka-java-producer
 So it just hangs on the next line:
 {code}
 messagesProducer.send(data)
 {code}
 When I replaced the broker list with
 {code}
 props.put("metadata.broker.list", "localhost:9092");
 {code}
 then I got an exception:
 {code}
 kafka.common.FailedToSendMessageException: Failed to send messages after 3 
 tries.
 {code}
 so that's okay.
 Why does it hang with the wrong broker list? Any ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2214) kafka-reassign-partitions.sh --verify should return non-zero exit codes when reassignment is not completed yet

2015-07-29 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-2214:

Status: In Progress  (was: Patch Available)

Moved back to in progress since there are unaddressed review comments.

 kafka-reassign-partitions.sh --verify should return non-zero exit codes when 
 reassignment is not completed yet
 --

 Key: KAFKA-2214
 URL: https://issues.apache.org/jira/browse/KAFKA-2214
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Affects Versions: 0.8.2.0, 0.8.1.1
Reporter: Michael Noll
Assignee: Manikumar Reddy
Priority: Minor
 Attachments: KAFKA-2214.patch, KAFKA-2214_2015-07-10_21:56:04.patch, 
 KAFKA-2214_2015-07-13_21:10:58.patch, KAFKA-2214_2015-07-14_15:31:12.patch, 
 KAFKA-2214_2015-07-14_15:40:49.patch


 h4. Background
 The admin script {{kafka-reassign-partitions.sh}} should integrate better 
 with automation tools such as Ansible, which rely on scripts adhering to Unix 
 best practices such as appropriate exit codes on success/failure.
 h4. Current behavior (incorrect)
 When reassignments are still in progress {{kafka-reassign-partitions.sh}} 
 prints {{ERROR}} messages but returns an exit code of zero, which indicates 
 success.  This behavior makes it a bit cumbersome to integrate the script 
 into automation tools such as Ansible.
 {code}
 $ kafka-reassign-partitions.sh --zookeeper zookeeper1:2181 
 --reassignment-json-file partitions-to-move.json --verify
 Status of partition reassignment:
 ERROR: Assigned replicas (316,324,311) don't match the list of replicas for 
 reassignment (316,324) for partition [mytopic,2]
 Reassignment of partition [mytopic,0] completed successfully
 Reassignment of partition [myothertopic,1] completed successfully
 Reassignment of partition [myothertopic,3] completed successfully
 ...
 $ echo $?
 0
 # But preferably the exit code in the presence of ERRORs should be, say, 1.
 {code}
 h3. How to improve
 I'd suggest that, using the above as the running example, if there are any 
 {{ERROR}} entries in the output (i.e. if there are any assignments remaining 
 that don't match the desired assignments), then the 
 {{kafka-reassign-partitions.sh}}  should return a non-zero exit code.
 h3. Notes
 In Kafka 0.8.2 the output is a bit different: The ERROR messages are now 
 phrased differently.
 Before:
 {code}
 ERROR: Assigned replicas (316,324,311) don't match the list of replicas for 
 reassignment (316,324) for partition [mytopic,2]
 {code}
 Now:
 {code}
 Reassignment of partition [mytopic,2] is still in progress
 {code}
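 Until that change lands, automation has to scrape the output instead of 
 trusting the exit code. A hedged sketch of such a wrapper (the script 
 location and JSON file name are hypothetical; the matched strings are the 
 ones quoted above):
 {code}
 import java.io.BufferedReader;
 import java.io.InputStreamReader;

 public class ReassignVerifyWrapper {
     public static void main(String[] args) throws Exception {
         ProcessBuilder pb = new ProcessBuilder(
             "kafka-reassign-partitions.sh", "--zookeeper", "zookeeper1:2181",
             "--reassignment-json-file", "partitions-to-move.json", "--verify");
         pb.redirectErrorStream(true);
         Process process = pb.start();

         boolean incomplete = false;
         try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(process.getInputStream()))) {
             String line;
             while ((line = reader.readLine()) != null) {
                 System.out.println(line);
                 // The script always exits 0, so detect failure from the text.
                 if (line.startsWith("ERROR") || line.contains("still in progress")) {
                     incomplete = true;
                 }
             }
         }
         process.waitFor();
         System.exit(incomplete ? 1 : 0); // the exit code this ticket asks for
     }
 }
 {code}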



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Ismael Juma
Hi Gwen,

Thanks for the feedback. Comments below.

On Wed, Jul 29, 2015 at 6:40 PM, Gwen Shapira gshap...@cloudera.com wrote:

 The jira comment is a way for the committer to say thank you to
 people who were involved in the review process.


If we just want to say thank you, then why not just say that? Using the
word 'Reviewers' in this context is unusual in my experience (and I am an
obsessive reader of open-source commits :)).

It doesn't have any
 formal implications - the responsibility for committing good code is
 on the committer (that's the whole point). It doesn't even have
 informal implications - no one ever went after a reviewer if code
 turned out buggy.


Sure, it's not about going after people. We are nice around here. :) Still,
correct attribution is important. Open-source code in GitHub is seen by
many people and in various contexts.

I suggest: Leave it up to the committer's best judgement and not
 introduce process where there's really no need for one.


Perhaps. Personally, I think we should consider the contributor's position
too instead of just leaving it to the committer.

Best,
Ismael


Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Ismael Juma
Hi Parth,

On Wed, Jul 29, 2015 at 6:50 PM, Parth Brahmbhatt 
pbrahmbh...@hortonworks.com wrote:

 +1 on Gwen's suggestion.

 Consider this as my thank you for all the reviews everyone has done in
 the past and will do in the future. Don't make me say thanks on every
 single commit. Introducing another process when the project has > 50 PRs
 open pretty much all the time is not really going to help.


I think you misunderstood. This is done by the person who merges the pull
request via the merge tool. You, as the creator of the PR, don't have to do
anything. Search for 'Reviewers' in the following commit, for example:

https://github.com/apache/kafka/commit/e43c9aff92c57da6abb0c1d0af3431a550110a89

It is already part of the process (Gwen actually asked for this to be added
to the merge tool):

https://issues.apache.org/jira/browse/KAFKA-2344

There is no new process being suggested. It's just a clarification on
something that already exists.

Best,
Ismael


Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
I agree with Ewen that with a single-threaded model it will be tricky to
implement the conventional semantics of async calls or Futures. We just
drafted the following wiki, which explains our thoughts at LinkedIn on the
new consumer API and threading model.

https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

We were trying to see:
1. Whether we can use some kind of methodology to help us think about what API
we want to provide to users for different use cases.
2. What the pros and cons of the current single-threaded model are, and
whether there is a way to maintain its benefits while solving the issues we
are facing now.

Thanks,

Jiangjie (Becket) Qin

On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future besides
 adding the callback, mainly because of the implementation complexity I felt
 it could introduce along with the retry settings after looking through the
 code base. I would be happy to change my mind if we could propose a prototype
 implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit offsets,
 their semantics are so different that overloading a single method with both
 is messy. That said, I don't think we should consider this an inconsistency
 wrt the new producer API's use of Future because the two APIs have a much
 more fundamental difference that justifies it: they have completely
 different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).
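 A toy sketch of the deadlock being described (the SingleThreadedConsumer
 class is hypothetical, standing in for any client whose network I/O only
 advances inside poll()):
 {code}
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.Future;

 // Hypothetical stand-in for a consumer whose I/O only advances inside poll().
 class SingleThreadedConsumer {
     private CompletableFuture<Void> pendingCommit;

     Future<Void> commit() {
         pendingCommit = new CompletableFuture<>();
         return pendingCommit; // not done yet; no background thread will finish it
     }

     void poll() {
         // Network I/O happens here; a commit response completes the future.
         if (pendingCommit != null) {
             pendingCommit.complete(null);
         }
     }
 }

 public class FutureDeadlock {
     public static void main(String[] args) throws Exception {
         SingleThreadedConsumer consumer = new SingleThreadedConsumer();
         Future<Void> commit = consumer.commit();

         // commit.get();  // would block forever: nothing else drives poll()

         consumer.poll();  // the same thread must keep polling...
         commit.get();     // ...and only then does get() return
         System.out.println("committed");
     }
 }
 {code}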

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io wrote:

 Hey Adi,

 When we designed the initial version, the producer API was still
 changing. I thought about adding the Future and then just didn't get to it.
 I agree that we should look into adding it for consistency.

 Thanks,
 Neha

 On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar aaurad...@linkedin.com
  wrote:

 Great discussion everyone!

 One general comment on the sync/async APIs on the new consumer. I
 think the producer tackles sync vs async APIs well. For APIs that can
 either be sync or async, can we simply return a future? That seems more
 elegant for the APIs that make sense in both flavors. From the
 user's perspective, it is more consistent with the new producer. One easy
 example is the commit call with the CommitType enum: we can make that call
 always async and users can block on the future if they want to make sure
 their offsets are committed.

 Aditya


 On Mon, Jul 27, 2015 at 2:06 PM, Onur Karaman okara...@linkedin.com
 wrote:

 Thanks for the great responses, everyone!

 To expand a tiny bit on my initial post: while I did bring up old high
 level consumers, the teams we spoke to were actually not the types of
 services that simply wanted an easy way to get ConsumerRecords. We spoke 
 to
 infrastructure teams that I would consider to be closer to 

[jira] [Resolved] (KAFKA-2357) Update zookeeper.connect description in Kafka documentation

2015-07-29 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-2357.
--
Resolution: Fixed

 Update zookeeper.connect description in Kafka documentation
 ---

 Key: KAFKA-2357
 URL: https://issues.apache.org/jira/browse/KAFKA-2357
 Project: Kafka
  Issue Type: Bug
Reporter: Yuto Sasaki
Assignee: David Jacot
 Fix For: 0.8.1.2, 0.8.3

 Attachments: KAFKA-2357-1.patch


 Since https://issues.apache.org/jira/browse/KAFKA-404 chroot pass is created 
 on startup.
 So the description quoted below is wrong:
 bq. Note that you must create this path yourself prior to starting the broker
 cf. 
 http://mail-archives.apache.org/mod_mbox/kafka-users/201507.mbox/%3CCAHBV8WcjXeUnEH4KGZ_2f_kPGJ5M%3DC%3DPaAuiOSGdx52cA4s4gg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2357) Update zookeeper.connect description in Kafka documentation

2015-07-29 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2357:
-
Status: In Progress  (was: Patch Available)

Thanks for the patch, committed to the web site docs.

 Update zookeeper.connect description in Kafka documentation
 ---

 Key: KAFKA-2357
 URL: https://issues.apache.org/jira/browse/KAFKA-2357
 Project: Kafka
  Issue Type: Bug
Reporter: Yuto Sasaki
Assignee: David Jacot
 Fix For: 0.8.1.2, 0.8.3

 Attachments: KAFKA-2357-1.patch


 Since https://issues.apache.org/jira/browse/KAFKA-404 chroot pass is created 
 on startup.
 So the description quoted below is wrong:
 bq. Note that you must create this path yourself prior to starting the broker
 cf. 
 http://mail-archives.apache.org/mod_mbox/kafka-users/201507.mbox/%3CCAHBV8WcjXeUnEH4KGZ_2f_kPGJ5M%3DC%3DPaAuiOSGdx52cA4s4gg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2351) Brokers are having a problem shutting down correctly

2015-07-29 Thread Mayuresh Gharat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayuresh Gharat updated KAFKA-2351:
---
Status: Patch Available  (was: In Progress)

 Brokers are having a problem shutting down correctly
 

 Key: KAFKA-2351
 URL: https://issues.apache.org/jira/browse/KAFKA-2351
 Project: Kafka
  Issue Type: Bug
Reporter: Mayuresh Gharat
Assignee: Mayuresh Gharat
 Attachments: KAFKA-2351.patch, KAFKA-2351_2015-07-21_14:58:13.patch, 
 KAFKA-2351_2015-07-23_21:36:52.patch


 The run() in Acceptor during shutdown might throw an exception that is not 
 caught, so it never reaches shutdownComplete; the latch is then not counted 
 down and the broker will not be able to shut down.
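 The usual fix for this class of bug is a try/finally around the accept loop 
 so the latch is always counted down. A hedged sketch of the pattern (names 
 are illustrative, not the actual Acceptor code):
 {code}
 import java.util.concurrent.CountDownLatch;

 // Illustrative stand-in for the acceptor thread described above.
 class AcceptorSketch implements Runnable {
     private final CountDownLatch shutdownLatch = new CountDownLatch(1);

     @Override
     public void run() {
         try {
             while (!Thread.currentThread().isInterrupted()) {
                 // accept and hand off connections; may throw at any point
             }
         } finally {
             // Count down unconditionally so awaitShutdown() can never hang,
             // even if the loop above dies with an uncaught exception.
             shutdownLatch.countDown();
         }
     }

     void awaitShutdown() throws InterruptedException {
         shutdownLatch.await();
     }
 }
 {code}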



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster

2015-07-29 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646794#comment-14646794
 ] 

Guozhang Wang commented on KAFKA-2300:
--

[~junrao] would you like to take a look?

 Error in controller log when broker tries to rejoin cluster
 ---

 Key: KAFKA-2300
 URL: https://issues.apache.org/jira/browse/KAFKA-2300
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: Johnny Brown
Assignee: Flavio Junqueira
 Attachments: KAFKA-2300-controller-logs.tar.gz, 
 KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch


 Hello Kafka folks,
 We are having an issue where a broker attempts to join the cluster after 
 being restarted, but is never added to the ISR for its assigned partitions. 
 This is a three-node cluster, and the controller is broker 2.
 When broker 1 starts, we see the following message in broker 2's 
 controller.log.
 {{
 [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error 
 while handling broker changes 
 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
 java.lang.IllegalStateException: Controller to broker state change requests 
 batch is not empty while creating a new one. Some UpdateMetadata state 
 changes Map(2 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  1 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  3 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)))
  might be lost 
   at 
 kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
   at 
 kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
   at 
 kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at kafka.utils.Utils$.inLock(Utils.scala:535)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 }}
 {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of 
 it persists in the controller's memory, causing an exception which interrupts 
 the state change triggered by the broker startup.
 Has anyone seen something like this? Any idea what's happening here? Any 
 information would be greatly appreciated.
 Thanks,
 Johnny



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka Consumer thoughts

2015-07-29 Thread Jason Gustafson
I think this proposal matches pretty well with what users intuitively
expect the implementation to be. At a glance, I don't see any problems with
doing the liveness detection in the background thread. It also has the
advantage that the frequency of heartbeats (which controls how long
rebalancing takes) can be kept orthogonal to the rate of consumption. One
concern is about the rebalance callback: it would have to be executed in
the background thread, right? That means the user might have to implement
their own synchronization, which might be tricky to do right.
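To illustrate that synchronization burden, a hedged sketch (the callback
wiring is hypothetical, not the proposal's actual API) of one way to hand
rebalance events from a background thread to the processing thread:

{code}
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RebalanceHandoff {
    // Filled by the (hypothetical) background heartbeat thread.
    private final BlockingQueue<Collection<String>> revoked =
        new LinkedBlockingQueue<>();

    // Invoked on the background thread when a rebalance starts.
    void onPartitionsRevoked(Collection<String> partitions) {
        revoked.offer(partitions);
    }

    // The processing thread calls this between records, so commits happen
    // only at points where no record is half-processed.
    void maybeHandleRevocation() {
        Collection<String> partitions = revoked.poll();
        if (partitions != null) {
            // commit offsets for 'partitions' before they are reassigned
            System.out.println("committing offsets before losing " + partitions);
        }
    }
}
{code}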

-Jason


On Wed, Jul 29, 2015 at 1:20 PM, Neha Narkhede n...@confluent.io wrote:

 Works now. Thanks Becket!

 On Wed, Jul 29, 2015 at 1:19 PM, Jiangjie Qin j...@linkedin.com wrote:

 Ah... My bad, forgot to change the URL link for pictures.
 Thanks for the quick response, Neha. It should be fixed now, can you try
 again?

 Jiangjie (Becket) Qin

 On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks Becket. Quick comment - there seem to be a bunch of images that
 the wiki refers to, but none loaded for me. Just making sure if its just me
 or can everyone not see the pictures?

 On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com
 wrote:

 I agree with Ewen that a single threaded model will be tricky to
 implement the same conventional semantic of async or Future. We just
 drafted the following wiki which explains our thoughts in LinkedIn on the
 new consumer API and threading model.


 https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

 We were trying to see:
 1. If we can use some kind of methodology to help us think about what
 API we want to provide to user for different use cases.
 2. What is the pros and cons of current single threaded model. Is there
 a way that we can maintain the benefits while solve the issues we are
 facing now with single threaded model.

 Thanks,

 Jiangjie (Becket) Qin

 On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava 
 e...@confluent.io wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com
 wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future
 besides adding the callback, mainly because of the implementation complexity
 I felt it could introduce along with the retry settings after looking
 through the code base. I would be happy to change my mind if we could propose
 a prototype implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was 
 thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit
 offsets, their semantics are so different that overloading a single method
 with both is messy. That said, I don't think we should consider this an
 inconsistency wrt the new producer API's use of Future because the two 
 APIs
 have a much more fundamental difference that justifies it: they have
 completely different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API 
 makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io
 wrote:

 Hey Adi,

 

Re: Kafka Consumer thoughts

2015-07-29 Thread Jay Kreps
Some comments on the proposal:

I think we are conflating a number of things that should probably be
addressed individually because they are unrelated. My past experience is
that this always makes progress hard. The more we can pick apart these
items the better:

   1. threading model
   2. blocking vs non-blocking semantics
   3. missing APIs
   4. missing javadoc and other API surprises
   5. throwing exceptions

The missing APIs are getting added independently. Some, like your proposed
offsetByTime, were things we agreed to hold off on for the first release
and do once we'd thought them through. If there are uses for them now we can
accelerate. I think each of these is really independent; we know there are
things that need to be added, but lumping them all into one discussion will
be confusing.

WRT throwing exceptions, the policy is to throw exceptions that are
unrecoverable and to handle and log other exceptions that are transient. That
policy makes sense if you go through the thought exercise of what the user
will do if I throw this exception to them, when they have no rational
response but to retry (and when failing to anticipate and retry on that
exception would kill their program). You can argue whether the topic not
existing is transient or not; unfortunately the way we did auto-creation
makes it transient if you are in auto-create mode and non-transient
otherwise (ick!). In any case this is an orthogonal discussion to
everything else. I think the policy is right, and if we don't conform to it
in some way that is really an independent bug/discussion.

I suggest we focus on threading and the current event-loop style of API
design since I think that is really the crux.

The analogy between the producer threading model and the consumer model
actually doesn't work for me. The goal of the producer is actually to take
requests from many many user threads and shove them into a single buffer
for batching. So the threading model isn't the 1:1 threads you describe; it
is N:1. The goal of the consumer is to support single-threaded processing.
This is what drives the difference. Saying that because the producer has N:1
threads the consumer should have 1:1 threads instead of just 1 thread
doesn't make sense any more than an analogy to the broker's threading model
would--the problem we're solving is totally different.

I think ultimately though what you need to think about is, does an event
loop style of API make sense? That is the source of all the issues you
describe. This style of API is incredibly prevalent, from unix select to
GUIs to node.js. It's a great way to model multiple channels of messages
coming in. It is a fantastic style for event processing. Programmers
understand this style of API, though I would agree it is unusual compared to
blocking APIs. But it is a single-threaded processing model. The current
approach is basically a pure event loop with some convenience methods that
are effectively 'poll until X is complete'.

I think basically all the confusion you are describing comes from not
documenting/expecting an event loop. The 'if you don't call poll, nothing
happens' point is basically this. It's an event loop. You have to loop. You
can't not call poll. Perhaps the docs don't cover this right now; I think
if they do it's not unreasonable behavior.
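For reference, a minimal sketch of the event-loop usage being described
(topic, group, and broker names are hypothetical):

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            // The loop IS the consumer: fetches, heartbeats, and rebalances
            // only make progress while the application keeps calling poll().
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n", record.topic(),
                        record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
{code}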

If we want to move away from an event loop I'm not sure *any* aspect of the
current event loop style of API makes sense any more. I am not totally
married to event loops, but I do think what we have gives an elegant way of
implementing any higher-level abstractions that would fully implement the
user's parallelism model. I don't want to go rethink everything but I do
think a half-way implementation that is event loop + background threads is
likely going to be icky.

WRT making it configurable whether liveness means 'actually consuming' or
'background thread running', I would suggest that is really the worst
outcome. These types of modes that are functionally totally different are
just awful from a documentation, testing, and usability point of view. I would
strongly prefer we pick either of these, document it, and make it work well
rather than trying to do both.

-Jay








On Wed, Jul 29, 2015 at 1:20 PM, Neha Narkhede n...@confluent.io wrote:

 Works now. Thanks Becket!

 On Wed, Jul 29, 2015 at 1:19 PM, Jiangjie Qin j...@linkedin.com wrote:

 Ah... My bad, forgot to change the URL link for pictures.
 Thanks for the quick response, Neha. It should be fixed now, can you try
 again?

 Jiangjie (Becket) Qin

 On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks Becket. Quick comment - there seem to be a bunch of images that
 the wiki refers to, but none loaded for me. Just making sure if its just me
 or can everyone not see the pictures?

 On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com
 wrote:

 I agree with Ewen that a single threaded model will be tricky to
 implement the same conventional semantic of async or 

[jira] [Updated] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster

2015-07-29 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2300:
-
Reviewer: Jun Rao

 Error in controller log when broker tries to rejoin cluster
 ---

 Key: KAFKA-2300
 URL: https://issues.apache.org/jira/browse/KAFKA-2300
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: Johnny Brown
Assignee: Flavio Junqueira
 Attachments: KAFKA-2300-controller-logs.tar.gz, 
 KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch


 Hello Kafka folks,
 We are having an issue where a broker attempts to join the cluster after 
 being restarted, but is never added to the ISR for its assigned partitions. 
 This is a three-node cluster, and the controller is broker 2.
 When broker 1 starts, we see the following message in broker 2's 
 controller.log.
 {{
 [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error 
 while handling broker changes 
 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
 java.lang.IllegalStateException: Controller to broker state change requests 
 batch is not empty while creating a new one. Some UpdateMetadata state 
 changes Map(2 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  1 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  3 - Map([prod-sver-end,1] - 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)))
  might be lost 
   at 
 kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
   at 
 kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
   at 
 kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at kafka.utils.Utils$.inLock(Utils.scala:535)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 }}
 {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of 
 it persists in the controller's memory, causing an exception which interrupts 
 the state change triggered by the broker startup.
 Has anyone seen something like this? Any idea what's happening here? Any 
 information would be greatly appreciated.
 Thanks,
 Johnny



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka Mirror Maker

2015-07-29 Thread Jiangjie Qin
Mirror Maker does not have specific restrictions on cluster size.

The error you saw was because the consumer was not able to talk to the broker.
Can you try using kafka-console-consumer to consume some data from your
source cluster and see if it works? It should be under KAFKA_HOME/bin/.
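For a quick programmatic version of the same check, a hedged sketch using the
0.8-era SimpleConsumer metadata request (host, port, and topic are taken from
the log below and may need adjusting):

{code}
import java.util.Collections;

import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class MetadataProbe {
    public static void main(String[] args) {
        // (host, port, socket timeout ms, buffer size, client id)
        SimpleConsumer consumer =
            new SimpleConsumer("10.2.3.4", 9092, 10000, 64 * 1024, "metadata-probe");
        try {
            TopicMetadataResponse response = consumer.send(
                new TopicMetadataRequest(Collections.singletonList("fromIndiaWithLove")));
            // A ClosedChannelException here reproduces the mirror-maker failure.
            System.out.println("metadata for "
                + response.topicsMetadata().size() + " topic(s)");
        } finally {
            consumer.close();
        }
    }
}
{code}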

Jiangjie (Becket) Qin

On Tue, Jul 28, 2015 at 11:01 AM, Prabhjot Bharaj prabhbha...@gmail.com
wrote:

 Hi,

 I'm using Mirror Maker with a cluster of 3 nodes and cluster of 5 nodes.

 I would like to ask - is the number of nodes a restriction for Mirror
 Maker?
 Also, are there any other restrictions or properties that should be common
 across both clusters so that they continue mirroring?


 I'm asking this because I've got this error while mirroring:-

 [2015-07-28 17:51:10,943] WARN Fetching topic metadata with correlation id
 0 for topics [Set(fromIndiaWithLove)] from broker
 [id:3,host:a10.2.3.4,port:9092] failed (kafka.client.ClientUtils$)

 java.nio.channels.ClosedChannelException

 at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)

 at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)

 at

 kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)

 at kafka.producer.SyncProducer.send(SyncProducer.scala:113)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)

 at

 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)

 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

 [2015-07-28 17:51:18,955] WARN Fetching topic metadata with correlation id
 0 for topics [Set(fromIndiaWithLove)] from broker
 [id:2,host:10.2.3.5,port:9092] failed (kafka.client.ClientUtils$)

 java.nio.channels.ClosedChannelException

 at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)

 at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)

 at

 kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)

 at kafka.producer.SyncProducer.send(SyncProducer.scala:113)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)

 at

 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)

 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

 [2015-07-28 17:51:27,043] WARN Fetching topic metadata with correlation id
 0 for topics [Set(fromIndiaWithLove)] from broker
 [id:5,host:a10.2.3.6port:9092] failed (kafka.client.ClientUtils$)

 java.nio.channels.ClosedChannelException

 at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)

 at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)

 at

 kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)

 at kafka.producer.SyncProducer.send(SyncProducer.scala:113)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)

 at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)

 at

 kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)

 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)



 This is what my consumer config looks like:

 zookeeper.connect=10.2.3.4:2182
 zookeeper.connection.timeout.ms=100
 consumer.timeout.ms=-1
 group.id=dp-mirrorMaker-test-datap1
 shallow.iterator.enable=true
 auto.create.topics.enable=true

 I've used the default producer.properties in kafka/config/, which has
 these properties:

 metadata.broker.list=localhost:9092
 producer.type=sync
 compression.codec=none
 serializer.class=kafka.serializer.DefaultEncoder


 I'm running Mirror Maker via this command:-


  /kafka_2.10-0.8.2.0/bin/kafka-run-class.sh kafka.tools.MirrorMaker
 --consumer.config ~/sourceCluster1Consumer.config  --num.streams 1
 --producer.config producer.properties --whitelist=.*

 Regards,

 prabcs



Re: Kafka Consumer thoughts

2015-07-29 Thread Neha Narkhede
Thanks Becket. Quick comment - there seem to be a bunch of images that the
wiki refers to, but none loaded for me. Just making sure: is it just me, or
can everyone not see the pictures?

On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com wrote:

 I agree with Ewen that a single threaded model will be tricky to implement
 the same conventional semantic of async or Future. We just drafted the
 following wiki which explains our thoughts in LinkedIn on the new consumer
 API and threading model.


 https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

 We were trying to see:
 1. If we can use some kind of methodology to help us think about what API
 we want to provide to user for different use cases.
 2. What is the pros and cons of current single threaded model. Is there a
 way that we can maintain the benefits while solve the issues we are facing
 now with single threaded model.

 Thanks,

 Jiangjie (Becket) Qin

 On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava e...@confluent.io
  wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com
 wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future besides
 adding the callback, mainly because of the implementation complexity I felt
 it could introduce along with the retry settings after looking through the
 code base. I would be happy to change my mind if we could propose a prototype
 implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit
 offsets, their semantics are so different that overloading a single method
 with both is messy. That said, I don't think we should consider this an
 inconsistency wrt the new producer API's use of Future because the two APIs
 have a much more fundamental difference that justifies it: they have
 completely different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io
 wrote:

 Hey Adi,

 When we designed the initial version, the producer API was still
 changing. I thought about adding the Future and then just didn't get to it.
 I agree that we should look into adding it for consistency.

 Thanks,
 Neha

 On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar 
 aaurad...@linkedin.com wrote:

 Great discussion everyone!

 One general comment on the sync/async API's on the new consumer. I
 think the producer tackles sync vs async API's well. For API's that can
 either be sync or async, can we simply return a future? That seems more
 elegant for the API's that make sense either in both flavors. From the
 users perspective, it is more consistent with the new producer. One easy
 example is the commit call with the CommitType enum.. we can make that 
 call
 always async and users can block on the future if they want to make sure
 their offsets are committed.

 Aditya


 On Mon, Jul 27, 2015 at 2:06 PM, Onur Karaman okara...@linkedin.com
 wrote:

 Thanks for the great responses, everyone!

 

Re: Kafka Consumer thoughts

2015-07-29 Thread Guozhang Wang
The images load for me well.

On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks Becket. Quick comment - there seem to be a bunch of images that the
 wiki refers to, but none loaded for me. Just making sure if its just me or
 can everyone not see the pictures?

 On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com wrote:

 I agree with Ewen that a single threaded model will be tricky to
 implement the same conventional semantic of async or Future. We just
 drafted the following wiki which explains our thoughts in LinkedIn on the
 new consumer API and threading model.


 https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

 We were trying to see:
 1. If we can use some kind of methodology to help us think about what API
 we want to provide to user for different use cases.
 2. What is the pros and cons of current single threaded model. Is there a
 way that we can maintain the benefits while solve the issues we are facing
 now with single threaded model.

 Thanks,

 Jiangjie (Becket) Qin

 On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava 
 e...@confluent.io wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com
 wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future besides
 adding the callback, mainly because of the implementation complexity I felt
 it could introduce along with the retry settings after looking through the
 code base. I would be happy to change my mind if we could propose a prototype
 implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit
 offsets, their semantics are so different that overloading a single method
 with both is messy. That said, I don't think we should consider this an
 inconsistency wrt the new producer API's use of Future because the two APIs
 have a much more fundamental difference that justifies it: they have
 completely different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).
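
 To make this concrete, a minimal sketch (hypothetical names, not the real
 consumer API) of why a Future from a single-threaded client only resolves
 if its owner keeps driving the event loop:

 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.Future;

 class SingleThreadedClient {
     private final CompletableFuture<Void> pending = new CompletableFuture<>();

     // Returns immediately; the commit only completes during poll().
     Future<Void> commitAsync() { return pending; }

     // Drives all network I/O, eventually completing pending futures.
     void poll() { pending.complete(null); }

     public static void main(String[] args) throws Exception {
         SingleThreadedClient client = new SingleThreadedClient();
         Future<Void> f = client.commitAsync();
         // Calling f.get() here (or from another thread) would block
         // forever: nothing is driving poll() on our behalf.
         client.poll();   // the owning thread must keep running the loop
         f.get();         // now completes
     }
 }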

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io
 wrote:

 Hey Adi,

 When we designed the initial version, the producer API was still
 changing. I thought about adding the Future and then just didn't get to 
 it.
 I agree that we should look into adding it for consistency.

 Thanks,
 Neha

 On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar 
 aaurad...@linkedin.com wrote:

 Great discussion everyone!

 One general comment on the sync/async APIs on the new consumer. I
 think the producer tackles sync vs async APIs well. For APIs that can
 either be sync or async, can we simply return a future? That seems more
 elegant for the APIs that make sense in both flavors. From the
 user's perspective, it is more consistent with the new producer. One easy
 example is the commit call with the CommitType enum: we can make that
 call always async and users can block on the future if they want to make sure
 their offsets are committed.

 Aditya


 On Mon, Jul 

Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
Ah... My bad, forgot to change the URL link for pictures.
Thanks for the quick response, Neha. It should be fixed now, can you try
again?

Jiangjie (Becket) Qin

On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks Becket. Quick comment - there seem to be a bunch of images that the
 wiki refers to, but none loaded for me. Just making sure if it's just me or
 can everyone not see the pictures?

 On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com wrote:

 I agree with Ewen that a single-threaded model will make it tricky to
 implement the conventional semantics of async calls or Futures. We just
 drafted the following wiki, which explains our thoughts at LinkedIn on the
 new consumer API and threading model.


 https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

 We were trying to see:
 1. Whether we can use some kind of methodology to help us think about what
 API we want to provide to users for different use cases.
 2. What the pros and cons of the current single-threaded model are. Is there
 a way that we can maintain its benefits while solving the issues we are
 facing now with the single-threaded model?

 Thanks,

 Jiangjie (Becket) Qin

 On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava 
 e...@confluent.io wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com
 wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future besides
 adding the callback, mainly because of the implementation complexity I feel
 it could introduce along with the retry settings, after looking through the
 code base. I would be happy to change my mind if we could propose a prototype
 implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit
 offsets, their semantics are so different that overloading a single method
 with both is messy. That said, I don't think we should consider this an
 inconsistency wrt the new producer API's use of Future because the two APIs
 have a much more fundamental difference that justifies it: they have
 completely different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io
 wrote:

 Hey Adi,

 When we designed the initial version, the producer API was still
 changing. I thought about adding the Future and then just didn't get to 
 it.
 I agree that we should look into adding it for consistency.

 Thanks,
 Neha

 On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar 
 aaurad...@linkedin.com wrote:

 Great discussion everyone!

 One general comment on the sync/async APIs on the new consumer. I
 think the producer tackles sync vs async APIs well. For APIs that can
 either be sync or async, can we simply return a future? That seems more
 elegant for the APIs that make sense in both flavors. From the
 user's perspective, it is more consistent with the new producer. One easy
 example is the commit call with the CommitType enum: we can make that 
 

Build failed in Jenkins: KafkaPreCommit #168

2015-07-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/KafkaPreCommit/168/changes

Changes:

[cshapi] KAFKA-2100; Client Error doesn't preserve or display original server 
error code when it is an unknown code; Reviewed by Gwen, Guozhang and Ewen

--
Started by an SCM change
Building remotely on ubuntu3 (Ubuntu ubuntu legacy-ubuntu) in workspace 
https://builds.apache.org/job/KafkaPreCommit/ws/
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url 
  https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
  git --version # timeout=10
  git fetch --tags --progress 
  https://git-wip-us.apache.org/repos/asf/kafka.git 
  +refs/heads/*:refs/remotes/origin/*
  git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
  git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision b7bd2978dc3947297fefc06ff9b22949d5bd1b50 
(refs/remotes/origin/trunk)
  git config core.sparsecheckout # timeout=10
  git checkout -f b7bd2978dc3947297fefc06ff9b22949d5bd1b50
  git rev-list e43c9aff92c57da6abb0c1d0af3431a550110a89 # timeout=10
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
[KafkaPreCommit] $ /bin/bash -xe /tmp/hudson313172124125357484.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.1/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.5
:downloadWrapper UP-TO-DATE

BUILD SUCCESSFUL

Total time: 27.658 secs
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
[KafkaPreCommit] $ /bin/bash -xe /tmp/hudson8026424288504756815.sh
+ ./gradlew -PscalaVersion=2.10.1 test
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4/userguide/gradle_daemon.html.

FAILURE: Build failed with an exception.

* What went wrong:
Could not open buildscript class cache for settings file 
'/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle' 
(/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
 Timeout waiting to lock buildscript class cache for settings file 
 '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle' 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
  It is currently in use by another Gradle instance.
  Owner PID: unknown
  Our PID: 3068
  Owner Operation: unknown
  Our operation: Initialize cache
  Lock file: 
/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript/cache.properties.lock

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 4.88 secs
Build step 'Execute shell' marked build as failure
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
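
A hedged workaround for this class of failure, assuming the jobs share a
build slave and the default Gradle user home: give each job its own Gradle
user home so that concurrent builds do not contend for the shared script
cache lock, e.g.

./gradlew -PscalaVersion=2.10.1 --gradle-user-home "$WORKSPACE/.gradle" test

The --gradle-user-home flag relocates everything under ~/.gradle, including
the caches/2.4/scripts directory named in the lock error above.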


Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Gwen Shapira
I guess we see the reviewer part with different interpretations.

What are the benefits you see of formalizing who gets mentioned as reviewer?

On Wed, Jul 29, 2015 at 11:06 AM, Ismael Juma ism...@juma.me.uk wrote:
 Hi Gwen,

 Thanks for the feedback. Comments below.

 On Wed, Jul 29, 2015 at 6:40 PM, Gwen Shapira gshap...@cloudera.com wrote:

 The jira comment is a way for the committer to say thank you to
 people who were involved in the review process.


 If we just want to say thank you, then why not just say that? Using
 the word reviewers in this context is unusual in my experience (and I
 am an obsessive reader of open-source commits :)).

 It doesn't have any
 formal implications - the responsibility for committing good code is
 on the committer (that's the whole point). It doesn't even have
 informal implications - no one ever went after a reviewer if code
 turned out buggy.


 Sure, it's not about going after people. We are nice around here. :) Still,
 correct attribution is important. Open-source code in GitHub is seen by
 many people and in various contexts.

 I suggest: Leave it up to the committer's best judgement and not
 introduce process where there's really no need for one.


 Perhaps. Personally, I think we should consider the contributor's
 position too instead of just leaving it to the committer.

 Best,
 Ismael


Re: Kafka Consumer thoughts

2015-07-29 Thread Neha Narkhede
Works now. Thanks Becket!

On Wed, Jul 29, 2015 at 1:19 PM, Jiangjie Qin j...@linkedin.com wrote:

 Ah... My bad, forgot to change the URL link for pictures.
 Thanks for the quick response, Neha. It should be fixed now, can you try
 again?

 Jiangjie (Becket) Qin

 On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede n...@confluent.io wrote:

 Thanks Becket. Quick comment - there seem to be a bunch of images that
 the wiki refers to, but none loaded for me. Just making sure if it's just me
 or can everyone not see the pictures?

 On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com wrote:

 I agree with Ewen that a single-threaded model will make it tricky to
 implement the conventional semantics of async calls or Futures. We just
 drafted the following wiki, which explains our thoughts at LinkedIn on the
 new consumer API and threading model.


 https://cwiki.apache.org/confluence/display/KAFKA/New+consumer+API+change+proposal

 We were trying to see:
 1. Whether we can use some kind of methodology to help us think about what
 API we want to provide to users for different use cases.
 2. What the pros and cons of the current single-threaded model are. Is there
 a way that we can maintain its benefits while solving the issues we are
 facing now with the single-threaded model?

 Thanks,

 Jiangjie (Becket) Qin

 On Tue, Jul 28, 2015 at 10:28 PM, Ewen Cheslack-Postava 
 e...@confluent.io wrote:



 On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wangg...@gmail.com
 wrote:

 I think Ewen has proposed these APIs for using callbacks along with
 returning a future in the commit calls, i.e. something similar to:

 public Future<Void> commit(ConsumerCommitCallback callback);

 public Future<Void> commit(Map<TopicPartition, Long> offsets,
 ConsumerCommitCallback callback);

 At that time I was slightly inclined not to include the Future besides
 adding the callback, mainly because of the implementation complexity I feel
 it could introduce along with the retry settings, after looking through the
 code base. I would be happy to change my mind if we could propose a prototype
 implementation that is simple enough.


 One of the reasons that interface ended up being difficult (or maybe
 impossible) to make work reasonably is because the consumer was thread-safe
 at the time. That made it impossible to know what should be done when
 Future.get() is called -- should the implementation call poll() itself, or
 would the fact that the user is calling get() imply that there's a
 background thread running the poll() loop and we just need to wait for it?

 The consumer is no longer thread safe, but I think the same problem
 remains because the expectation with Futures is that they are thread safe.
 Which means that even if the consumer isn't thread safe, I would expect to
 be able to hand that Future off to some other thread, have the second
 thread call get(), and then continue driving the poll loop in my thread
 (which in turn would eventually resolve the Future).

 I quite dislike the sync/async enum. While both operations commit
 offsets, their semantics are so different that overloading a single method
 with both is messy. That said, I don't think we should consider this an
 inconsistency wrt the new producer API's use of Future because the two APIs
 have a much more fundamental difference that justifies it: they have
 completely different threading and execution models.

 I think a Future-based API only makes sense if you can guarantee the
 operations that Futures are waiting on will continue to make progress
 regardless of what the thread using the Future does. The producer API makes
 that work by processing asynchronous requests in a background thread. The
 new consumer does not, and so it becomes difficult/impossible to implement
 the Future correctly. (Or, you have to make assumptions which break other
 use cases; if you want to support the simple use case of just making a
 commit() synchronous by calling get(), the Future has to call poll()
 internally; but if you do that, then if any user ever wants to add
 synchronization to the consumer via some external mechanism, then the
 implementation of the Future's get() method will not be subject to that
 synchronization and things will break).

 -Ewen


 Guozhang

 On Tue, Jul 28, 2015 at 4:03 PM, Neha Narkhede n...@confluent.io
 wrote:

 Hey Adi,

 When we designed the initial version, the producer API was still
 changing. I thought about adding the Future and then just didn't get to 
 it.
 I agree that we should look into adding it for consistency.

 Thanks,
 Neha

 On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar 
 aaurad...@linkedin.com wrote:

 Great discussion everyone!

 One general comment on the sync/async APIs on the new consumer. I
 think the producer tackles sync vs async APIs well. For APIs that can
 either be sync or async, can we simply return a future? That seems more
 elegant for the APIs that make sense in both flavors. From the
 user's perspective, it is more consistent 

Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Parth Brahmbhatt
+1 on Gwen's suggestion.

Consider this as my thank you for all the reviews everyone has done in the
past and is going to do in the future. Don't make me say thanks on every
single commit. Introducing another process when the project has more than
50 PRs open pretty much all the time is not really going to help.

Thanks
Parth

On 7/29/15, 10:40 AM, Gwen Shapira gshap...@cloudera.com wrote:

My two cents:

The jira comment is a way for the committer to say thank you to
people who were involved in the review process. It doesn't have any
formal implications - the responsibility for committing good code is
on the committer (that's the whole point). It doesn't even have
informal implications - no one ever went after a reviewer if code
turned out buggy.

I suggest: Leave it up to the committer's best judgement and not
introduce process where there's really no need for one.

Gwen

On Wed, Jul 29, 2015 at 6:18 AM, Ismael Juma ism...@juma.me.uk wrote:
 Hi all,

 As a general rule, we credit reviewers in the commit message. This is
good.
 However, it is not clear to me if there are guidelines on who should be
 included as a reviewer (please correct me if I am wrong). I can think
of a
 few options:

1. Anyone that commented on the patch (in the pull request or Review
Board)
2. The ones that have reviewed and approved the patch (+1, LGTM, Ship
it, etc.)
3. A more sophisticated system that differentiates between someone
who
reviews and approves a patch versus someone who simply comments on
aspects
of the patch [1]

 On the surface, `1` seems appealing because it's simple and credits
people
 who do partial reviews. The issue, however, is that people (including
 myself) may not want to be tagged as a reviewer if they left a comment
or
 two, but didn't review the change properly. Option `2` is still simple
and
 it avoids this issue.

 As such, I lean towards option `2`, although `3` would work for me too
(the
 additional complexity is the main downside).

 Thoughts?

 Best,
 Ismael

 [1] I don't think we should go this far, but the Linux Kernel is an
extreme
 example of this with `Signed-off-by`, `Acked-by`, `Cc`, `Reviewed-by`,
 `Tested-by`, `Suggested-by`, `Reported-by`, `Fixes`, etc. More details
in
 their documentation:
 https://www.kernel.org/doc/Documentation/SubmittingPatches




[jira] [Updated] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster

2015-07-29 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated KAFKA-2300:

Attachment: KAFKA-2300.patch

Uploading a patch for trunk.

 Error in controller log when broker tries to rejoin cluster
 ---

 Key: KAFKA-2300
 URL: https://issues.apache.org/jira/browse/KAFKA-2300
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: Johnny Brown
Assignee: Flavio Junqueira
 Attachments: KAFKA-2300-controller-logs.tar.gz, 
 KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch


 Hello Kafka folks,
 We are having an issue where a broker attempts to join the cluster after 
 being restarted, but is never added to the ISR for its assigned partitions. 
 This is a three-node cluster, and the controller is broker 2.
 When broker 1 starts, we see the following message in broker 2's 
 controller.log.
 {{
 [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error 
 while handling broker changes 
 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
 java.lang.IllegalStateException: Controller to broker state change requests 
 batch is not empty while creating a new one. Some UpdateMetadata state 
 changes Map(2 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  1 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  3 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)))
  might be lost 
   at 
 kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
   at 
 kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
   at 
 kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at kafka.utils.Utils$.inLock(Utils.scala:535)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 }}
 {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of 
 it persists in the controller's memory, causing an exception which interrupts 
 the state change triggered by the broker startup.
 Has anyone seen something like this? Any idea what's happening here? Any 
 information would be greatly appreciated.
 Thanks,
 Johnny



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28096: Patch for KAFKA-313

2015-07-29 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28096/#review93489
---


Thanks for the patch and sorry for the long delay.


core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (lines 36 - 39)
https://reviews.apache.org/r/28096/#comment147867

Shouldn't this be an inner object, since it's only visible to and used by 
ConsumerGroupCommand?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (line 93)
https://reviews.apache.org/r/28096/#comment147875

This is defined as breakable, but I don't see where you use break?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (lines 109 - 113)
https://reviews.apache.org/r/28096/#comment147878

Since it's a boolean, we want a name that reflects what we check. Perhaps 
iterationsLeft? 

It looks like we want to return true if we want to continue iterating - in 
this case shouldn't a negative numIterations lead to false?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (lines 236 - 241)
https://reviews.apache.org/r/28096/#comment147880

These look identical - copy/paste error?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (line 249)
https://reviews.apache.org/r/28096/#comment147881

I thought none is tab-delimited, not CSV?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (line 363)
https://reviews.apache.org/r/28096/#comment147871

Kafka code base usually doesn't use methods as operators. Why are we doing 
this here?

Also, why define ofTypes here?



core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala (line 388)
https://reviews.apache.org/r/28096/#comment147883

If only CSV and JSON are allowed, what is NONE for?


- Gwen Shapira


On June 24, 2015, 6:14 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28096/
 ---
 
 (Updated June 24, 2015, 6:14 p.m.)
 
 
 Review request for kafka, Gwen Shapira, Jarek Cecho, and Joel Koshy.
 
 
 Bugs: KAFKA-313
 https://issues.apache.org/jira/browse/KAFKA-313
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-313: Add JSON/CSV output and looping options to ConsumerGroupCommand
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala 
 f23120ede5f9bf0cfaf795c65c9845f42d8784d0 
 
 Diff: https://reviews.apache.org/r/28096/diff/
 
 
 Testing
 ---
 
 Ran ConsumerOffsetChecker with different combinations of --output.format and 
 --loop options.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Commented] (KAFKA-2100) Client Error doesn't preserve or display original server error code when it is an unknown code

2015-07-29 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646472#comment-14646472
 ] 

Gwen Shapira commented on KAFKA-2100:
-

+1 and pushed to trunk.

Thank you for your contribution [~dajac], hope to see more :)

 Client Error doesn't preserve or display original server error code when it 
 is an unknown code
 --

 Key: KAFKA-2100
 URL: https://issues.apache.org/jira/browse/KAFKA-2100
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Gwen Shapira
Assignee: David Jacot
  Labels: newbie
 Attachments: KAFKA-2100-1.patch, KAFKA-2100-2.patch


 When the java client receives an unfamiliar error code, it translates it into 
 UNKNOWN(-1, new UnknownServerException(The server experienced an unexpected 
 error when processing the request))
 This completely loses the original code, which makes troubleshooting from the 
 client impossible. 
 Will be better to preserve the original code and write it to the log when 
 logging the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2300: Error in controller log when broke...

2015-07-29 Thread fpj
GitHub user fpj opened a pull request:

https://github.com/apache/kafka/pull/102

KAFKA-2300: Error in controller log when broker tries to rejoin cluster



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fpj/kafka 2300

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #102


commit dbd1bf3a91c3e15ed2d14bf941c41c87b8116608
Author: flavio junqueira f...@apache.org
Date:   2015-07-29T17:07:51Z

KAFKA-2300: Error in controller log when broker tries to rejoin cluster




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Gwen Shapira
My two cents:

The jira comment is a way for the committer to say thank you to
people who were involved in the review process. It doesn't have any
formal implications - the responsibility for committing good code is
on the committer (that's the whole point). It doesn't even have
informal implications - no one ever went after a reviewer if code
turned out buggy.

I suggest: Leave it up to the committer's best judgement and not
introduce process where there's really no need for one.

Gwen

On Wed, Jul 29, 2015 at 6:18 AM, Ismael Juma ism...@juma.me.uk wrote:
 Hi all,

 As a general rule, we credit reviewers in the commit message. This is good.
 However, it is not clear to me if there are guidelines on who should be
 included as a reviewer (please correct me if I am wrong). I can think of a
 few options:

1. Anyone that commented on the patch (in the pull request or Review
Board)
2. The ones that have reviewed and approved the patch (+1, LGTM, Ship
it, etc.)
3. A more sophisticated system that differentiates between someone who
reviews and approves a patch versus someone who simply comments on aspects
of the patch [1]

 On the surface, `1` seems appealing because it's simple and credits people
 who do partial reviews. The issue, however, is that people (including
 myself) may not want to be tagged as a reviewer if they left a comment or
 two, but didn't review the change properly. Option `2` is still simple and
 it avoids this issue.

 As such, I lean towards option `2`, although `3` would work for me too (the
 additional complexity is the main downside).

 Thoughts?

 Best,
 Ismael

 [1] I don't think we should go this far, but the Linux Kernel is an extreme
 example of this with `Signed-off-by`, `Acked-by`, `Cc`, `Reviewed-by`,
 `Tested-by`, `Suggested-by`, `Reported-by`, `Fixes`, etc. More details in
 their documentation:
 https://www.kernel.org/doc/Documentation/SubmittingPatches


[jira] [Commented] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646514#comment-14646514
 ] 

ASF GitHub Bot commented on KAFKA-2300:
---

GitHub user fpj opened a pull request:

https://github.com/apache/kafka/pull/102

KAFKA-2300: Error in controller log when broker tries to rejoin cluster



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fpj/kafka 2300

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #102


commit dbd1bf3a91c3e15ed2d14bf941c41c87b8116608
Author: flavio junqueira f...@apache.org
Date:   2015-07-29T17:07:51Z

KAFKA-2300: Error in controller log when broker tries to rejoin cluster




 Error in controller log when broker tries to rejoin cluster
 ---

 Key: KAFKA-2300
 URL: https://issues.apache.org/jira/browse/KAFKA-2300
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: Johnny Brown
Assignee: Flavio Junqueira
 Attachments: KAFKA-2300-controller-logs.tar.gz, 
 KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch


 Hello Kafka folks,
 We are having an issue where a broker attempts to join the cluster after 
 being restarted, but is never added to the ISR for its assigned partitions. 
 This is a three-node cluster, and the controller is broker 2.
 When broker 1 starts, we see the following message in broker 2's 
 controller.log.
 {{
 [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error 
 while handling broker changes 
 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
 java.lang.IllegalStateException: Controller to broker state change requests 
 batch is not empty while creating a new one. Some UpdateMetadata state 
 changes Map(2 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  1 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)),
  3 -> Map([prod-sver-end,1] -> 
 (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)))
  might be lost 
   at 
 kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
   at 
 kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
   at 
 kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
   at kafka.utils.Utils$.inLock(Utils.scala:535)
   at 
 kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
 }}
 {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of 
 it persists in the controller's memory, causing an exception which interrupts 
 the state change triggered by the broker startup.
 Has anyone seen something like this? Any idea what's happening here? Any 
 information would be greatly appreciated.
 Thanks,
 Johnny



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1371) Ignore build output dirs

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646911#comment-14646911
 ] 

ASF GitHub Bot commented on KAFKA-1371:
---

Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/21


 Ignore build output dirs
 

 Key: KAFKA-1371
 URL: https://issues.apache.org/jira/browse/KAFKA-1371
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Trivial
  Labels: git
 Fix For: 0.8.2.0

 Attachments: 0001-Ignore-build-output-dirs.patch


 After a clean clone and project build, build output directories get reported 
 as changes/new. They should be ignored.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-1370 Added Gradle startup script for Win...

2015-07-29 Thread sslavic
Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/22


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-1371 Ignore build output dirs

2015-07-29 Thread sslavic
Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/21


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Review Request 36858: Patch for KAFKA-2120

2015-07-29 Thread Mayuresh Gharat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36858/
---

(Updated July 29, 2015, 10:57 p.m.)


Review request for kafka.


Bugs: KAFKA-2120
https://issues.apache.org/jira/browse/KAFKA-2120


Repository: kafka


Description (updated)
---

Solved compile error


Addressed Jason's comments for KIP-19


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/ClientRequest.java 
dc8f0f115bcda893c95d17c0a57be8d14518d034 
  clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java 
2c421f42ed3fc5d61cf9c87a7eaa7bb23e26f63b 
  clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
15d00d4e484bb5d51a9ae6857ed6e024a2cc1820 
  clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
7ab2503794ff3aab39df881bd9fbae6547827d3b 
  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
0e51d7bd461d253f4396a5b6ca7cd391658807fa 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
70377ae2fa46deb381139d28590ce6d4115e1adc 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
923ff999d1b04718ddd9a9132668446525bf62f3 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java
 9517d9d0cd480d5ba1d12f1fde7963e60528d2f8 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
03b8dd23df63a8d8a117f02eabcce4a2d48c44f7 
  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
aa264202f2724907924985a5ecbe74afc4c6c04b 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
 4cb1e50d6c4ed55241aeaef1d3af09def5274103 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 a152bd7697dca55609a9ec4cfe0a82c10595fbc3 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
 06182db1c3a5da85648199b4c0c98b80ea7c6c0c 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
0baf16e55046a2f49f6431e01d52c323c95eddf0 
  clients/src/main/java/org/apache/kafka/common/network/Selector.java 
aaf60c98c2c0f4513a8d65ee0db67953a529d598 
  clients/src/test/java/org/apache/kafka/clients/MockClient.java 
d9c97e966c0e2fb605b67285f4275abb89f8813e 
  clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java 
43238ceaad0322e39802b615bb805b895336a009 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/BufferPoolTest.java
 2c693824fa53db1e38766b8c66a0ef42ef9d0f3a 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/RecordAccumulatorTest.java
 5b2e4ffaeab7127648db608c179703b27b577414 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/SenderTest.java
 8b1805d3d2bcb9fe2bacb37d870c3236aa9532c4 

Diff: https://reviews.apache.org/r/36858/diff/


Testing
---


Thanks,

Mayuresh Gharat



[GitHub] kafka pull request: Ignore gradle wrapper download directory

2015-07-29 Thread sslavic
Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/67


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-2384) Enable renaming commit message title in kafka-merge-pr.py

2015-07-29 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-2384:


 Summary: Enable renaming commit message title in kafka-merge-pr.py
 Key: KAFKA-2384
 URL: https://issues.apache.org/jira/browse/KAFKA-2384
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Ismael Juma
 Fix For: 0.8.3


It would be more convenient to allow setting the commit message in the merging 
script; right now the script takes the PR title as is, and contributors have 
to change it according to the submission-review guidelines before doing the 
merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-1375: Fix formatting in README.md

2015-07-29 Thread sslavic
Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/24


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] Reviewers in commit message

2015-07-29 Thread Ismael Juma
On Wed, Jul 29, 2015 at 7:38 PM, Gwen Shapira gshap...@cloudera.com wrote:

 I guess we see the reviewer part with different interpretations.


Yes. As you know, Git was created for and initially used by the Linux
Kernel. As such they were very influential in conventions, terminology and
best practices. This is what their documentation states:

By offering my Reviewed-by: tag, I state that:

  (a) I have carried out a technical review of this patch to
 evaluate its appropriateness and readiness for inclusion into
 the mainline kernel.

 (b) Any problems, concerns, or questions relating to the patch
 have been communicated back to the submitter.  I am satisfied
 with the submitter's response to my comments.

 (c) While there may be things that could be improved with this
 submission, I believe that it is, at this time, (1) a
 worthwhile modification to the kernel, and (2) free of known
 issues which would argue against its inclusion.

 (d) While I have reviewed the patch and believe it to be sound, I
 do not (unless explicitly stated elsewhere) make any
 warranties or guarantees that it will achieve its stated
 purpose or function properly in any given situation.

This is a common interpretation when the word reviewer or reviewed-by is
used in a Git commit. If we mean something else, maybe it's better to use a
different word.
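
For illustration, a hypothetical commit message footer using that convention
(the JIRA id and reviewer are invented):

    KAFKA-9999: Fix widget frobnication

    Reviewed-by: Jane Doe <jane@example.com>

where the tag is added only after a full technical review, per the kernel
definition quoted above.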

What are the benefits you see of formalizing who gets mentioned as reviewer?


* Consistency
* Easier for new committers if there's a clear guideline
* Avoid surprises for people who comment on PRs. Now that we are accepting
pull requests via GitHub, it is more likely that people will comment on
pull requests as they see them in their news feed (the repository has more
than 300 watchers and this number is likely to rise); many/most of them
won't be doing a proper review.

In any case, if the committers don't think this is an issue, we can
continue as it is and see if anyone else complains. Each community is
different after all.

Best,
Ismael


Re: [DISCUSS] KIP-28 - Add a transform client for data processing

2015-07-29 Thread Neha Narkhede
Prefer something that evokes stream processing on top of Kafka. And since
I've heard many people conflate streaming with streaming video (I know,
duh!), I'd vote for Kafka Streams or maybe KStream.

Thanks,
Neha

On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps j...@confluent.io wrote:

 Also, the most important part of any prototype, we should have a name for
 this producing-consumer-thingamgigy:

 Various ideas:
 - Kafka Streams
 - KStream
 - Kafka Streaming
 - The Processor API
 - Metamorphosis
 - Transformer API
 - Verwandlung

 For my part I think what people are trying to do is stream processing with
 Kafka so I think something that evokes Kafka and stream processing is
 preferable. I like Kafka Streams or Kafka Streaming followed by KStream.

 Transformer kind of makes me think of the shape-shifting cars.

 Metamorphosis is cool and hilarious but since we are kind of envisioning
 this as a more limited-scope thing rather than a massive framework in its own
 right, I actually think it should have a descriptive name rather than a
 personality of its own.

 Anyhow let the bikeshedding commence.

 -Jay


 On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang wangg...@gmail.com wrote:

  Hi all,
 
  I just posted KIP-28: Add a transform client for data processing
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing
  
  .
 
  The wiki page does not yet have the full design / implementation details,
  and this email is to kick-off the conversation on whether we should add
  this new client with the described motivations, and if yes what features
 /
  functionalities should be included.
 
  Looking forward to your feedback!
 
  -- Guozhang
 




-- 
Thanks,
Neha


[GitHub] kafka pull request: MINOR: Fixed ConsumerRecord constructor javado...

2015-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/85


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: KafkaPreCommit #169

2015-07-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/KafkaPreCommit/169/changes

Changes:

[wangguoz] MINOR: Fixed ConsumerRecord constructor javadoc

--
Started by an SCM change
Building remotely on ubuntu3 (Ubuntu ubuntu legacy-ubuntu) in workspace 
https://builds.apache.org/job/KafkaPreCommit/ws/
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url 
  https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
  git --version # timeout=10
  git fetch --tags --progress 
  https://git-wip-us.apache.org/repos/asf/kafka.git 
  +refs/heads/*:refs/remotes/origin/*
  git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
  git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 1162cc1dd30e896e8c7a03f960b7d7bcbf883624 
(refs/remotes/origin/trunk)
  git config core.sparsecheckout # timeout=10
  git checkout -f 1162cc1dd30e896e8c7a03f960b7d7bcbf883624
  git rev-list b7bd2978dc3947297fefc06ff9b22949d5bd1b50 # timeout=10
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
[KafkaPreCommit] $ /bin/bash -xe /tmp/hudson2260044051067718206.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.1/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.5
:downloadWrapper UP-TO-DATE

BUILD SUCCESSFUL

Total time: 11.186 secs
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
[KafkaPreCommit] $ /bin/bash -xe /tmp/hudson5984635448923769788.sh
+ ./gradlew -PscalaVersion=2.10.1 test
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4/userguide/gradle_daemon.html.

FAILURE: Build failed with an exception.

* What went wrong:
Could not open buildscript class cache for settings file 
'/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle' 
(/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
 Timeout waiting to lock buildscript class cache for settings file 
 '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle' 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
  It is currently in use by another Gradle instance.
  Owner PID: unknown
  Our PID: 32640
  Owner Operation: unknown
  Our operation: Initialize cache
  Lock file: 
/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript/cache.properties.lock

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 3.049 secs
Build step 'Execute shell' marked build as failure
Setting 
GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1


[jira] [Updated] (KAFKA-2120) Add a request timeout to NetworkClient

2015-07-29 Thread Mayuresh Gharat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayuresh Gharat updated KAFKA-2120:
---
Attachment: KAFKA-2120_2015-07-29_15:57:02.patch

 Add a request timeout to NetworkClient
 --

 Key: KAFKA-2120
 URL: https://issues.apache.org/jira/browse/KAFKA-2120
 Project: Kafka
  Issue Type: New Feature
Reporter: Jiangjie Qin
Assignee: Mayuresh Gharat
 Attachments: KAFKA-2120.patch, KAFKA-2120_2015-07-27_15:31:19.patch, 
 KAFKA-2120_2015-07-29_15:57:02.patch


 Currently NetworkClient does not have a timeout setting for requests. So if 
 no response is received for a request, for reasons such as the broker being 
 down, the request will never be completed.
 The request timeout will also be used as an implicit timeout for some methods 
 such as KafkaProducer.flush() and KafkaProducer.close().
 KIP-19 is created for this public interface change.
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient
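
 As a hypothetical illustration of the requested behavior (names invented,
 not the actual NetworkClient internals):

 import java.util.ArrayList;
 import java.util.List;

 class RequestExpirer {
     static final class InFlight {
         final long createdMs;
         InFlight(long createdMs) { this.createdMs = createdMs; }
     }

     // Collect requests outstanding longer than requestTimeoutMs; the caller
     // would disconnect the node and fail these with a timeout error.
     static List<InFlight> expired(List<InFlight> inFlight, long nowMs,
                                   long requestTimeoutMs) {
         List<InFlight> timedOut = new ArrayList<>();
         for (InFlight r : inFlight)
             if (nowMs - r.createdMs > requestTimeoutMs)
                 timedOut.add(r);
         return timedOut;
     }
 }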



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2120) Add a request timeout to NetworkClient

2015-07-29 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646887#comment-14646887
 ] 

Mayuresh Gharat commented on KAFKA-2120:


Updated reviewboard https://reviews.apache.org/r/36858/diff/
 against branch origin/trunk

 Add a request timeout to NetworkClient
 --

 Key: KAFKA-2120
 URL: https://issues.apache.org/jira/browse/KAFKA-2120
 Project: Kafka
  Issue Type: New Feature
Reporter: Jiangjie Qin
Assignee: Mayuresh Gharat
 Attachments: KAFKA-2120.patch, KAFKA-2120_2015-07-27_15:31:19.patch, 
 KAFKA-2120_2015-07-29_15:57:02.patch


 Currently NetworkClient does not have a timeout setting for requests. So if 
 no response is received for a request, for reasons such as the broker being 
 down, the request will never be completed.
 The request timeout will also be used as an implicit timeout for some methods 
 such as KafkaProducer.flush() and KafkaProducer.close().
 KIP-19 is created for this public interface change.
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1370) Gradle startup script for Windows

2015-07-29 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962801#comment-13962801
 ] 

Stevo Slavic edited comment on KAFKA-1370 at 7/29/15 11:27 PM:
---

Created pull request with this change (see 
[here|https://github.com/apache/kafka/pull/22])


was (Author: sslavic):
Created pull request with this change (see 
[here|https://github.com/apache/kafka/pull/21])

 Gradle startup script for Windows
 -

 Key: KAFKA-1370
 URL: https://issues.apache.org/jira/browse/KAFKA-1370
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Trivial
  Labels: gradle
 Fix For: 0.8.2.0

 Attachments: 
 0001-KAFKA-1370-Added-Gradle-startup-script-for-Windows.patch


 Please provide Gradle startup script for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1370) Gradle startup script for Windows

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646912#comment-14646912
 ] 

ASF GitHub Bot commented on KAFKA-1370:
---

Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/22


 Gradle startup script for Windows
 -

 Key: KAFKA-1370
 URL: https://issues.apache.org/jira/browse/KAFKA-1370
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Trivial
  Labels: gradle
 Fix For: 0.8.2.0

 Attachments: 
 0001-KAFKA-1370-Added-Gradle-startup-script-for-Windows.patch


 Please provide Gradle startup script for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka Consumer thoughts

2015-07-29 Thread Ismael Juma
Hi Jay,

Good points. A few remarks below.

On Wed, Jul 29, 2015 at 11:16 PM, Jay Kreps j...@confluent.io wrote:

 I suggest we focus on threading and the current event-loop style of API
 design since I think that is really the crux.


Agreed.

I think ultimately though what you need to think about is, does an event
 loop style of API make sense? That is the source of all the issues you
 describe. This style of API is incredibly prevalent, from unix select to
 GUIs to node.js. It's a great way to model multiple channels of messages
 coming in. It is a fantastic style for event processing. Programmers
 understand this style of API, though I would agree it is unusual compared to
 blocking APIs. But it is a single-threaded processing model.


Even though this style of API is prevalent, my experience is that people
often struggle to use it correctly in non-trivial apps. Typically due to
blocking, misuse of threading or callback hell. Perhaps this is because
people try to mix models as you suggest.
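
For concreteness, a minimal sketch of the event-loop style under discussion
(Handler is a hypothetical user callback; the consumer calls are assumed
from the then-current new-consumer API):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

interface Handler { void handle(ConsumerRecord<String, String> record); }

class EventLoop {
    // One thread owns the consumer and dispatches every event.
    static void run(KafkaConsumer<String, String> consumer, Handler handler) {
        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(100))
                handler.handle(record);  // user code; must not block the loop
        }
    }
}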

If we want to move away from an event loop I'm not sure *any* aspect of the
 current event-loop style of API makes sense any more. I am not totally
 married to event loops, but I do think what we have gives an elegant way of
 implementing any higher-level abstractions that would fully implement the
 user's parallelism model.


This is a strong reason in favour of the existing approach, I think. An
interesting experiment would be to write an adapter from the new consumer
to a higher-level API like scalaz-stream (
https://github.com/scalaz/scalaz-stream). It would be a positive indicator
if that turns out to be straightforward.

Best,
Ismael


Re: [DISCUSS] KIP-28 - Add a transform client for data processing

2015-07-29 Thread Gwen Shapira
Since it sounds like it is not a separate framework (like CopyCat) but
rather a new client, it will be nice to follow existing convention.
Producer, Consumer and Processor (or Transformer) make sense to me.

Note that the way the API is currently described, people may want to
use it inside Spark applications (because everyone wants to use
Spark), so putting Streaming in the name will cause some confusion.

Metamorphosis and Verwandlung are too perfect to argue against, but
I'd rather keep my slide decks insect-free :)
Since I like fluffy animals on my slides, how about Bunny? This way
chaining multiple processors into a pipeline can be described as Bunny
hops. Also, together with CopyCat, we have pipelines that look like
this: 
http://www.catster.com/wp-content/uploads/2015/06/bd58b829434657a44533a33c32772c36.jpg

Gwen


On Wed, Jul 29, 2015 at 6:46 PM, Sriram Subramanian
srsubraman...@linkedin.com.invalid wrote:
 I think Kafka and streaming are synonymous. Kafka streams or Kafka
 streaming really does not indicate stream processing.

 On Wed, Jul 29, 2015 at 6:20 PM, Neha Narkhede n...@confluent.io wrote:

 Prefer something that evokes stream processing on top of Kafka. And since
 I've heard many people conflate streaming with streaming video (I know,
  duh!), I'd vote for Kafka Streams or maybe KStream.

 Thanks,
 Neha

 On Wed, Jul 29, 2015 at 6:08 PM, Jay Kreps j...@confluent.io wrote:

  Also, the most important part of any prototype, we should have a name for
  this producing-consumer-thingamajiggy:
 
  Various ideas:
  - Kafka Streams
  - KStream
  - Kafka Streaming
  - The Processor API
  - Metamorphosis
  - Transformer API
  - Verwandlung
 
  For my part I think what people are trying to do is stream processing
 with
  Kafka so I think something that evokes Kafka and stream processing is
  preferable. I like Kafka Streams or Kafka Streaming followed by KStream.
 
  Transformer kind of makes me think of the shape-shifting cars.
 
  Metamorphosis is cool and hilarious but since we are kind of envisioning
  this as more limited scope thing rather than a massive framework in its
 own
  right I actually think it should have a descriptive name rather than a
  personality of its own.
 
  Anyhow let the bikeshedding commence.
 
  -Jay
 
 
  On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang wangg...@gmail.com
 wrote:
 
   Hi all,
  
   I just posted KIP-28: Add a transform client for data processing
   
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing
   
   .
  
   The wiki page does not yet have the full design / implementation
 details,
   and this email is to kick-off the conversation on whether we should add
   this new client with the described motivations, and if yes what
 features
  /
   functionalities should be included.
  
   Looking forward to your feedback!
  
   -- Guozhang
  
 



 --
 Thanks,
 Neha



Re: Review Request 35421: Patch for KAFKA-2026

2015-07-29 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35421/#review93543
---

Ship it!


Ship It!

- Jiangjie Qin


On June 13, 2015, 10:07 a.m., Manikumar Reddy O wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35421/
 ---
 
 (Updated June 13, 2015, 10:07 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2026
 https://issues.apache.org/jira/browse/KAFKA-2026
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Logging of unused option values taken from this.originals
 
 
 Diffs
 -
 
   clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java 
 c4fa058692f50abb4f47bd344119d805c60123f5 
 
 Diff: https://reviews.apache.org/r/35421/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Manikumar Reddy O
 




Re: Connection to zk shell on Kafka

2015-07-29 Thread Chris Barlock
I'm a user of Kafka/ZooKeeper, not one of its developers, so I can't give
you a technical explanation. I do agree that Kafka should ship the jline
JAR if its zookeeper-shell depends on it.
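
That said, a workaround that has worked for me is simply copying the jar
from the ZooKeeper distribution onto Kafka's classpath, e.g. something like
(the paths and jline version will vary with your install):

    cp /path/to/zookeeper/lib/jline-*.jar /path/to/kafka/libs/

kafka-run-class.sh puts everything under libs/ on the classpath, so
zookeeper-shell.sh should then pick it up.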

Chris




From:   Prabhjot Bharaj prabhbha...@gmail.com
To: u...@zookeeper.apache.org, dev@kafka.apache.org
Cc: us...@kafka.apache.org
Date:   07/29/2015 01:27 PM
Subject:Re: Connection to zk shell on Kafka



Sure. It would be great if you could also explain why the absence of the
jar causes this problem.

Also, I'm surprised that the zookeeper that comes bundled with kafka 0.8.2
does not include the jline jar.

Regards,
prabcs

On Wed, Jul 29, 2015 at 10:45 PM, Chris Barlock barl...@us.ibm.com 
wrote:

 You need the jline JAR file that ships with ZooKeeper.

 Chris

 IBM Tivoli Systems
 Research Triangle Park, NC
 (919) 224-2240
 Internet:  barl...@us.ibm.com



 From:   Prabhjot Bharaj prabhbha...@gmail.com
 To: us...@kafka.apache.org, u...@zookeeper.apache.org
 Date:   07/29/2015 01:13 PM
 Subject:Connection to zk shell on Kafka



 Hi folks,

 /kafka/bin# ./zookeeper-shell.sh localhost:2182/
 Connecting to localhost:2182/
 Welcome to ZooKeeper!
 JLine support is disabled

 WATCHER::

 WatchedEvent state:SyncConnected type:None path:null

 The shell never says connected

 I'm running 5 node zookeeper cluster on 5-node kafka cluster (each kafka
 broker has 1 zookeeper server running)

 When I try connecting to the shell, the shell never says 'Connected'



 However, if I try connecting to another standalone zookeeper which has no
 links to kafka, I'm able to connect:

 /kafka/bin# /zookeeper/scripts/zkCli.sh -server 127.0.0.1:2181
 Connecting to 127.0.0.1:2181
 Welcome to ZooKeeper!
 JLine support is enabled

 WATCHER::

 WatchedEvent state:SyncConnected type:None path:null

 [zk: 127.0.0.1:2181(CONNECTED) 0]

 Am I missing something?


 Thanks,

 prabcs




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't



Re: [DISCUSS] Partitioning in Kafka

2015-07-29 Thread Jiangjie Qin
Just my two cents. I think it might be OK to put this into Kafka if we
agree that this is a good use case for people who want to use Kafka as a
temporary store for stream processing. At the very least, I don't see a
downside to this.

Thanks,

Jiangjie (Becket) Qin

On Tue, Jul 28, 2015 at 3:41 AM, Gianmarco De Francisci Morales 
g...@apache.org wrote:

 Jason,
 Thanks for starting the discussion and for your very concise (and correct)
 summary.

 Ewen, while what you say is true, those kinds of datasets (large number of
 keys with skew) are very typical on the Web (think Twitter users, or Web
 pages, or even just plain text).
 If you want to compute an aggregate on these datasets (either for reporting
 purposes, or as part of some analytical task such as machine learning),
 then the skew will kill your performance, and the amount of parallelism you
 can effectively extract from your dataset.
 PKG is a solution to that, without the full overhead of going to shuffle
 grouping to compute partial aggregates.
 The problem with shuffle grouping is not only the memory, but also the cost
 of combining the aggregates, which increases with the parallelism level.
 Also, by keeping partial aggregates in 2 places, you can query those at
 runtime with constant overhead (similarly to what you would be able to do
 with hashing) rather than needing to broadcast the query to all partitions
 (which you need to do with shuffle grouping).

 --
 Gianmarco

 On 28 July 2015 at 00:54, Gwen Shapira gshap...@cloudera.com wrote:

  I guess it depends on whether the original producer did any map
  tasks or simply wrote raw data. We usually advocate writing raw data,
  and since we need to write it anyway, the partitioner doesn't
  introduce any extra hops.
 
  It's definitely useful to look at use-cases and I need to think a bit
  more on whether huge-key-space-with-large-skew is the only one.
  I think that there are use-cases that are not pure-aggregate and
  therefore keeping key-list in memory won't help and scaling to large
  number of partitions is still required (and therefore skew is a
  critical problem). However, I may be making stuff up, so need to
  double check.
 
  Gwen
 
 
 
 
 
  On Mon, Jul 27, 2015 at 2:20 PM, Ewen Cheslack-Postava
  e...@confluent.io wrote:
   Gwen - this is really like two steps of map reduce though, right? The
  first
   step does the partial shuffle to two partitions per key, second step
 does
   partial reduce + final full shuffle, final step does the final reduce.
  
   This strikes me as similar to partition assignment strategies in the
   consumer in that there will probably be a small handful of commonly
 used
   strategies that we can just maintain as part of Kafka. A few people
 will
   need more obscure strategies and they can maintain those
 implementations
   themselves. For reference, a quick grep of Spark shows 5 partitioners:
  Hash
   and RangePartitioner, which are in core, PythonPartitioner,
  GridPartitioner
   for partitioning matrices, and ShuffleRowRDD for their SQL
  implementation.
   So I don't think it would be a big deal to include it here, although
 I'm
   not really sure how often it's useful -- compared to normal
 partitioning
  or
   just doing two steps by starting with unpartitioned data, you need to
 be
   performing an aggregation, the key set needs to be large enough for
  memory
   usage to be a problem (i.e. you don't want each consumer to have to
   maintain a map with every key in it), and a sufficiently skewed
   distribution (i.e. not just 1 or 2 very hot keys). The key set
  constraint,
   in particular, is the one I'm not convinced by since in practice if you
   have a skewed distribution, you probably also won't actually see every
  key
   in every partition; each worker actually only needs to maintain a
 subset
  of
   the key set (and associated aggregate data) in memory.
  
  
   On Mon, Jul 27, 2015 at 12:56 PM, Gwen Shapira gshap...@cloudera.com
   wrote:
  
   If you are used to map-reduce patterns, this sounds like a perfectly
   natural way to process streams of data.
  
   Call the first consumer map-combine-log, the topic shuffle-log and
   the second consumer reduce-log :)
   I like that a lot. It works well for either embarrassingly parallel
   cases, or so much data that more parallelism is worth the extra
   overhead cases.
  
   I personally don't care if its in core-Kafka, KIP-28 or a github
   project elsewhere, but I find it useful and non-esoteric.
  
  
  
   On Mon, Jul 27, 2015 at 12:51 PM, Jason Gustafson ja...@confluent.io
 
   wrote:
For a little background, the difference between this partitioner and
  the
default one is that it breaks the deterministic mapping from key to
partition. Instead, messages for a given key can end up in either of
  two
partitions. This means that the consumer generally won't see all
  messages
for a given key. Instead the consumer would compute an aggregate for
  each
key on 

[jira] [Commented] (KAFKA-2203) Get gradle build to work with Java 8

2015-07-29 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645734#comment-14645734
 ] 

Ismael Juma commented on KAFKA-2203:


[~sslavic], does it work for you now that we no longer build against Scala 2.9?

 Get gradle build to work with Java 8
 

 Key: KAFKA-2203
 URL: https://issues.apache.org/jira/browse/KAFKA-2203
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.1.1
Reporter: Gaju Bhat
Priority: Minor
 Fix For: 0.8.1.2

 Attachments: 0001-Special-case-java-8-and-javadoc-handling.patch


 The Gradle build halts because javadoc in Java 8 is a lot stricter about
 valid HTML.
 It might be worthwhile to special-case Java 8 as described
 [here|http://blog.joda.org/2014/02/turning-off-doclint-in-jdk-8-javadoc.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2276) Initial patch for KIP-25

2015-07-29 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2276:
---
Fix Version/s: (was: 0.9.0)
   0.8.3

 Initial patch for KIP-25
 

 Key: KAFKA-2276
 URL: https://issues.apache.org/jira/browse/KAFKA-2276
 Project: Kafka
  Issue Type: Bug
Reporter: Geoffrey Anderson
Assignee: Geoffrey Anderson
 Fix For: 0.8.3


 Submit initial patch for KIP-25 
 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-25+-+System+test+improvements)
 This patch should contain a few Service classes and a few tests which can 
 serve as examples 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Build failed in Jenkins: KafkaPreCommit #167

2015-07-29 Thread Ismael Juma
It looks like two builds are running in the same workspace at the same time
somehow.

Ismael

On Wed, Jul 29, 2015 at 1:44 AM, Guozhang Wang wangg...@gmail.com wrote:

 Does anyone know how this "Could not open buildscript class" issue could
 happen, and can we fix it on our side, or is it a general Jenkins issue?

 Guozhang

 On Tue, Jul 28, 2015 at 5:39 PM, Apache Jenkins Server 
 jenk...@builds.apache.org wrote:

  See https://builds.apache.org/job/KafkaPreCommit/167/changes
 
  Changes:
 
  [wangguoz] KAFKA-2276; KIP-25 initial patch
 
  --
  Started by an SCM change
  Building remotely on ubuntu3 (Ubuntu ubuntu legacy-ubuntu) in workspace 
  https://builds.apache.org/job/KafkaPreCommit/ws/
git rev-parse --is-inside-work-tree # timeout=10
  Fetching changes from the remote Git repository
git config remote.origin.url
  https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
  Fetching upstream changes from
  https://git-wip-us.apache.org/repos/asf/kafka.git
git --version # timeout=10
git fetch --tags --progress
  https://git-wip-us.apache.org/repos/asf/kafka.git
  +refs/heads/*:refs/remotes/origin/*
git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
  Checking out Revision e43c9aff92c57da6abb0c1d0af3431a550110a89
  (refs/remotes/origin/trunk)
git config core.sparsecheckout # timeout=10
git checkout -f e43c9aff92c57da6abb0c1d0af3431a550110a89
git rev-list f4101ab3fcf7ec65f6541b157f1894ffdc8d861d # timeout=10
  Setting
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
  [KafkaPreCommit] $ /bin/bash -xe /tmp/hudson7546159244531236383.sh
  +
 
 /home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1/bin/gradle
  To honour the JVM settings for this build a new JVM will be forked.
 Please
  consider using the daemon:
  http://gradle.org/docs/2.1/userguide/gradle_daemon.html.
  Building project 'core' with Scala version 2.10.5
  :downloadWrapper UP-TO-DATE
 
  BUILD SUCCESSFUL
 
  Total time: 16.059 secs
  Setting
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
  [KafkaPreCommit] $ /bin/bash -xe /tmp/hudson4914211160587558932.sh
  + ./gradlew -PscalaVersion=2.10.1 test
  To honour the JVM settings for this build a new JVM will be forked.
 Please
  consider using the daemon:
  http://gradle.org/docs/2.4/userguide/gradle_daemon.html.
 
  FAILURE: Build failed with an exception.
 
  * What went wrong:
  Could not open buildscript class cache for settings file
  '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle'
 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
   Timeout waiting to lock buildscript class cache for settings file
  '/x1/jenkins/jenkins-slave/workspace/KafkaPreCommit/settings.gradle'
 
 (/home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript).
  It is currently in use by another Gradle instance.
Owner PID: unknown
Our PID: 23737
Owner Operation: unknown
Our operation: Initialize cache
Lock file:
 
 /home/jenkins/.gradle/caches/2.4/scripts/settings_azpqlzkn71yz2ostwvxkk46wf/SettingsScript/buildscript/cache.properties.lock
 
  * Try:
  Run with --stacktrace option to get the stack trace. Run with --info or
  --debug option to get more log output.
 
  BUILD FAILED
 
  Total time: 1 mins 3.207 secs
  Build step 'Execute shell' marked build as failure
  Setting
 
 GRADLE_2_1_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.1
 



 --
 -- Guozhang



[jira] [Commented] (KAFKA-1913) App hungs when calls producer.send to wrong IP of Kafka broker

2015-07-29 Thread Igor Khomenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645918#comment-14645918
 ] 

Igor Khomenko commented on KAFKA-1913:
--

For now I have to use the following utility to check the URL:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.logging.Level;

public static boolean isReachable(String hostname, int port, int timeout) {
    if (hostname == null) {
        return false;
    } else {
        if (log.isLoggable(Level.FINE)) {
            log.log(Level.FINE, "Checking host: " + hostname + ", port: " + port + ", timeout: " + timeout);
        }

        InetSocketAddress sockaddr = new InetSocketAddress(hostname, port);
        Socket socket = new Socket();
        boolean online = true;

        try {
            // Plain TCP connect with the caller-supplied timeout
            socket.connect(sockaddr, timeout);
            if (log.isLoggable(Level.FINE)) {
                log.log(Level.FINE, "Host '" + hostname + "' is ON");
            }
        } catch (IOException var15) {
            online = false;
            if (log.isLoggable(Level.FINE)) {
                log.log(Level.FINE, "Host '" + hostname + "' is not reachable");
            }
        } finally {
            try {
                socket.close();
            } catch (IOException var14) {
                // ignore failures on close
            }
        }

        return online;
    }
}
{code} 
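
If it helps, wired into the producer example from the description below,
the guard would look something like this (the broker address and timeout
are of course placeholders):

{code}
if (isReachable("192.168.9.3", 9092, 3000)) {
    messagesProducer.send(data);
} else {
    System.err.println("Broker not reachable, failing fast instead of hanging");
}
{code}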

 App hungs when calls producer.send to wrong IP of Kafka broker
 --

 Key: KAFKA-1913
 URL: https://issues.apache.org/jira/browse/KAFKA-1913
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.1.1
 Environment: OS X 10.10.1, Java 7, AWS Linux
Reporter: Igor Khomenko
Assignee: Jun Rao
 Fix For: 0.8.3


 I have the following test code to check the Kafka functionality:
 {code}
 package com.company;
 import kafka.common.FailedToSendMessageException;
 import kafka.javaapi.producer.Producer;
 import kafka.producer.KeyedMessage;
 import kafka.producer.ProducerConfig;
 import java.util.Date;
 import java.util.Properties;
 public class Main {
 public static void main(String[] args) {
 Properties props = new Properties();
 props.put("metadata.broker.list", "192.168.9.3:9092");
 props.put("serializer.class", "com.company.KafkaMessageSerializer");
 props.put("request.required.acks", "1");
 ProducerConfig config = new ProducerConfig(props);
 // The first is the type of the partition key, the second the type of
 // the message.
 Producer<String, String> messagesProducer = new Producer<String, String>(config);
 // Send
 String topicName = "my_messages";
 String message = "hello world";
 KeyedMessage<String, String> data = new KeyedMessage<String, String>(topicName, message);
 try {
 System.out.println(new Date() + ": sending...");
 messagesProducer.send(data);
 System.out.println(new Date() + " : sent");
 } catch (FailedToSendMessageException e) {
 System.out.println("e: " + e);
 e.printStackTrace();
 } catch (Exception exc) {
 System.out.println("e: " + exc);
 exc.printStackTrace();
 }
 }
 }
 {code}
 {code}
 package com.company;
 import kafka.serializer.Encoder;
 import kafka.utils.VerifiableProperties;
 /**
  * Created by igorkhomenko on 2/2/15.
  */
 public class KafkaMessageSerializer implements Encoder<String> {
 public KafkaMessageSerializer(VerifiableProperties verifiableProperties) {
 /* This constructor must be present for successful compile. */
 }
 @Override
 public byte[] toBytes(String entity) {
 byte [] serializedMessage = doCustomSerialization(entity);
 return serializedMessage;
 }
 private byte[] doCustomSerialization(String entity) {
 return entity.getBytes();
 }
 }
 {code}
 Here is also the GitHub version: https://github.com/soulfly/Kafka-java-producer
 So it just hangs on the next line:
 {code}
 messagesProducer.send(data)
 {code}
 When I replaced the brokerlist to
 {code}
 props.put("metadata.broker.list", "localhost:9092");
 {code}
 then I got an exception:
 {code}
 kafka.common.FailedToSendMessageException: Failed to send messages after 3 
 tries.
 {code}
 so it's okay.
 Why does it hang with the wrong broker list? Any ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1913) App hungs when calls producer.send to wrong IP of Kafka broker

2015-07-29 Thread Igor Khomenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645918#comment-14645918
 ] 

Igor Khomenko edited comment on KAFKA-1913 at 7/29/15 11:53 AM:


For now I have to use the following utility to check the URL before using
it in Kafka:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.logging.Level;

public static boolean isReachable(String hostname, int port, int timeout) {
    if (hostname == null) {
        return false;
    } else {
        if (log.isLoggable(Level.FINE)) {
            log.log(Level.FINE, "Checking host: " + hostname + ", port: " + port + ", timeout: " + timeout);
        }

        InetSocketAddress sockaddr = new InetSocketAddress(hostname, port);
        Socket socket = new Socket();
        boolean online = true;

        try {
            // Plain TCP connect with the caller-supplied timeout
            socket.connect(sockaddr, timeout);
            if (log.isLoggable(Level.FINE)) {
                log.log(Level.FINE, "Host '" + hostname + "' is ON");
            }
        } catch (IOException var15) {
            online = false;
            if (log.isLoggable(Level.FINE)) {
                log.log(Level.FINE, "Host '" + hostname + "' is not reachable");
            }
        } finally {
            try {
                socket.close();
            } catch (IOException var14) {
                // ignore failures on close
            }
        }

        return online;
    }
}
{code} 




[GitHub] kafka pull request: Fixed ConsumerRecord constructor javadoc

2015-07-29 Thread sslavic
GitHub user sslavic reopened a pull request:

https://github.com/apache/kafka/pull/85

Fixed ConsumerRecord constructor javadoc

Refactoring of ConsumerRecord made in 
https://github.com/apache/kafka/commit/0699ff2ce60abb466cab5315977a224f1a70a4da#diff-fafe8d3a3942f3c6394927881a9389b2
 left ConsumerRecord constructor javadoc inconsistent with implementation.

This patch fixes ConsumerRecord constructor javadoc to be in line with the
implementation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sslavic/kafka patch-3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/85.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #85


commit fd304452a001b417237b4b67e4bd3f70574f46be
Author: Stevo Slavić ssla...@gmail.com
Date:   2015-07-18T12:01:13Z

Fixed ConsumerRecord constructor javadoc

Refactoring of ConsumerRecord made in 
https://github.com/apache/kafka/commit/0699ff2ce60abb466cab5315977a224f1a70a4da#diff-fafe8d3a3942f3c6394927881a9389b2
 left ConsumerRecord constructor javadoc inconsistent with implementation.

This patch fixes ConsumerRecord constructor javadoc to be in line with the
implementation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: Fixed ConsumerRecord constructor javadoc

2015-07-29 Thread sslavic
Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/85


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[DISCUSS] Reviewers in commit message

2015-07-29 Thread Ismael Juma
Hi all,

As a general rule, we credit reviewers in the commit message. This is good.
However, it is not clear to me if there are guidelines on who should be
included as a reviewer (please correct me if I am wrong). I can think of a
few options:

   1. Anyone that commented on the patch (in the pull request or Review
   Board)
   2. The ones that have reviewed and approved the patch (+1, LGTM, Ship
   it, etc.)
   3. A more sophisticated system that differentiates between someone who
   reviews and approves a patch versus someone who simply comments on aspects
   of the patch [1]

On the surface, `1` seems appealing because it's simple and credits people
who do partial reviews. The issue, however, is that people (including
myself) may not want to be tagged as a reviewer if they left a comment or
two, but didn't review the change properly. Option `2` is still simple and
it avoids this issue.

As such, I lean towards option `2`, although `3` would work for me too (the
additional complexity is the main downside).

Thoughts?

Best,
Ismael

[1] I don't think we should go this far, but the Linux Kernel is an extreme
example of this with `Signed-off-by`, `Acked-by`, `Cc`, `Reviewed-by`,
`Tested-by`, `Suggested-by`, `Reported-by`, `Fixes`, etc. More details in
their documentation:
https://www.kernel.org/doc/Documentation/SubmittingPatches
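
For concreteness, under option `2` a merged commit would end with a single
trailer naming only those who approved it, along these lines (the names and
JIRA number here are made up):

    KAFKA-9999; Fix the frobnicator

    Author: Jane Doe <jane@example.com>

    Reviewers: John Smith

whereas under option `1` the Reviewers line would also include anyone who
left a drive-by comment.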


Re: Review Request 36858: Patch for KAFKA-2120

2015-07-29 Thread Mayuresh Gharat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36858/
---

(Updated July 29, 2015, 10:58 p.m.)


Review request for kafka.


Bugs: KAFKA-2120
https://issues.apache.org/jira/browse/KAFKA-2120


Repository: kafka


Description (updated)
---

Patch for KIP-19: Added RequestTimeOut and MaxBlockTimeOut

Solved compile error


Addressed Jason's comments for KIP-19


Diffs
-

  clients/src/main/java/org/apache/kafka/clients/ClientRequest.java 
dc8f0f115bcda893c95d17c0a57be8d14518d034 
  clients/src/main/java/org/apache/kafka/clients/CommonClientConfigs.java 
2c421f42ed3fc5d61cf9c87a7eaa7bb23e26f63b 
  clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
15d00d4e484bb5d51a9ae6857ed6e024a2cc1820 
  clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
7ab2503794ff3aab39df881bd9fbae6547827d3b 
  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
0e51d7bd461d253f4396a5b6ca7cd391658807fa 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
70377ae2fa46deb381139d28590ce6d4115e1adc 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
923ff999d1b04718ddd9a9132668446525bf62f3 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java
 9517d9d0cd480d5ba1d12f1fde7963e60528d2f8 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
03b8dd23df63a8d8a117f02eabcce4a2d48c44f7 
  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
aa264202f2724907924985a5ecbe74afc4c6c04b 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
 4cb1e50d6c4ed55241aeaef1d3af09def5274103 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 a152bd7697dca55609a9ec4cfe0a82c10595fbc3 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
 06182db1c3a5da85648199b4c0c98b80ea7c6c0c 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
0baf16e55046a2f49f6431e01d52c323c95eddf0 
  clients/src/main/java/org/apache/kafka/common/network/Selector.java 
aaf60c98c2c0f4513a8d65ee0db67953a529d598 
  clients/src/test/java/org/apache/kafka/clients/MockClient.java 
d9c97e966c0e2fb605b67285f4275abb89f8813e 
  clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java 
43238ceaad0322e39802b615bb805b895336a009 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/BufferPoolTest.java
 2c693824fa53db1e38766b8c66a0ef42ef9d0f3a 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/RecordAccumulatorTest.java
 5b2e4ffaeab7127648db608c179703b27b577414 
  
clients/src/test/java/org/apache/kafka/clients/producer/internals/SenderTest.java
 8b1805d3d2bcb9fe2bacb37d870c3236aa9532c4 

Diff: https://reviews.apache.org/r/36858/diff/


Testing
---


Thanks,

Mayuresh Gharat



[jira] [Updated] (KAFKA-2384) Enable renaming commit message title in kafka-merge-pr.py

2015-07-29 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2384:
---
Issue Type: Improvement  (was: Bug)

 Enable renaming commit message title in kafka-merge-pr.py
 -

 Key: KAFKA-2384
 URL: https://issues.apache.org/jira/browse/KAFKA-2384
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
Assignee: Ismael Juma
 Fix For: 0.8.3


 It would be more convenient to allow setting the commit message in the merging
 script; right now the script takes the PR title as is, and contributors
 have to change it according to the submission-review guidelines before
 doing the merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1375) Formatting for "Running a task on a particular version of Scala" paragraph in README.md is broken

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646907#comment-14646907
 ] 

ASF GitHub Bot commented on KAFKA-1375:
---

Github user sslavic closed the pull request at:

https://github.com/apache/kafka/pull/24


 Formatting for "Running a task on a particular version of Scala" paragraph in
 README.md is broken
 -

 Key: KAFKA-1375
 URL: https://issues.apache.org/jira/browse/KAFKA-1375
 Project: Kafka
  Issue Type: Bug
  Components: website
Affects Versions: 0.8.1
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Trivial
  Labels: documentation
 Fix For: 0.8.2.0

 Attachments: 
 0001-KAFKA-1375-Fixed-formatting-of-instructions-for-usin.patch


 See commit which broke formatting at 
 https://github.com/apache/kafka/commit/879e3e770ebc49f916137e8416df74373fa26a74#diff-04c6e90faac2675aa89e2176d2eec7d8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-28 - Add a transform client for data processing

2015-07-29 Thread Jay Kreps
Also, the most important part of any prototype, we should have a name for
this producing-consumer-thingamajiggy:

Various ideas:
- Kafka Streams
- KStream
- Kafka Streaming
- The Processor API
- Metamorphosis
- Transformer API
- Verwandlung

For my part I think what people are trying to do is stream processing with
Kafka so I think something that evokes Kafka and stream processing is
preferable. I like Kafka Streams or Kafka Streaming followed by KStream.

Transformer kind of makes me think of the shape-shifting cars.

Metamorphosis is cool and hilarious but since we are kind of envisioning
this as more limited scope thing rather than a massive framework in its own
right I actually think it should have a descriptive name rather than a
personality of its own.

Anyhow let the bikeshedding commence.

-Jay


On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang wangg...@gmail.com wrote:

 Hi all,

 I just posted KIP-28: Add a transform client for data processing
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing
 
 .

 The wiki page does not yet have the full design / implementation details,
 and this email is to kick-off the conversation on whether we should add
 this new client with the described motivations, and if yes what features /
 functionalities should be included.

 Looking forward to your feedback!

 -- Guozhang



Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
Thanks for the comments Jason and Jay.

Jason, I had the same concern about the producer's callback before, but
it seems to be fine from some callbacks I wrote - users can always pass in
an object via the constructor if necessary for synchronization.

Jay, I agree that the current API might be fine for people who want to
wrap it up. But I thought the new consumer was supposed to be a combination
of the old high- and low-level consumers, which means it should be usable
as is, just like the producer. If KafkaConsumer is designed to be wrapped
up for use, then the question becomes whether Kafka will provide a decent
wrapper or not. Neha mentioned that KIP-28 will address the users who only
care about data. Would that be the wrapper provided by Kafka? I am not sure
that is sufficient, though, because the processor is highly abstracted
and might only meet the static data stream requirement as I listed in the
grid. For users who need something from the other cells of the grid, are we
going to have another wrapper? Or are we expecting all users to write their
own wrapper for KafkaConsumer? Some other comments are inline.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 29, 2015 at 3:16 PM, Jay Kreps j...@confluent.io wrote:

 Some comments on the proposal:

 I think we are conflating a number of things that should probably be
 addressed individually because they are unrelated. My past experience is
 that this always makes progress hard. The more we can pick apart these
 items the better:

1. threading model
2. blocking vs non-blocking semantics
3. missing apis
4. missing javadoc and other api surprises
5. Throwing exceptions.

 The missing APIs are getting added independently. Some, like your proposed
 offsetByTime, were things we agreed to hold off on for the first release
 and do when we'd thought it through. If there are uses for it now we can
 accelerate. I think each of these is really independent, we know there are
 things that need to be added but lumping them all into one discussion will
 be confusing.

 WRT throwing exceptions the policy is to throw exceptions that are
 unrecoverable and handle and log other exceptions that are transient. That
 policy makes sense if you go through the thought exercise of what will the
 user do if I throw this exception to them if they have no other rational
 response but to retry (and if failing to anticipate and retry with that
 exception will kill their program) . You can argue whether the topic not
 existing is transient or not, unfortunately the way we did auto-creation
 makes it transient if you are in auto create mode and non-transient
 otherwise (ick!). In any case this is an orthogonal discussion to
 everything else. I think the policy is right and if we don't conform to it
 in some way that is really an independent bug/discussion.

Agreed we can discuss about them separately.


 I suggest we focus on threading and the current event-loop style of api
 design since I think that is really the crux.

 The analogy between the producer threading model and the consumer model
 actually doesn't work for me. The goal of the producer is actually to take
 requests from many many user threads and shove them into a single buffer
 for batching. So the threading model isn't the 1:1 threads you describe it
 is N:1.The goal of the consumer is to support single-threaded processing.
 This is what drives the difference. Saying that the producer has N:1
 threads and therefore the consumer should have 1:1 threads instead of just
 1 thread doesn't make sense any more than an analogy to the broker's
 threading model would--the problem we're solving is totally different.

I think the ultimate goal for both producer and consumer is still to allow
users to send/receive data in parallel. In the producer we picked the
solution of one-producer-serving-multiple-threads, and in the consumer we
picked multiple-single-threaded-consumers instead of
single-consumer-serving-multiple-threads. And we believe people can always
implement the latter with the former. I think this is a reasonable
decision. However, there are also reasonable concerns over the
multiple-single-threaded-consumers solution, namely that the single thread
might have to be a dedicated polling thread in many cases, which pushes
users towards the other solution - i.e. implementing a
single-thread-consumer-serving-multiple-threads wrapper. From what we hear,
this seems to be a quite common concern for most of the users we talked to.
Plus, the adoption bar of the consumer will be much higher because users
will have to understand details of things they don't care about, as listed
in the grid.
The analogy between producer/consumer is intended to show that a separate
polling thread will solve the concerns we have.
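
To make the wrapper idea concrete, here is a minimal sketch of what we keep
seeing users build (not a proposal for the consumer itself; the API usage
follows trunk at the time, and the queue size, record types and shutdown
handling are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollingWrapper {
    private final KafkaConsumer<byte[], byte[]> consumer;
    private final BlockingQueue<ConsumerRecord<byte[], byte[]>> queue =
            new ArrayBlockingQueue<ConsumerRecord<byte[], byte[]>>(10000);
    private volatile boolean running = true;

    public PollingWrapper(KafkaConsumer<byte[], byte[]> consumer) {
        this.consumer = consumer;
    }

    public void start(int numWorkers) {
        // The dedicated polling thread is the only thread that ever touches
        // the consumer, so the single-threaded contract still holds.
        new Thread(new Runnable() {
            public void run() {
                while (running) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        try {
                            // Blocks when workers fall behind; note this also
                            // delays the next poll(), a trade-off of the design.
                            queue.put(record);
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }).start();
        for (int i = 0; i < numWorkers; i++) {
            new Thread(new Runnable() {
                public void run() {
                    while (running) {
                        try {
                            ConsumerRecord<byte[], byte[]> record = queue.take();
                            // application processing of 'record' goes here
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }).start();
        }
    }

    public void stop() {
        running = false;
    }
}

This is exactly the boilerplate many users end up writing themselves, which
is why a blessed wrapper keeps coming up in these conversations.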



I think ultimately though what you need to think about is, does an event
 loop style of API make sense? That is the source of all the issues you
 describe. This style of API is incredibly prevalent from unix select to
 GUIs to 

[GitHub] kafka pull request: MINOR - Fix typo in ReplicaVerificationTool ou...

2015-07-29 Thread ottomata
GitHub user ottomata opened a pull request:

https://github.com/apache/kafka/pull/101

MINOR - Fix typo in ReplicaVerificationTool output



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ottomata/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #101


commit 10b76f370e532a865c64182eab00ad1429990ef6
Author: Andrew Otto aco...@gmail.com
Date:   2015-07-29T15:03:45Z

MINOR - Fix typo in ReplicaVerificationTool output




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2383) kafka-env should be separated between client and server

2015-07-29 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646069#comment-14646069
 ] 

Sriharsha Chintalapani commented on KAFKA-2383:
---

[~AdamWesterman] kafka-env.sh is not merged into apache yet
https://issues.apache.org/jira/browse/KAFKA-1566 . You might want to add your
comment there.

 kafka-env should be separated between client and server
 ---

 Key: KAFKA-2383
 URL: https://issues.apache.org/jira/browse/KAFKA-2383
 Project: Kafka
  Issue Type: Improvement
  Components: config
Affects Versions: 0.8.2.1
 Environment: HDP 2.2.4.2
Reporter: Adam Westerman
Priority: Minor

 The environment variables set in the kafka-env template take effect in both 
 client and server.  One instance in which this causes problems is when 
 enabling JMX by setting JMX_PORT in the kafka-env template.  The broker will 
 start successfully, but then when trying to use a client script such as 
 kafka-topics.sh it attempts to rebind to that port, throwing an exception and 
 preventing the client script from running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)