[jira] [Updated] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-30 Thread Jay Kreps (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-171:


Attachment: KAFKA-171.patch

Okay, this patch completes the conversion to GatheringByteChannel. I recommend 
we take this even though there doesn't seem to be a real perf difference, 
simply because the network profile is better.

> Kafka producer should do a single write to send message sets
> 
>
> Key: KAFKA-171
> URL: https://issues.apache.org/jira/browse/KAFKA-171
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7, 0.8
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8
>
> Attachments: KAFKA-171-draft.patch, KAFKA-171.patch
>
>
> From email thread: 
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e
> > Before sending an actual message, the Kafka producer sends a (control) 
> > message of 4 bytes to the server. The Kafka producer always does this 
> > before sending a message to the server.
> I think this is because in BoundedByteBufferSend.scala we do essentially
>  channel.write(sizeBuffer)
>  channel.write(dataBuffer)
> The correct solution is to use vectored I/O and instead do
>  channel.write(Array(sizeBuffer, dataBuffer))
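For illustration, the same gathering write can be sketched in plain Java NIO (Kafka's BoundedByteBufferSend is Scala; this is a standalone demo with made-up names, not the actual patch):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteDemo {

    /** Writes a 4-byte size header plus the payload in ONE vectored write,
     *  so the OS sees a single writev(2) instead of a tiny 4-byte packet
     *  followed by the data. Returns the total bytes written. */
    static long sendWithOneWrite(FileChannel channel, byte[] payload) throws IOException {
        ByteBuffer sizeBuffer = ByteBuffer.allocate(4).putInt(payload.length);
        sizeBuffer.flip();
        ByteBuffer dataBuffer = ByteBuffer.wrap(payload);
        // GatheringByteChannel.write(ByteBuffer[]) is the vectored form.
        return channel.write(new ByteBuffer[]{sizeBuffer, dataBuffer});
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("gather", ".bin");
        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            long written = sendWithOneWrite(channel, "hello kafka".getBytes(StandardCharsets.UTF_8));
            System.out.println(written); // 4-byte header + 11-byte payload = 15
        }
        Files.delete(tmp);
    }
}
```

On a socket channel the same call coalesces the header and data into one system call, which is what removes the separate 4-byte write from the network profile.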

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (KAFKA-178) Simplify the brokerinfo argument to the producer-perf-test.sh

2011-10-30 Thread Jay Kreps (Created) (JIRA)
Simplify the brokerinfo argument to the producer-perf-test.sh
-

 Key: KAFKA-178
 URL: https://issues.apache.org/jira/browse/KAFKA-178
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8
Reporter: Jay Kreps


Currently:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh 
Missing required argument "[brokerinfo]"
Option  Description
--  ---
...
--brokerinfo   
...

This is kind of confusing and doesn't match the other scripts. I would like to 
change it to 
  --zookeeper zk_connect_string
  --broker-list id1:host1:port1,id2:host2:port2,...
and require that one of these be specified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (KAFKA-48) Implement optional "long poll" support in fetch request

2011-10-30 Thread Jay Kreps (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-48:
---

Attachment: KAFKA-48-socket-server-refactor-draft.patch

This is a draft patch that refactors the socket server to make requests and 
responses asynchronous. No need for a detailed review, it still needs a lot of 
cleanup, but I wanted to show people the idea in more detail.

> Implement optional "long poll" support in fetch request
> ---
>
> Key: KAFKA-48
> URL: https://issues.apache.org/jira/browse/KAFKA-48
> Project: Kafka
>  Issue Type: Bug
>Reporter: Alan Cabrera
> Attachments: KAFKA-48-socket-server-refactor-draft.patch
>
>
> Currently, the fetch request is non-blocking. If there is nothing on the 
> broker for the consumer to retrieve, the broker simply returns an empty set 
> to the consumer. This can be inefficient if you want to ensure low latency, 
> because you keep polling over and over. We should make a blocking version of 
> the fetch request so that the fetch request is not returned until the broker 
> has at least one message for the fetcher or some timeout passes.
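The blocking fetch semantics described above can be modeled with a plain BlockingQueue (a hypothetical standalone sketch, not the broker's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model of "long poll" fetch semantics: block until at least one
 *  message is available or the timeout expires, then return what we have. */
public class LongPollSketch {
    private final BlockingQueue<String> log = new LinkedBlockingQueue<>();

    void append(String message) { log.add(message); }

    /** Returns available messages, waiting up to maxWaitMs if the log is empty.
     *  An empty list means the timeout passed with nothing to fetch. */
    List<String> fetch(long maxWaitMs) throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = log.poll(maxWaitMs, TimeUnit.MILLISECONDS); // blocks here
        if (first == null) return batch;           // timed out: empty set
        batch.add(first);
        log.drainTo(batch); // grab anything else already queued, non-blocking
        return batch;
    }
}
```

The key property is that an idle consumer parks on the broker side instead of re-issuing empty fetches in a tight loop.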

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (KAFKA-48) Implement optional "long poll" support in fetch request

2011-10-30 Thread Jay Kreps (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps reassigned KAFKA-48:
--

Assignee: Jay Kreps

> Implement optional "long poll" support in fetch request
> ---
>
> Key: KAFKA-48
> URL: https://issues.apache.org/jira/browse/KAFKA-48
> Project: Kafka
>  Issue Type: Bug
>Reporter: Alan Cabrera
>Assignee: Jay Kreps
> Attachments: KAFKA-48-socket-server-refactor-draft.patch
>
>
> Currently, the fetch request is non-blocking. If there is nothing on the 
> broker for the consumer to retrieve, the broker simply returns an empty set 
> to the consumer. This can be inefficient if you want to ensure low latency, 
> because you keep polling over and over. We should make a blocking version of 
> the fetch request so that the fetch request is not returned until the broker 
> has at least one message for the fetcher or some timeout passes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (KAFKA-176) Fix existing perf tools

2011-10-30 Thread Neha Narkhede (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139845#comment-13139845
 ] 

Neha Narkhede commented on KAFKA-176:
-

1. I think we can add a command line option --header that will control the 
display of the header

2. Good suggestion. I think what you are saying is that aggregate stats should 
be the default, and instead of having --aggregate option, we should have 
--showDetailedStats option. I think that is a good idea. 

3. OK. I'll take a stab at that, and upload a new patch.

4. KAFKA-175 will need to be updated as well. It would be great if people can 
look at that too.


> Fix existing perf tools
> ---
>
> Key: KAFKA-176
> URL: https://issues.apache.org/jira/browse/KAFKA-176
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Fix For: 0.8
>
> Attachments: kafka-176.patch
>
>
> The existing perf tools - ProducerPerformance.scala, 
> ConsumerPerformance.scala and SimpleConsumerPerformance.scala are slightly 
> buggy. It will be good to -
> 1. move them to a perf directory from the existing kafka/tools location
> 2. fix the bugs, so that they measure throughput correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (KAFKA-176) Fix existing perf tools

2011-10-30 Thread Jun Rao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139844#comment-13139844
 ] 

Jun Rao commented on KAFKA-176:
---

1. It can be in the code. Maybe we just need a command line option to control 
whether to display header or not?

2. I am suggesting that we always show aggregated output. So there won't be an 
--aggregate option. Instead, have a --showDetails option to enable/disable 
detailed stats in each thread, probably with the default set to false.

3. Some system tests can use a producer that generates variable-sized random 
messages. Instead of having another producer tool, it would be good if we can 
just allow such option in this tool.


> Fix existing perf tools
> ---
>
> Key: KAFKA-176
> URL: https://issues.apache.org/jira/browse/KAFKA-176
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Fix For: 0.8
>
> Attachments: kafka-176.patch
>
>
> The existing perf tools - ProducerPerformance.scala, 
> ConsumerPerformance.scala and SimpleConsumerPerformance.scala are slightly 
> buggy. It will be good to -
> 1. move them to a perf directory from the existing kafka/tools location
> 2. fix the bugs, so that they measure throughput correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (KAFKA-176) Fix existing perf tools

2011-10-30 Thread Neha Narkhede (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139814#comment-13139814
 ] 

Neha Narkhede commented on KAFKA-176:
-

This is a draft patch, so things that I wasn't sure of are left incomplete.

1. The header being in debug logging is clearly not ideal. There are 2 ways of 
exposing the header - 
a. In the code
b. In the scripts

Since some graphing tools might need the headers in a certain format, I 
initially thought it's better left in the scripts (see KAFKA-175). However, 
since the code controls the display of the data, it might as well be left there.

2. The --avg option already controls that. We could probably rename that to 
--aggregate instead?

3.1 & 3.2 The reason I haven't touched the variable-size message path is 
because I don't think it is well implemented. When selecting that option, the 
throughput actually drops due to the way it is implemented. I'm not sure a 
variable message size option actually helps conclude anything in this perf 
test.

4. Good point. 

I will update the patch, once we have a better idea about the questions above.



> Fix existing perf tools
> ---
>
> Key: KAFKA-176
> URL: https://issues.apache.org/jira/browse/KAFKA-176
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Fix For: 0.8
>
> Attachments: kafka-176.patch
>
>
> The existing perf tools - ProducerPerformance.scala, 
> ConsumerPerformance.scala and SimpleConsumerPerformance.scala are slightly 
> buggy. It will be good to -
> 1. move them to a perf directory from the existing kafka/tools location
> 2. fix the bugs, so that they measure throughput correctly

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (KAFKA-91) zkclient does not show up in pom

2011-10-30 Thread Neha Narkhede (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-91:
---

   Resolution: Fixed
Fix Version/s: (was: 0.8)
   0.7
   Status: Resolved  (was: Patch Available)

> zkclient does not show up in pom
> 
>
> Key: KAFKA-91
> URL: https://issues.apache.org/jira/browse/KAFKA-91
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Chris Burroughs
>Assignee: Chris Burroughs
>Priority: Minor
> Fix For: 0.7
>
> Attachments: k91-v1.txt, k91-v2.txt
>
>
> The pom created by `make-pom` does not include zkclient, which is of course 
> a key dependency. Not sure yet how to pull in zkclient while excluding sbt 
> itself.
> $ cat core/target/scala_2.8.0/kafka-0.7.pom  | grep -i zkclient | wc -l
> 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (KAFKA-91) zkclient does not show up in pom

2011-10-30 Thread Neha Narkhede (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139786#comment-13139786
 ] 

Neha Narkhede commented on KAFKA-91:


+1. Just committed this, but realized you could've done it yourself.

> zkclient does not show up in pom
> 
>
> Key: KAFKA-91
> URL: https://issues.apache.org/jira/browse/KAFKA-91
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Chris Burroughs
>Assignee: Chris Burroughs
>Priority: Minor
> Fix For: 0.8
>
> Attachments: k91-v1.txt, k91-v2.txt
>
>
> The pom created by `make-pom` does not include zkclient, which is of course 
> a key dependency. Not sure yet how to pull in zkclient while excluding sbt 
> itself.
> $ cat core/target/scala_2.8.0/kafka-0.7.pom  | grep -i zkclient | wc -l
> 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (KAFKA-177) Remove the clojure client until it is correctly implemented and refactored

2011-10-30 Thread Neha Narkhede (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-177:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Just committed this.

> Remove the clojure client until it is correctly implemented and refactored
> --
>
> Key: KAFKA-177
> URL: https://issues.apache.org/jira/browse/KAFKA-177
> Project: Kafka
>  Issue Type: Task
>  Components: clients
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
> Fix For: 0.7
>
> Attachments: kafka-177.patch
>
>
> The related conversation on the mailing list is here - 
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/ajax/%3CCAFbh0Q0xMBWXK0gnKZyKkwwUBjoyxJ=qcr1hu9v55qh7de8...@mail.gmail.com%3E
> The clojure client can be brought back when it is in better shape and 
> ownership.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: KAFKA-50 replication support and the Disruptor

2011-10-30 Thread Jay Kreps
This is interesting. But wouldn't the cost of the I/O here (writing to log,
requests to slave nodes) completely dominate the cost of locks?

-Jay

On Sun, Oct 30, 2011 at 1:01 PM, Erik van Oosten wrote:

> Hello,
>
> The upcoming replication support (which we eagerly anticipate at my work)
> is a feature for which LMAX' disruptor is an ideal solution (
> http://code.google.com/p/disruptor/,
> Apache licensed). A colleague has in fact just started on a new replicating
> message broker based on it 
> (https://github.com/cdegroot/underground
> ).
>
> The disruptor itself is a super-performing in-jvm consumer/producer
> system. A consumer normally works in its own thread. The disruptor gets
> most of its speed because it is designed such that each consumer can
> continue working without releasing the CPU to the OS or other threads. In
> addition it is optimized for modern CPU architectures, for example by
> respecting the way the CPU cache works, and by avoiding all locking, CAS
> operations and even by keeping volatile read/writes to a minimum.
> Consumers may depend on work of other consumers. The disruptor will only
> offer new messages (in bulk if possible) when they were processed by
> preceding consumers.
>
> For Kafka-50 we can (for example) think of the following tasks:
> -a- get incoming new messages (the producer)
> -b- pre-processor (calculate checksum and offset)
> -c- write to journal
> -d- write to replica broker, wait for confirmation
> -e- notify consumers (no changes here)
>
> With the disruptor, the main flow would be coded as:
>
> disruptor
> .handleEventsWith(preprocessor)
> .then(journaller, replicator)
> .then(notifier)
>
> Journaling and replicating of a message is thus executed in parallel.
>
> If you consider this approach, feel free to ask me about the disruptor.
> Hopefully I will also find some time to write some code as well.
>
> Kind regards,
>Erik.
>
>
> --
> Erik van Oosten
> http://www.day-to-day-stuff.blogspot.com/
>
>
>


KAFKA-50 replication support and the Disruptor

2011-10-30 Thread Erik van Oosten

Hello,

The upcoming replication support (which we eagerly anticipate at my 
work) is a feature for which LMAX' disruptor is an ideal solution 
(http://code.google.com/p/disruptor/, Apache licensed). A colleague has 
in fact just started on a new replicating message broker based on it 
(https://github.com/cdegroot/underground).


The disruptor itself is a super-performing in-jvm consumer/producer 
system. A consumer normally works in its own thread. The disruptor gets 
most of its speed because it is designed such that each consumer can 
continue working without releasing the CPU to the OS or other threads. 
In addition it is optimized for modern CPU architectures, for example by 
respecting the way the CPU cache works, and by avoiding all locking, CAS 
operations and even by keeping volatile read/writes to a minimum.
Consumers may depend on work of other consumers. The disruptor will only 
offer new messages (in bulk if possible) when they were processed by 
preceding consumers.


For Kafka-50 we can (for example) think of the following tasks:
-a- get incoming new messages (the producer)
-b- pre-processor (calculate checksum and offset)
-c- write to journal
-d- write to replica broker, wait for confirmation
-e- notify consumers (no changes here)

With the disruptor, the main flow would be coded as:

disruptor
.handleEventsWith(preprocessor)
.then(journaller, replicator)
.then(notifier)

Journaling and replicating of a message is thus executed in parallel.
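As a rough illustration of that stage graph (this uses ordinary CompletableFutures, not the Disruptor's ring buffer, and the stage names are made up):

```java
import java.util.concurrent.CompletableFuture;

/** Sketch of the pipeline above using plain futures: preprocess first,
 *  then journal and replicate in parallel, then notify when both finish. */
public class PipelineSketch {

    static String process(String message) throws Exception {
        CompletableFuture<String> preprocessed =
                CompletableFuture.supplyAsync(() -> message + "|checksummed");
        // Both stages depend only on the preprocessor, so they run in parallel.
        CompletableFuture<String> journalled =
                preprocessed.thenApplyAsync(m -> m + "|journalled");
        CompletableFuture<String> replicated =
                preprocessed.thenApplyAsync(m -> m + "|replicated");
        // The notifier runs only after BOTH parallel stages complete.
        return journalled.thenCombine(replicated,
                (j, r) -> "notified after: " + j + " & " + r).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process("msg"));
    }
}
```

The Disruptor achieves the same dependency ordering with sequence barriers over a shared ring buffer rather than per-message future objects, which is where its performance advantage comes from.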

If you consider this approach, feel free to ask me about the 
disruptor. Hopefully I will also find some time to write some code as well.


Kind regards,
Erik.


--
Erik van Oosten
http://www.day-to-day-stuff.blogspot.com/