[jira] [Created] (KAFKA-184) Log retention size and file size should be a long

2011-11-01 Thread Joel Koshy (Created) (JIRA)
Log retention size and file size should be a long
-------------------------------------------------

 Key: KAFKA-184
 URL: https://issues.apache.org/jira/browse/KAFKA-184
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Priority: Minor
 Fix For: 0.8


Realized this in a local setup: the log.retention.size config option should be 
a long, or we're limited to 2GB. Also, the name could be improved to 
log.retention.size.bytes or .mbytes as appropriate. The same comments apply to 
log.file.size. If we rename the configs, it would be better to resolve 
KAFKA-181 first.
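A quick illustration of the 2GB ceiling (a minimal sketch, not Kafka code; the class and method names are made up):

```java
// Sketch: why an int-typed log.retention.size caps out near 2GB.
public class RetentionSize {
    static final long FOUR_GB = 4L * 1024 * 1024 * 1024;

    // A size only fits in an int if it is at most Integer.MAX_VALUE
    // (2^31 - 1, just under 2GB); anything larger needs a long.
    static boolean fitsInInt(long bytes) {
        return bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(2L * 1024 * 1024 * 1024 - 1)); // just under 2GB: fits
        System.out.println(fitsInInt(FOUR_GB));                     // 4GB: does not fit
    }
}
```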


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (KAFKA-183) Expose offset vector to the consumer

2011-11-01 Thread Jay Kreps (Created) (JIRA)
Expose offset vector to the consumer


 Key: KAFKA-183
 URL: https://issues.apache.org/jira/browse/KAFKA-183
 Project: Kafka
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps


We should enable consumers to save their position themselves. This would be 
useful for consumers that need to store consumed data, since they could store 
the data and the position together. This gives a poor man's "transactionality": 
any data loss on the consumer also rewinds the position to the previous value, 
so the data and the position are always in sync.

Two ways to do this:
1. Add an OffsetStorage interface and have the zk storage implement this. The 
user can override this by providing an OffsetStorage implementation of their 
own to change how values are stored.
2. Make commit() return the position offset vector and add a 
setPosition(List) method to initialize the position.

Let's figure out any potential problems with this, and work out the best 
approach.
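As a hedged sketch of option 1 (interface and method names here are invented for illustration, not an actual Kafka API), the ZK store would be one implementation of something like:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical OffsetStorage interface the zk storage could implement; users
// could plug in their own implementation to change where offsets live.
interface OffsetStorage {
    void commit(String group, int partition, long offset);
    Long fetch(String group, int partition); // null if no offset stored yet
}

// Trivial in-memory implementation, purely for illustration.
class InMemoryOffsetStorage implements OffsetStorage {
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    public void commit(String group, int partition, long offset) {
        offsets.put(group + "/" + partition, offset);
    }

    public Long fetch(String group, int partition) {
        return offsets.get(group + "/" + partition);
    }
}
```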





[jira] [Created] (KAFKA-182) Set a TCP connection timeout for the SimpleConsumer

2011-11-01 Thread Jay Kreps (Created) (JIRA)
Set a TCP connection timeout for the SimpleConsumer
---------------------------------------------------

 Key: KAFKA-182
 URL: https://issues.apache.org/jira/browse/KAFKA-182
 Project: Kafka
  Issue Type: Bug
Reporter: Jay Kreps


Currently we use SocketChannel.open, which I *think* can block for a long time. 
We should make the timeout configurable, and we may have to create the socket 
in a different way to enable this.
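One way to bound the connect, sketched under the assumption that we keep blocking I/O (this is an illustration, not the committed fix): open the channel unconnected, then connect through the underlying socket, which accepts a timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

// Sketch: SocketChannel.open(addr) connects with no time bound, but opening
// the channel first and connecting via its socket lets us pass a timeout.
public class TimedConnect {
    public static SocketChannel open(String host, int port, int timeoutMs) throws IOException {
        SocketChannel channel = SocketChannel.open(); // open, but not yet connected
        channel.socket().connect(new InetSocketAddress(host, port), timeoutMs);
        return channel;
    }
}
```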





[jira] [Commented] (KAFKA-181) Log errors for unrecognized config options

2011-11-01 Thread Jay Kreps (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141517#comment-13141517
 ] 

Jay Kreps commented on KAFKA-181:
-

Yes, please yes.

I recommend we create a Config object that wraps java.util.Properties. It 
should include all the random Utils helpers we have for parsing ints and such. 
Whenever get() is called for a property, we should record that property in a 
set. We can then add a method that intersects the requested properties with the 
provided properties to find the unused ones.

This Config can be used in KafkaConfig and the other configs.

As a side note, there are many places where we need to let the user provide 
plugins that implement an interface. Examples are the EventHandler and 
Serializer interfaces in the producer, and you could imagine us making other 
things, such as offset storage, pluggable. One requirement to make this work is 
that it must be possible for the user to set properties for their plugin. 
For example, to create an AvroSerializer you need to be able to pass in a 
schema.registry.url parameter, which needs to get passed through unmolested to 
the AvroSerializerImpl. To enable this, config objects like KafkaConfig 
that parse out their options should retain the original Config instance. The 
general contract for plugins should be that they must provide a constructor 
that takes a Config, so that these configs can be passed through.
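A hypothetical sketch of the Config wrapper and the plugin contract described above (class, method, and property names are illustrative, not proposed code):

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Sketch: record every key that is read, so keys the user supplied but nobody
// ever asked for (likely typos) can be reported.
class Config {
    private final Properties props;
    private final Set<String> requested = new HashSet<>();

    Config(Properties props) { this.props = props; }

    String getString(String key, String dflt) {
        requested.add(key);
        return props.getProperty(key, dflt);
    }

    int getInt(String key, int dflt) {
        requested.add(key);
        String v = props.getProperty(key);
        return v == null ? dflt : Integer.parseInt(v);
    }

    // Provided keys minus requested keys = unused (suspicious) keys.
    Set<String> unusedKeys() {
        Set<String> unused = new HashSet<>(props.stringPropertyNames());
        unused.removeAll(requested);
        return unused;
    }
}

// Plugin contract: a one-arg constructor taking Config, so settings such as
// schema.registry.url reach the plugin without the core config knowing them.
class AvroSerializerSketch {
    final String registryUrl;
    AvroSerializerSketch(Config config) {
        this.registryUrl = config.getString("schema.registry.url", null);
    }
}
```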

> Log errors for unrecognized config options
> --
>
> Key: KAFKA-181
> URL: https://issues.apache.org/jira/browse/KAFKA-181
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Joel Koshy
> Fix For: 0.8
>
>
> Currently, unrecognized config options are silently ignored. Notably, if a 
> config has a typo or if a deprecated config is used, then there is no warning 
> issued and defaults are assumed. One can argue that the broker or a consumer 
> or a producer with an unrecognized config option should not even be allowed 
> to start up, especially if defaults are silently assumed, but it would be 
> good to at least log an error.





[jira] [Created] (KAFKA-181) Log errors for unrecognized config options

2011-11-01 Thread Joel Koshy (Created) (JIRA)
Log errors for unrecognized config options
--

 Key: KAFKA-181
 URL: https://issues.apache.org/jira/browse/KAFKA-181
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Joel Koshy
 Fix For: 0.8


Currently, unrecognized config options are silently ignored. Notably, if a 
config has a typo or if a deprecated config is used, then there is no warning 
issued and defaults are assumed. One can argue that the broker or a consumer or 
a producer with an unrecognized config option should not even be allowed to 
start up, especially if defaults are silently assumed, but it would be good to 
at least log an error.






Re: KAFKA-50 replication support and the Disruptor

2011-11-01 Thread Erik van Oosten
There are several wait strategies. You will want to use a spin lock in 
production environments, where you should have enough CPU cores anyway. 
Remember, the 'real' work runs in another, always-running thread that 
also uses a spin lock to wait for more work.
In a dev environment, or on hosts that need to do lots of other things, you 
definitely need another wait strategy.
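The trade-off between the two strategies can be sketched in plain Java (an illustration of the concept, not Disruptor code): a busy-spin wait burns a whole core for minimal latency, while a blocking wait parks the thread and frees the CPU.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the two wait strategies discussed above.
public class WaitStrategies {
    // Spin: poll a flag in a tight loop; lowest latency, one core pinned.
    static void spinUntil(AtomicBoolean ready) {
        while (!ready.get()) {
            Thread.onSpinWait(); // CPU hint; the core is still consumed
        }
    }

    // Block: park the thread until signalled; higher latency, CPU freed.
    static void blockUntil(CountDownLatch latch) throws InterruptedException {
        latch.await();
    }
}
```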


Erik.


On 31-10-11 21:38, Chris Burroughs wrote:

> On 10/31/2011 04:23 AM, Erik van Oosten wrote:
>
>> That is not the point (mostly). While you're waiting for a lock, you can't
>> issue another IO request. Avoiding locking is worthwhile even if CPU is the
>> bottleneck. The advantage is that you'll get lower latency and, also
>> important, less jitter.
>
> /begin{Tangent}
>
> Doesn't the Disruptor use a spin lock though? I would expect that to
> not play nice if sharing a core with CPU-bound threads doing 'real' work.


--
Erik van Oosten
http://www.day-to-day-stuff.blogspot.com/



[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-11-01 Thread Neha Narkhede (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141374#comment-13141374
 ] 

Neha Narkhede commented on KAFKA-171:
-

You can check it into trunk; 0.7 is going off on its own branch.

> Kafka producer should do a single write to send message sets
> 
>
> Key: KAFKA-171
> URL: https://issues.apache.org/jira/browse/KAFKA-171
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7, 0.8
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8
>
> Attachments: KAFKA-171-draft.patch, KAFKA-171-v2.patch, 
> KAFKA-171.patch
>
>
> From email thread: 
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e
> > Before sending an actual message, kafka producer do send a (control) 
> > message of 4 bytes to the server. Kafka producer always does this action 
> > before send some message to the server.
> I think this is because in BoundedByteBufferSend.scala we do essentially
>  channel.write(sizeBuffer)
>  channel.write(dataBuffer)
> The correct solution is to use vector I/O and instead do
>  channel.write(Array(sizeBuffer, dataBuffer))
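The two-write versus one-write difference can be sketched with Java NIO's gathering write (an illustration of the technique, not the actual patch):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

// Sketch of the proposed vectored write: the 4-byte size header and the
// payload reach the channel in one write call instead of two.
public class GatheringSend {
    public static long send(GatheringByteChannel channel, ByteBuffer data) throws IOException {
        ByteBuffer size = ByteBuffer.allocate(4);
        size.putInt(data.remaining());
        size.flip();
        // Single gathering write, i.e. channel.write(Array(sizeBuffer, dataBuffer))
        return channel.write(new ByteBuffer[] { size, data });
    }
}
```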





[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-11-01 Thread Jay Kreps (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141295#comment-13141295
 ] 

Jay Kreps commented on KAFKA-171:
-

Cool, will clean up imports before checking in. I am going to hold off on this 
until after 0.7 goes out.






[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-11-01 Thread Jun Rao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141272#comment-13141272
 ] 

Jun Rao commented on KAFKA-171:
---

MessageSet has a couple of unused imports. Other than that, the patch looks 
good. 






[jira] [Commented] (KAFKA-180) Clean up shell scripts

2011-11-01 Thread Jun Rao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141257#comment-13141257
 ] 

Jun Rao commented on KAFKA-180:
---

SimpleConsumerShell is still useful for debugging purposes. I'd like to keep 
the code. The script can go.

> Clean up shell scripts
> --
>
> Key: KAFKA-180
> URL: https://issues.apache.org/jira/browse/KAFKA-180
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
>
> Currently it is a bit of a mess:
> jkreps-mn:kafka-git jkreps$ ls bin
> kafka-console-consumer-log4j.properties
> kafka-console-consumer.sh
> kafka-console-producer.sh
> kafka-consumer-perf-test.sh
> kafka-consumer-shell.sh
> kafka-producer-perf-test.sh
> kafka-producer-shell.sh
> kafka-replay-log-producer.sh
> kafka-run-class.sh
> kafka-server-start.sh
> kafka-server-stop.sh
> kafka-simple-consumer-perf-test.sh
> kafka-simple-consumer-shell.sh
> run-rat.sh
> zookeeper-server-start.sh
> zookeeper-server-stop.sh
> zookeeper-shell.sh
> I think all the *-shell.sh scripts and all the *-simple-perf-test.sh scripts 
> should die. If anyone has a use for these test classes we can keep them 
> around and use them via kafka-run-class.sh, but they are clearly not made for 
> normal people to use. The *-shell.sh scripts are obsolete now that we have 
> the *-console-*.sh scripts, since these do everything the old scripts did and 
> more. I recommend we also delete the code for these.
> I would like to change each tool so that it prints a usage line explaining 
> what it does when run without arguments. Currently I actually had to go read 
> the code to figure out what some of these do.
> I would like to clean up places where the arguments are non-standard. 
> Argument names should be the same across all the tools.
> I would also like to rename kafka-replay-log-producer.sh to 
> kafka-copy-topic.sh. I think this tool should also accept two zookeeper urls, 
> the url of the input cluster and the url of the output cluster, so this tool 
> can be used to copy between clusters. I think we can have a --zookeeper, an 
> --input-zookeeper, and an --output-zookeeper, where --zookeeper is equivalent 
> to setting both the input and the output zookeeper. I am also confused about 
> why the options for this tool list --brokerinfo, which can be either a zk url 
> or a broker list, AND also --zookeeper, which must be a zk url.
> Any objections to all this? Any other gripes people have while I am in there?
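The print-usage-when-run-without-arguments convention could look like this minimal sketch (the tool name and flags shown are illustrative, not an agreed interface):

```java
// Sketch: every command-line tool prints a usage line and exits non-zero when
// invoked with no arguments, instead of silently doing something surprising.
public class UsageCheck {
    static String usage() {
        return "USAGE: kafka-copy-topic.sh --input-zookeeper <url> "
             + "--output-zookeeper <url> --topic <name>";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println(usage());
            System.exit(1);
        }
    }
}
```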
