Re: java api code and javadoc

2014-10-16 Thread Ewen Cheslack-Postava
KafkaStream and MessageAndOffset are Scala classes, so you'll find them under the scaladocs. The ConsumerConnector interface should show up in the javadocs with good documentation coverage. Some classes like MessageAndOffset are so simple (just compositions of other data) that they aren't going to

Re: Too much log for kafka.common.KafkaException

2014-10-18 Thread Ewen Cheslack-Postava
This looks very similar to the error and stacktrace I see when reproducing https://issues.apache.org/jira/browse/KAFKA-1196 -- that's an overflow where the data returned in a FetchResponse exceeds 2GB. (It triggers the error you're seeing because FetchResponse's size overflows to become negative,

Re: Does adding ConsumerTimeoutException make the code more robust?

2014-12-01 Thread Ewen Cheslack-Postava
No, hasNext will return immediately if data is available. The consumer timeout is only helpful if your application can't safely block on the iterator indefinitely. -Ewen On Sat, Nov 29, 2014, at 08:35 PM, Rahul Amaram wrote: Yes, I have configured consumer timeout config. Let me put my query
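
A minimal sketch of what this reply describes, using the old high-level consumer: with consumer.timeout.ms set, hasNext() still returns as soon as a message is available and only throws ConsumerTimeoutException after the stream has been idle for the configured time. The ZooKeeper address, group id, and topic name below are illustrative assumptions.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class TimeoutAwareConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // assumed ZooKeeper address
            props.put("group.id", "timeout-demo");
            props.put("consumer.timeout.ms", "5000");         // throw if idle for 5 seconds

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
            ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

            try {
                // hasNext() returns immediately when data is available; it only blocks
                // (up to consumer.timeout.ms) while the stream is empty.
                while (it.hasNext()) {
                    System.out.println(new String(it.next().message()));
                }
            } catch (ConsumerTimeoutException e) {
                // Only reached because consumer.timeout.ms is set; otherwise hasNext()
                // would block indefinitely waiting for more data.
            } finally {
                connector.shutdown();
            }
        }
    }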

Re: Very slow producer

2014-12-11 Thread Ewen Cheslack-Postava
Did you set producer.type to async when creating your producer? The console producer uses async by default, but the default producer config is sync. -Ewen On Thu, Dec 11, 2014 at 6:08 AM, Huy Le Van huy.le...@insight-centre.org wrote: Hi, I’m writing my own producer to read from text files,
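
A hedged sketch of the setting being discussed, using the old (Scala) producer API this thread refers to; the broker address, topic, and batch size are placeholder assumptions.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092"); // assumed broker address
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");    // default is "sync"; async batches sends
            props.put("batch.num.messages", "200"); // how many messages to batch per send

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            for (int i = 0; i < 1000; i++) {
                producer.send(new KeyedMessage<String, String>("my-topic", "line-" + i));
            }
            producer.close();
        }
    }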

Re: Kafka System test

2015-01-23 Thread Ewen Cheslack-Postava
1. Except for that hostname setting being a list instead of a single host, the changes look reasonable. That is where you want to customize settings for your setup. 2 & 3. Yes, you'll want to update those files as well. The top-level ones provide defaults; the ones in specific test directories

Re: Kafka System test

2015-01-23 Thread Ewen Cheslack-Postava
here. How do I change? thanks AL On Fri, Jan 23, 2015 at 1:22 PM, Ewen Cheslack-Postava e...@confluent.io wrote: 1. Except for that hostname setting being a list instead of a single host, the changes look reasonable. That is where you want to customize settings for your

Re: How to run Kafka Producer in Java environment? How to set mainClass in pom file in EC2 instance?

2015-01-20 Thread Ewen Cheslack-Postava
You should only need jar.with.dependencies.jar -- maven-assembly-plugin's jar-with-dependencies mode collects all your code and project dependencies into one jar file. It looks like the problem is that your mainClass is set to only 'HelloKafkaProducer'. You need to specify the full name

Re: Poor performance running performance test

2015-01-28 Thread Ewen Cheslack-Postava
Cheslack-Postava e...@confluent.io wrote: Where are you running ProducerPerformance in relation to ZK and the Kafka brokers? You should definitely see much higher performance than this. A couple of other things I can think of that might be going wrong: Are all your VMs in the same AZ

Re: Poor performance running performance test

2015-01-27 Thread Ewen Cheslack-Postava
Where are you running ProducerPerformance in relation to ZK and the Kafka brokers? You should definitely see much higher performance than this. A couple of other things I can think of that might be going wrong: Are all your VMs in the same AZ? Are you storing Kafka data in EBS or local ephemeral

Re: SimpleConsumer leaks sockets on an UnresolvedAddressException

2015-01-27 Thread Ewen Cheslack-Postava
This was fixed in commit 6ab9b1ecd8 for KAFKA-1235 and it looks like that will only be included in 0.8.2. Guozhang, it looks like you wrote the patch, Jun reviewed it, but the bug is still open and there's a comment that moved it to 0.9 after the commit was already made. Was the commit a mistake

Re: Get replication and partition count of a topic

2015-01-12 Thread Ewen Cheslack-Postava
I think the closest thing to what you want is ZkUtils.getPartitionsForTopics, which returns a list of partition IDs for each topic you specify. -Ewen On Mon, Jan 12, 2015 at 12:55 AM, Manikumar Reddy ku...@nmsworks.co.in wrote: Hi, kafka-topics.sh script can be used to retrieve topic

Re: New 0.8.2 client compatibility with 0.8.1.1 during failure cases

2015-01-06 Thread Ewen Cheslack-Postava
Paul, That behavior is currently expected, see https://issues.apache.org/jira/browse/KAFKA-1788. There are currently no client-side timeouts in the new producer, so the message just sits there forever waiting for the server to come back so it can try to send it. If you already have tests for a

Re: schema.registry.url = null

2015-03-17 Thread Ewen Cheslack-Postava
Clint, Your code looks fine and the output doesn't actually have any errors, but you're also not waiting for the messages to be published. Try changing producer.send(data); to producer.send(data).get(); to block until the message has been acked. If it runs and exits cleanly, then you
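
A minimal sketch of the suggested change, assuming the new Java producer and a broker at localhost:9092 (the actual thread uses an Avro serializer and schema registry, which are omitted here).

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class BlockingSend {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            ProducerRecord<String, String> data =
                new ProducerRecord<String, String>("my-topic", "hello");

            // send() is asynchronous and returns a Future; get() blocks until the
            // broker has acked the message or an error is raised.
            RecordMetadata md = producer.send(data).get();
            System.out.println("acked at offset " + md.offset());
            producer.close();
        }
    }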

Re: High Level Consumer Example in 0.8.2

2015-03-11 Thread Ewen Cheslack-Postava
That example still works, the high level consumer interface hasn't changed. There is a new high level consumer on the way and an initial version has been checked into trunk, but it won't be ready to use until 0.9. On Wed, Mar 11, 2015 at 9:05 AM, ankit tyagi ankittyagi.mn...@gmail.com wrote:

Re: Possible to count for unclosed resources in process

2015-03-06 Thread Ewen Cheslack-Postava
You could also take a thread dump to try to find them by their network threads. For example this is how new producer network threads are named: String ioThreadName = "kafka-producer-network-thread" + (clientId.length() > 0 ? " | " + clientId : ""); On Fri, Mar 6, 2015 at 1:04 PM, Gwen Shapira
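
A small illustrative sketch of spotting leaked producers programmatically by scanning live threads for that name prefix; this is a debugging aid built on standard JDK calls, not an official Kafka API.

    import java.util.Map;

    public class FindProducerThreads {
        public static void main(String[] args) {
            // New-producer I/O threads are named "kafka-producer-network-thread",
            // optionally suffixed with "| <client id>".
            for (Map.Entry<Thread, StackTraceElement[]> e :
                     Thread.getAllStackTraces().entrySet()) {
                String name = e.getKey().getName();
                if (name.startsWith("kafka-producer-network-thread")) {
                    System.out.println("Possible unclosed producer: " + name);
                }
            }
        }
    }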

Re: Which node do you send data to?

2015-03-06 Thread Ewen Cheslack-Postava
Spencer, Kafka (and its clients) handle failover automatically for you. When you create a topic, you can select a replication factor. For a replication factor n, each partition of the topic will be replicated to n different brokers. At any given time, one of those brokers is considered the

Re: High Level Consumer Example in 0.8.2

2015-03-12 Thread Ewen Cheslack-Postava
)* As I see package of producer is different in both the jar, so there won't be any conflicts . On Thu, Mar 12, 2015 at 10:51 AM, Ewen Cheslack-Postava e...@confluent.io wrote: Ah, I see the confusion now. The kafka-clients jar was introduced only recently and is meant to hold

Re: High Level Consumer Example in 0.8.2

2015-03-11 Thread Ewen Cheslack-Postava
Cheslack-Postava e...@confluent.io wrote: That example still works, the high level consumer interface hasn't changed. There is a new high level consumer on the way and an initial version has been checked into trunk, but it won't be ready to use until 0.9. On Wed, Mar 11, 2015

Re: Message routing, Kafka-to-REST and HTTP API tools/frameworks for Kafka?

2015-03-25 Thread Ewen Cheslack-Postava
For 3, Confluent wrote a REST proxy that's pretty comprehensive. See the docs: http://confluent.io/docs/current/kafka-rest/docs/intro.html and a blog post describing it + future directions: http://blog.confluent.io/2015/03/25/a-comprehensive-open-source-rest-proxy-for-kafka/ There are a few other

Re: schemaregistry example

2015-03-31 Thread Ewen Cheslack-Postava
The name for the int type in Avro is int not integer. Your command should work if you change field2's type. -Ewen On Tue, Mar 31, 2015 at 1:51 AM, Clint Mcneil clintmcn...@gmail.com wrote: Hi guys When trying the example schema in

Re: How Query Topic For Metadata

2015-02-27 Thread Ewen Cheslack-Postava
You might want ZkUtils.getPartitionsForTopic. But beware that it's an internal method that could potentially change or disappear in the future. If you're just looking for protocol-level solutions, the metadata API has a request that will return info about the number of partitions:
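
A sketch of the protocol-level route mentioned here, using the old SimpleConsumer's topic metadata request; the broker address and topic name are assumptions.

    import java.util.Collections;

    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class PartitionCount {
        public static void main(String[] args) {
            // Any broker can answer a metadata request.
            SimpleConsumer consumer =
                new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "metadata-lookup");
            TopicMetadataResponse resp =
                consumer.send(new TopicMetadataRequest(Collections.singletonList("my-topic")));
            for (TopicMetadata tm : resp.topicsMetadata()) {
                System.out.println(tm.topic() + ": " + tm.partitionsMetadata().size() + " partitions");
            }
            consumer.close();
        }
    }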

Re: REST/Proxy Consumer access

2015-03-05 Thread Ewen Cheslack-Postava
Yes, Confluent built a REST proxy that gives access to cluster metadata (e.g. list topics, leaders for partitions, etc), producer (send binary or Avro messages to any topic), and consumer (run a consumer instance and consume messages from a topic). And you are correct, internally it uses Jetty and

Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-01 Thread Ewen Cheslack-Postava
On Fri, Feb 27, 2015 at 8:09 PM, Jeff Schroeder jeffschroe...@computer.org wrote: Kafka on dedicated hosts running in docker under marathon under Mesos. It was a real bear to get working, but is really beautiful once I did manage to get it working. I simply run with a unique hostname

Re: Broker w/ high memory due to index file sizes

2015-02-22 Thread Ewen Cheslack-Postava
If you haven't seen it yet, you probably want to look at http://kafka.apache.org/documentation.html#java -Ewen On Thu, Feb 19, 2015 at 10:53 AM, Zakee kzak...@netzero.net wrote: Well are there any measurement techniques for Memory config in brokers. We do have a large load, with a max

Re: File as message's content

2015-02-26 Thread Ewen Cheslack-Postava
Kafka can accept any type of data, you just pass a byte[] to the producer and get a byte[] back from the consumer. How you interpret it is entirely up to your application. But it does have limits on message size (see the message.max.bytes and replica.fetch.max.bytes setting for brokers) and
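
A minimal sketch of sending an opaque byte[] payload (here, a file) with the new producer; the broker address, topic, and file name are assumptions, and the payload must stay under the broker and producer size limits mentioned above.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FileAsMessage {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

            // Kafka treats the value as opaque bytes; interpretation is up to the consumer.
            byte[] contents = Files.readAllBytes(Paths.get("example.bin")); // hypothetical file

            KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<byte[], byte[]>("files", contents)).get();
            producer.close();
        }
    }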

Re: producer queue size

2015-03-18 Thread Ewen Cheslack-Postava
The setting you want is buffer.memory, but I don't think there's a way to get the amount of remaining space. The setting block.on.buffer.full controls the behavior when you run out of space. Neither setting silently drops messages. It will either block until there is space to add the message or
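
A brief configuration sketch of the two settings named above for the new producer; the broker address and sizes are assumptions.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;

    public class BoundedBufferProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("buffer.memory", "67108864");    // 64 MB for records not yet sent
            props.put("block.on.buffer.full", "true"); // block send() rather than throw when full

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }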

Re: Hung Kafka Threads?

2015-04-13 Thread Ewen Cheslack-Postava
Parking to wait for just means the thread has been put to sleep while waiting for some synchronized resource. In this case, ConditionObject indicates it's probably await()ing on a condition variable. This almost always means that thread is just waiting for notification from another thread that

Re: serveral questions about auto.offset.reset

2015-04-13 Thread Ewen Cheslack-Postava
On Mon, Apr 13, 2015 at 10:10 PM, bit1...@163.com bit1...@163.com wrote: Hi, Kafka experts: I got several questions about auto.offset.reset. This configuration parameter governs how the consumer reads messages from Kafka when there is no initial offset in ZooKeeper or if an offset is out of

Re: Kafka 0.8.2 beta - release

2015-04-30 Thread Ewen Cheslack-Postava
are not supported in KafkaConsumer. Do you know when they will be supported? public OffsetMetadata commit(Map&lt;TopicPartition, Long&gt; offsets, boolean sync) { throw new UnsupportedOperationException(); } Thanks Regards, On Wed, Apr 29, 2015 at 10:52 PM, Ewen Cheslack-Postava e

Re: New producer: metadata update problem on 2 Node cluster.

2015-04-27 Thread Ewen Cheslack-Postava
Maybe add this to the description of https://issues.apache.org/jira/browse/KAFKA-1843 ? I can't find it now, but I think there was another bug where I described a similar problem -- in some cases it makes sense to fall back to the list of bootstrap nodes because you've gotten into a bad state and

Re: Kafka 0.8.2 beta - release

2015-04-29 Thread Ewen Cheslack-Postava
It has already been released, including a minor revision to fix some critical bugs. The latest release is 0.8.2.1. The downloads page has links and release notes: http://kafka.apache.org/downloads.html On Wed, Apr 29, 2015 at 10:22 PM, Gomathivinayagam Muthuvinayagam sankarm...@gmail.com wrote:

Re: New producer: metadata update problem on 2 Node cluster.

2015-04-28 Thread Ewen Cheslack-Postava
Ok, all of that makes sense. The only way to possibly recover from that state is either for K2 to come back up allowing the metadata refresh to eventually succeed or to eventually try some other node in the cluster. Reusing the bootstrap nodes is one possibility. Another would be for the client to

Re: New Producer API - batched sync mode support

2015-04-27 Thread Ewen Cheslack-Postava
A couple of thoughts: 1. @Joel I agree it's not hard to use the new API but it definitely is more verbose. If that snippet of code is being written across hundreds of projects, that probably means we're missing an important API. Right now I've only seen the one complaint, but it's worth finding

Re: New producer: metadata update problem on 2 Node cluster.

2015-05-05 Thread Ewen Cheslack-Postava
Cheslack-Postava e...@confluent.io wrote: Ok, all of that makes sense. The only way to possibly recover from that state is either for K2 to come back up allowing the metadata refresh to eventually succeed or to eventually try some other node in the cluster. Reusing

Re: New producer: metadata update problem on 2 Node cluster.

2015-05-07 Thread Ewen Cheslack-Postava
a new consumer instance *does not* solve this problem. Attaching the producer/consumer code that I used for testing. On Wed, May 6, 2015 at 6:31 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I'm not sure about the old producer behavior in this same failure scenario, but creating

Re: Kafka Client in Rust

2015-05-10 Thread Ewen Cheslack-Postava
Added to the wiki, which required adding a new Rust section :) Thanks for the contribution, Yousuf! On Sun, May 10, 2015 at 6:57 PM, Yousuf Fauzan yousuffau...@gmail.com wrote: Hi All, I have create Kafka client for Rust. The client supports Metadata, Produce, Fetch, and Offset requests. I

Re: Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

2015-05-10 Thread Ewen Cheslack-Postava
@Gwen- But that only works for topics that have low enough traffic that you would ever actually hit that timeout. The Confluent schema registry needs to do something similar to make sure it has fully consumed the topic it stores data in so it doesn't serve stale data. We know in our case we'll

Re: Kafka broker and producer max message default config

2015-05-12 Thread Ewen Cheslack-Postava
The max.request.size effectively caps the largest size message the producer will send, but the actual purpose is, as the name implies, to limit the size of a request, which could potentially include many messages. This keeps the producer from sending very large requests to the broker. The
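
A short sketch of raising the producer-side cap described here; message.max.bytes and replica.fetch.max.bytes would need matching increases in the brokers' server.properties. The values and address shown are assumptions.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;

    public class LargeRequestProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            // Caps the size of one produce request, and therefore the largest single
            // message the producer will accept (default is roughly 1 MB).
            props.put("max.request.size", "2097152"); // 2 MB

            KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }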

Re: New Producer API Design

2015-05-13 Thread Ewen Cheslack-Postava
You can of course use KafkaProducer&lt;Object, Object&gt; to get a producer interface that can accept a variety of types. For example, if you have an Avro serializer that accepts both primitive types (e.g. String, integer types) and complex types (e.g. records, arrays, maps), Object is the only type you
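
A hedged sketch of an Object/Object producer; the serializer class and schema.registry.url shown here belong to Confluent's Avro serializer rather than Kafka itself, and the addresses and topics are assumptions.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class MixedTypeProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
            props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081"); // assumed registry address

            // Object/Object lets a single producer instance send both primitives and records.
            KafkaProducer<Object, Object> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<Object, Object>("strings", "a plain string"));
            producer.send(new ProducerRecord<Object, Object>("numbers", 42));
            producer.close();
        }
    }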

Re: [DISCUSS] KIP-14 Tools Standardization

2015-04-10 Thread Ewen Cheslack-Postava
Command line tools are definitely public interfaces. They should get the same treatment as any other public interface like the APIs or protocols. Improving and standardizing them is the right thing to do, but compatibility is still important. Changes should really come with a well-documented

Re: KafkaConsumer poll always returns null

2015-05-19 Thread Ewen Cheslack-Postava
The new consumer in trunk is functional when used similarly to the old SimpleConsumer, but none of the functionality corresponding to the high level consumer is there yet (broker-based coordination for consumer groups). There's not a specific timeline for the next release (i.e. when it's ready).

Re: KafkaConsumer poll always returns null

2015-05-20 Thread Ewen Cheslack-Postava
-Postava - do you have an example you could post? From: Ewen Cheslack-Postava [e...@confluent.io] Sent: Tuesday, May 19, 2015 3:12 PM To: users@kafka.apache.org Subject: Re: KafkaConsumer poll always returns null The new consumer in trunk is functional

Re: offsets.storage=kafka, dual.commit.enabled=false still requires ZK

2015-06-09 Thread Ewen Cheslack-Postava
The new consumer implementation, which should be included in 0.8.3, only needs a bootstrap.servers setting and does not use a zookeeper connection. On Tue, Jun 9, 2015 at 1:26 PM, noah iamn...@gmail.com wrote: We are setting up a new Kafka project (0.8.2.1) and are trying to go straight to
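
A sketch of a new-consumer configuration with brokers only and no ZooKeeper connection; the API was still being finished when this was written, so this follows the shape it eventually shipped with in 0.9. Addresses, group id, and topic are assumptions.

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class NoZkConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // brokers only; no zookeeper.connect
            props.put("group.id", "no-zk-demo");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.println(record.offset() + ": " + record.value());
            }
        }
    }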

Re: NoSuchMethodError with Consumer Instantiation

2015-06-18 Thread Ewen Cheslack-Postava
It looks like you have mixed up versions of the kafka jars: 4. kafka_2.11-0.8.3-SNAPSHOT.jar 5. kafka_2.11-0.8.2.1.jar 6. kafka-clients-0.8.2.1.jar I think org.apache.kafka.common.utils.Utils is very new, probably post 0.8.2.1, so it's probably caused by the kafka_2.11-0.8.3-SNAPSHOT.jar being

Re: No key specified when sending the message to Kafka

2015-06-23 Thread Ewen Cheslack-Postava
It does balance data, but is sticky over short periods of time (for some definition of short...). See this FAQ for an explanation: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified ? This behavior has been

Re: Consumer rebalancing based on partition sizes?

2015-06-23 Thread Ewen Cheslack-Postava
Current partition assignment only has a few limited options -- see the partition.assignment.strategy consumer option (which seems to be listed twice, see the second version for a more detailed explanation). There has been some discussion of making assignment strategies user extensible to support

Re: Kafka as an event store for Event Sourcing

2015-06-13 Thread Ewen Cheslack-Postava
that could be rebuilt from the log or a snapshot. On lør. 13. jun. 2015 at 20.26 Ewen Cheslack-Postava e...@confluent.io wrote: Jay - I think you need broker support if you want CAS to work with compacted topics. With the approach you described you can't turn on compaction since that would make

Re: Kafka as an event store for Event Sourcing

2015-06-13 Thread Ewen Cheslack-Postava
: But wouldn't the key-offset table be enough to accept or reject a write? I'm not familiar with the exact implementation of Kafka, so I may be wrong. On lør. 13. jun. 2015 at 21.05 Ewen Cheslack-Postava e...@confluent.io wrote: Daniel: By random read, I meant not reading the data

Re: Kafka as an event store for Event Sourcing

2015-06-13 Thread Ewen Cheslack-Postava
Jay - I think you need broker support if you want CAS to work with compacted topics. With the approach you described you can't turn on compaction since that would make it last-writer-wins, and using any non-infinite retention policy would require some external process to monitor keys that might

Re: Batch producer latencies and flush()

2015-06-28 Thread Ewen Cheslack-Postava
The logic you're requesting is basically what the new producer implements. The first condition is the batch size limit and the second is linger.ms. The actual logic is a bit more complicated and has some caveats dealing with, for example, backing off after failures, but you can see in this code
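
A brief sketch of the two new-producer settings that implement that logic; the values, broker address, and topic are placeholder assumptions.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("batch.size", "16384"); // send a partition's batch once it reaches 16 KB...
            props.put("linger.ms", "5");      // ...or 5 ms after the first record, whichever is first

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            for (int i = 0; i < 1000; i++)
                producer.send(new ProducerRecord<String, String>("my-topic", "msg-" + i));
            producer.close(); // close() flushes anything still lingering
        }
    }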

Re: How Producer handles Network Connectivity Issues

2015-05-26 Thread Ewen Cheslack-Postava
It's not being switched in this case because the broker hasn't failed. It can still connect to all the other brokers and zookeeper. The only failure is of the link between a client and the broker. Another way to think of this is to extend the scenario with more producers. If I have 100 other

Re: bootstrap.servers for the new Producer

2015-08-21 Thread Ewen Cheslack-Postava
Are you seeing this in practice or is this just a concern about the way the code currently works? If the broker is actually down and the host is rejecting connections, the situation you describe shouldn't be a problem. It's true that the NetworkClient chooses a fixed nodeIndexOffset, but the

Re: bootstrap.servers for the new Producer

2015-08-22 Thread Ewen Cheslack-Postava
initiated (and as long as the dns entry is there) it only fails to connect in the poll() method. And in the poll() method the status is not reset to DISCONNECTED and so it not blacked out. On Fri, Aug 21, 2015 at 10:06 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Are you seeing

Re: Cache Memory Kafka Process

2015-07-28 Thread Ewen Cheslack-Postava
stops reading at times. When i do a free -m on my broker node after 1/2 - 1 hr the memory foot print is as follows. 1) Physical memory - 500 MB - 600 MB 2) Cache Memory - 6.5 GB 3) Free Memory - 50 - 60 MB Regards, Nilesh Chhapru. On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava

Re: Cache Memory Kafka Process

2015-07-29 Thread Ewen Cheslack-Postava
, Nilesh Chhapru. On Tuesday 28 July 2015 12:37 PM, Ewen Cheslack-Postava wrote: Nilesh, It's expected that a lot of memory is used for cache. This makes sense because under the hood, Kafka mostly just reads and writes data to/from files. While Kafka does manage some in-memory data, mostly

Re: best way to call ReassignPartitionsCommand programmatically

2015-08-10 Thread Ewen Cheslack-Postava
It's not public API so it may not be stable between releases, but you could try using the ReassignPartitionsCommand class directly. Or, you can see that the code in that class is a very simple use of ZkUtils, so you could just make the necessary calls to ZkUtils directly. In the future, when

Re: How to read messages from Kafka by specific time?

2015-08-10 Thread Ewen Cheslack-Postava
You can use SimpleConsumer.getOffsetsBefore to get a list of offsets before a Unix timestamp. However, this isn't per-message. The offests returned are for the log segments stored on the broker, so the granularity will depend on your log rolling settings. -Ewen On Wed, Aug 5, 2015 at 2:11 AM,
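
A sketch of the getOffsetsBefore call, closely following the 0.8 SimpleConsumer example on the wiki; the broker must be the leader for the queried partition, and the host, topic, and timestamp here are assumptions.

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetRequest;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetsBeforeTime {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "offset-lookup");

            long timestamp = System.currentTimeMillis() - 60 * 60 * 1000L; // ~1 hour ago
            TopicAndPartition tp = new TopicAndPartition("my-topic", 0);
            Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<>();
            requestInfo.put(tp, new PartitionOffsetRequestInfo(timestamp, 1));

            OffsetResponse response = consumer.getOffsetsBefore(new OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), "offset-lookup"));

            // Returned offsets are log-segment boundaries, so granularity depends on rolling.
            long[] offsets = response.offsets("my-topic", 0);
            System.out.println(offsets.length > 0 ? "start at offset " + offsets[0] : "no offsets");
            consumer.close();
        }
    }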

Re: how to get single record from kafka topic+partition @ specified offset

2015-08-10 Thread Ewen Cheslack-Postava
:52 PM, Joe Lawson jlaw...@opensourceconnections.com wrote: Ewen, Do you have an example or link for the changes/plans that will bring the benefits you describe? Cheers, Joe Lawson On Aug 10, 2015 3:27 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You can do this using

Re: how to get single record from kafka topic+partition @ specified offset

2015-08-10 Thread Ewen Cheslack-Postava
You can do this using the SimpleConsumer. See https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example for details with some code. When the new consumer is released in 0.8.3, this will get a *lot* simpler. -Ewen On Fri, Aug 7, 2015 at 9:26 AM, Padgett, Ben
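
A sketch of fetching one record at a known offset with the SimpleConsumer, in the spirit of the wiki example linked above; it assumes the broker at localhost:9092 is the partition leader, and the topic, partition, and offset are placeholders.

    import java.nio.ByteBuffer;

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class FetchSingleRecord {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "single-fetch");
            long targetOffset = 42L; // hypothetical offset to read

            FetchRequest request = new FetchRequestBuilder()
                .clientId("single-fetch")
                .addFetch("my-topic", 0, targetOffset, 100000)
                .build();
            FetchResponse response = consumer.fetch(request);

            ByteBufferMessageSet messages = response.messageSet("my-topic", 0);
            for (MessageAndOffset mo : messages) {
                if (mo.offset() == targetOffset) { // a fetch may return several messages
                    ByteBuffer payload = mo.message().payload();
                    byte[] value = new byte[payload.limit()];
                    payload.get(value);
                    System.out.println(new String(value));
                    break;
                }
            }
            consumer.close();
        }
    }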

Re: problem about the offset

2015-08-10 Thread Ewen Cheslack-Postava
Kafka doesn't track per-message timestamps. The request you're using gets a list of offsets for *log segments* with timestamps earlier than the one you specify. If you start consuming from the offset returned, you should find the timestamp you specified in the same log file. -Ewen On Mon, Aug

Re: Kafka java consumer

2015-08-14 Thread Ewen Cheslack-Postava
There's not a precise date for the release, ~1.5 or 2 months from now. On Fri, Aug 14, 2015 at 3:45 PM, Abhijith Prabhakar abhi.preda...@gmail.com wrote: Thanks Ewen. Any idea when we can expect 0.8.3? On Aug 14, 2015, at 5:36 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Hi

Re: Kafka java consumer

2015-08-14 Thread Ewen Cheslack-Postava
Hi Abhijith, You should be using KafkaProducer, but KafkaConsumer is not ready yet. The APIs are included in 0.8.2.1, but the implementation is not ready. Until 0.8.3 is released, you cannot rely only on kafka-clients if you want to write a consumer. You'll need to depend on the main kafka jar

Re: Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Ewen Cheslack-Postava
Hi Prabhjot, Confluent has a REST proxy with docs that may give some guidance: http://docs.confluent.io/1.0/kafka-rest/docs/intro.html The new producer that it uses is very efficient, so you should be able to get pretty good throughput. You take a bit of a hit due to the overhead of sending data

Re: Cache Memory Kafka Process

2015-07-27 Thread Ewen Cheslack-Postava
Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and

Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
log.segment.bytes? Thanks. On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava e...@confluent.io wrote: I think log.cleanup.interval.mins was removed in the first 0.8 release. It sounds like you're looking at outdated docs. Search for log.retention.check.interval.ms here: http

Re: deleting data automatically

2015-07-27 Thread Ewen Cheslack-Postava
log.retention.check.interval.ms, but there is log.cleanup.interval.mins, is that what you mean? If I set log.roll.ms or log.cleanup.interval.mins too small, will it hurt the throughput? Thanks. On Fri, Jul 24, 2015 at 11:03 PM, Ewen Cheslack-Postava e...@confluent.io wrote: You'll want to set the log retention

Re: wow--kafka--why? unresolved dependency: com.eed3si9n#sbt-assembly;0.8.8: not found

2015-07-23 Thread Ewen Cheslack-Postava
,daemonRegistryDir=/root/.gradle/daemon,pid=26478,idleTimeout=12,daemonOpts=-XX:MaxPermSize=512m,-Xmx1024m,-Dfile.encoding=UTF-8,-Duser.country=PH,-Duser.language=en,-Duser.variant]}. 04:32:54.865 [DEBUG] [org.gradle.launch On Fri, Jul 24, 2015 at 3:34 AM, Ewen Cheslack-Postava e...@confluent.io

Re: wow--kafka--why? unresolved dependency: com.eed3si9n#sbt-assembly;0.8.8: not found

2015-07-23 Thread Ewen Cheslack-Postava
Also, the branch you're checking out is very old. If you want the most recent release, that's tagged as 0.8.2.1. Otherwise, you'll want to use the trunk branch. -Ewen On Thu, Jul 23, 2015 at 11:45 AM, Gwen Shapira gshap...@cloudera.com wrote: Sorry, we don't actually do SBT builds anymore.

Re: Choosing brokers when creating topics

2015-07-27 Thread Ewen Cheslack-Postava
Try the --replica-assignment option for kafka-topics.sh. It allows you to specify which brokers to assign as replicas instead of relying on the assignments being made automatically. -Ewen On Mon, Jul 27, 2015 at 12:25 AM, Jilin Xie jilinxie1...@gmail.com wrote: Hi Is it possible to

Re: New producer hangs inifitely when it looses connection to Kafka cluster

2015-07-21 Thread Ewen Cheslack-Postava
This is a known issue. There are a few relevant JIRAs and a KIP: https://issues.apache.org/jira/browse/KAFKA-1788 https://issues.apache.org/jira/browse/KAFKA-2120 https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient -Ewen On Tue, Jul 21, 2015 at 7:05

Re: New consumer - poll/seek javadoc confusing, need clarification

2015-07-21 Thread Ewen Cheslack-Postava
On Tue, Jul 21, 2015 at 2:38 AM, Stevo Slavić ssla...@gmail.com wrote: Hello Apache Kafka community, I find new consumer poll/seek javadoc a bit confusing. Just by reading docs I'm not sure what the outcome will be, what is expected in following scenario: - kafkaConsumer is instantiated

Re: Retrieving lost messages produced while the consumer was down.

2015-07-21 Thread Ewen Cheslack-Postava
Since you mentioned consumer groups, I'm assuming you're using the high level consumer? Do you have auto.commit.enable set to true? It sounds like when you start up you are always getting the auto.offset.reset behavior, which indicates you don't have any offsets committed. By default, that
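
A sketch of the relevant old high-level consumer settings; the ZooKeeper address and group id are assumptions, and once offsets have been committed auto.offset.reset is no longer consulted on startup.

    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class CommittingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // assumed ZooKeeper address
            props.put("group.id", "my-group");
            props.put("auto.commit.enable", "true");       // periodically commit consumed offsets
            props.put("auto.commit.interval.ms", "10000");
            props.put("auto.offset.reset", "smallest");    // used only when no committed offset exists

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create streams and consume; on restart the group resumes from committed offsets
            connector.shutdown();
        }
    }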

Re: deleting data automatically

2015-07-24 Thread Ewen Cheslack-Postava
You'll want to set the log retention policy via log.retention.{ms,minutes,hours} or log.retention.bytes. If you want really aggressive collection (e.g., on the order of seconds, as you specified), you might also need to adjust log.segment.bytes/log.roll.{ms,hours} and

Re: Data Structure abstractions over kafka

2015-07-13 Thread Ewen Cheslack-Postava
Tim, Kafka can be used as a key-value store if you turn on log compaction: http://kafka.apache.org/documentation.html#compaction You need to be careful with that since it's purely last-writer-wins and doesn't have anything like CAS that might help you manage concurrent writers, but the basic

Re: kafka benchmark tests

2015-07-13 Thread Ewen Cheslack-Postava
I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move into Kafka -- see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70 In particular, that test is implemented in benchmark_test.py:

Re: resources for simple consumer?

2015-07-15 Thread Ewen Cheslack-Postava
Hi Jeff, The simple consumer hasn't really changed, the info you found should still be relevant. The wiki page on it might be the most useful reference for getting started: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example And if you want a version all setup to

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Ewen Cheslack-Postava
Also worth mentioning is that the new producer doesn't have this behavior -- it will round robin over available partitions for records without keys. Available means it currently has a leader -- under normal cases this means it distributes evenly across all partitions, but if a partition is down

Re: latency performance test

2015-07-15 Thread Ewen Cheslack-Postava
The tests are meant to evaluate different things and the way they send messages is the source of the difference. EndToEndLatency works with a single message at a time. It produces the message then waits for the consumer to receive it. This approach guarantees there is no delay due to queuing. The

Re: kafka benchmark tests

2015-07-14 Thread Ewen Cheslack-Postava
at? Thanks. Yuheng On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava e...@confluent.io wrote: I implemented (nearly) the same basic set of tests in the system test framework we started at Confluent and that is going to move

Re: Kafka brokers on HTTP port

2015-07-16 Thread Ewen Cheslack-Postava
, Chandrash3khar Kotekar Mobile - +91 8600011455 On Fri, Jul 17, 2015 at 4:57 AM, Ewen Cheslack-Postava e...@confluent.io wrote: Chandrashekhar, If the firewall rules allow any TCP connection on those ports, you can just use Kafka directly and change the default port. If they actually verify

Re: Kafka brokers on HTTP port

2015-07-16 Thread Ewen Cheslack-Postava
Chandrashekhar, If the firewall rules allow any TCP connection on those ports, you can just use Kafka directly and change the default port. If they actually verify that it's HTTP traffic then you'd have to use the REST Proxy Edward mentioned or another HTTP-based proxy. -Ewen On Thu, Jul 16, 2015 at

Re: latency performance test

2015-07-16 Thread Ewen Cheslack-Postava
, Thank you for your patient explaining. It is very helpful. Can we assume that the long latency of ProducerPerformance comes from queuing delay in the buffer and it is related to buffer size? Thank you! best, Yuheng On Thu, Jul 16, 2015 at 12:21 AM, Ewen Cheslack-Postava e...@confluent.io

Re: [VOTE] 0.9.0.0 Candiate 1

2015-11-10 Thread Ewen Cheslack-Postava
Jun, not sure if this is just because of the RC vs being published on the site, but the links in the release notes aren't pointing to issues.apache.org. They're relative URLs instead of absolute. -Ewen On Tue, Nov 10, 2015 at 3:38 AM, Flavio Junqueira wrote: > -1 (non-binding)

Re: Ephemeral ports for Kafka broker

2015-11-11 Thread Ewen Cheslack-Postava
Passing 0 as the port should let you do this. This is how we get the tests to work without assuming a specific port is available. The KafkaServer.boundPort(SecurityProtocol) method can be used to get the port that was bound. -Ewen On Tue, Nov 10, 2015 at 11:23 PM, Hemanth Yamijala

Re: kafka connect(copycat) question

2015-11-10 Thread Ewen Cheslack-Postava
Hi Venkatesh, If you're using the default settings included in the sample configs, it'll expect JSON data in a special format to support passing schemas along with the data. This is turned on by default because it makes it possible to work with a *lot* more connectors and data storage systems

Re: kafka connect(copycat) question

2015-11-12 Thread Ewen Cheslack-Postava
the sink as part of a confluent project ( https://github.com/confluentinc/copycat-hdfs/blob/master/src/main/java/io/confluent/copycat/hdfs/HdfsSinkConnector.java ). Does it mean that I build this project and add the jar to kafka libs? On Tue, Nov

Re: kafka connect(copycat) question

2015-11-16 Thread Ewen Cheslack-Postava
jar:0.9.0.0, io.confluent:kafka-connect-avro-converter:jar:2.0.0-SNAPSHOT, io.confluent:common-config:jar:2.0.0-SNAPSHOT: Could not find artifact org.apache.kafka:connect-api:jar:0.9.0.0 in confluent On Thu, Nov 12, 2015 at 2:59 PM, Ewen Cheslack-Postava <e...@confluent.io>

Re: where do I get the Kafka classes

2015-11-16 Thread Ewen Cheslack-Postava
Hi Adaryl, First, it looks like you might be trying to use the old producer interface. That interface is going to be deprecated in favor of the new producer (under org.apache.kafka.clients.producer). I'd highly recommend using the new producer interface instead. Second, perhaps this repository

Re: kafka connect(copycat) question

2015-11-10 Thread Ewen Cheslack-Postava
in double-quotes. 2) Once I hit the above JsonParser error on the SinkTask, the connector is hung, doesn't take any more messages even proper ones. On Tue, Nov 10, 2015 at 1:59 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote: Hi Venka

Re: Kafka 090 maven coordinate

2015-11-03 Thread Ewen Cheslack-Postava
0.9.0.0 is not released yet, but the last blockers are being addressed and release candidates should follow soon. The docs there are just staged as we prepare for the release (note, e.g., that the latest release on the downloads page http://kafka.apache.org/downloads.html is still 0.8.2.2). -Ewen

Re: Difference between storing offset on Kafka and on Zookeeper server?

2015-10-15 Thread Ewen Cheslack-Postava
There are a couple of advantages. First, scalability. Writes to Kafka are cheaper than writes to ZK. Kafka-based offset storage is going to be able to handle significantly more consumers (and scale out as needed since writes are spread across all partitions in the offsets topic). Second, once you

Re: Partition ownership with high-level consumer

2015-10-07 Thread Ewen Cheslack-Postava
And you can get the current assignment in the new consumer after the rebalance completes too: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L593 On Tue, Oct 6, 2015 at 5:27 PM, Gwen Shapira wrote: > Ah,

Re: Kafka topic message consumer fetch response time checking

2015-10-20 Thread Ewen Cheslack-Postava
If you want the full round-trip latency, you need to measure that at the client. The performance measurement tools should do this pretty accurately. For example, if you just want to know how long it takes to produce a message to Kafka and get an ack back, you can use the latency numbers reported

Re: kafka consumer shell scripts (and kafkacat) with respect to JSON pretty printing

2015-10-20 Thread Ewen Cheslack-Postava
You can accomplish this with the console consumer -- it has a formatter flag that lets you plug in custom logic for formatting messages. The default does not do any formatting, but if you write your own implementation, you just need to set the flag to plug it in. You can see an example of this in

Re: Documentation for 0.8.2 kafka-client api?

2015-10-08 Thread Ewen Cheslack-Postava
ConsumerConnector is part of the old consumer API (which is what is currently released; new consumer is coming in 0.9.0). That class is not in kafka-clients, it is in the core Kafka jar, which is named with the Scala version you want to use, e.g. kafka_2.10. -Ewen On Thu, Oct 8, 2015 at 1:24 PM,

Re: Better API/Javadocs?

2015-10-11 Thread Ewen Cheslack-Postava
On Javadocs, both new clients (producer and consumer) have very thorough documentation in the javadocs. 0.9.0.0 will be the first release with the new consumer. On deserialization, the new consumer lets you specify deserializers just like you do for the new producer. But the old consumer supports

Re: Kafka 0.9.0 release branch

2015-10-13 Thread Ewen Cheslack-Postava
Yes, 0.9 will include the new consumer. On Tue, Oct 13, 2015 at 12:50 PM, Rajiv Kurian wrote: > A bit off topic but does this release contain the new single threaded > consumer that supports the poll interface? > > Thanks! > > On Mon, Oct 12, 2015 at 1:31 PM, Jun Rao

Re: [kafka-clients] Kafka 0.9.0 release branch

2015-10-13 Thread Ewen Cheslack-Postava
Not sure if I'd call it a blocker, but if we can get it in I would *really* like to see some solution to https://issues.apache.org/jira/browse/KAFKA-2397 committed. Without an explicit leave group, even normal operation of groups can leave some partitions unprocessed for 30-60s at a time under

Re: Http Kafka producer

2015-08-27 Thread Ewen Cheslack-Postava
, or are batches just held in memory before being sent to Kafka (or some other option)? Thanks! On Aug 26, 2015, at 9:50 PM, Ewen Cheslack-Postava e...@confluent.io wrote: Hemanth, The Confluent Platform 1.0 version does have JSON embedded format support (i.e. direct embedding of JSON messages

Re: Http Kafka producer

2015-08-26 Thread Ewen Cheslack-Postava
Hemanth, Can you be a bit more specific about your setup? Do you have control over the format of the request bodies that reach HAProxy or not? If you do, Confluent's REST proxy should work fine and does not require the Schema Registry. It supports both binary (encoded as base64 so it can be
