Re: Match Producer and RecordMetadata with Consumer and ConsumerRecord

2015-12-09 Thread Gwen Shapira
Correlation ID is for a request (i.e. separate ID for produce request and a fetch request), not a record. So it can't be used in the way you are trying to. On Wed, Dec 9, 2015 at 9:30 AM, John Menke wrote: > Can a correlationID be created from a ConsumerRecord that will allow for > identificatio

Re: Unable to connect to AWS Kafka cluster remotely

2015-12-08 Thread Gwen Shapira
Sounds like you need to use advertised.host configuration with the external name / ip. This means that the broker will send producers / consumers / zookeeper their external address and they will be able to connect. Gwen On Tue, Dec 8, 2015 at 11:17 AM, Henrik Martin wrote: > Greetings. Apologi

Re: Kafka integration with Oracle QA

2015-12-08 Thread Gwen Shapira
Hi, Can you explain a bit more what you'd expect this integration to do? Kafka is a queue, just like Oracle AQ is - so I can see how you may replace Oracle AQ with Kafka, but I'm not sure what you are trying to achieve by integrating them. Gwen On Mon, Dec 7, 2015 at 7:52 PM, CY Kuek wrote: >

Re: New Consumer API + Reactive Kafka

2015-12-02 Thread Gwen Shapira
On Wed, Dec 2, 2015 at 10:44 PM, Krzysztof Ciesielski < krzysztof.ciesiel...@softwaremill.pl> wrote: > Hello, > > I’m the main maintainer of Reactive Kafka - a wrapper library that > provides Kafka API as Reactive Streams ( > https://github.com/softwaremill/reactive-kafka). > I’m a bit concerned a

Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-27 Thread Gwen Shapira
; Thanks, > Prabhjot > On Nov 28, 2015 10:20 AM, "Gwen Shapira" wrote: > > > In your scenario, you are receiving acks from 3 replicas while it is > > possible to have 4 in the ISR. This means that one replica can be up to > > 4000 messages (by default) behin

Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-11-27 Thread Gwen Shapira
In your scenario, you are receiving acks from 3 replicas while it is possible to have 4 in the ISR. This means that one replica can be up to 4000 messages (by default) behind others. If a leader crashes, there is 33% chance this replica will become the new leader, thereby losing up to 4000 messages

Re: Increasing replication factor reliable?

2015-11-25 Thread Gwen Shapira
Yeah - it will increase IO and network utilization by a lot while data is replication, but it should be safe. On Tue, Nov 24, 2015 at 4:56 PM, Dillian Murphey wrote: > Is it safe to run this on an active production topic? A topic was created > without a replication factor of 2 and I want to inc

Re: Kafka broker goes down when consumer is stopped.

2015-11-25 Thread Gwen Shapira
It looks like you have a single broker (with id = 0 ) and that topic1 has a single replica and the broker is alive and well. The socket error is our bug (shouldn't be an error) and doesn't indicate that the broker is down. On Wed, Nov 25, 2015 at 3:26 AM, Shaikh, Mazhar A (Mazhar) < mazhar.sha...

Re: Flush Messages in KafkaProducer Buffer

2015-11-25 Thread Gwen Shapira
In 0.9.0, close() has a timeout parameter that allows specifying how long to wait for the in-flight messages to complete (definition of complete depends on value of "acks" parameter). On Wed, Nov 25, 2015 at 3:58 AM, Muqtafi Akhmad wrote: > Hello guys, > > I am using KafkaProducer (org.apache.ka

Re: All brokers are running but some partitions' leader is -1

2015-11-25 Thread Gwen Shapira
together in the same Kafka cluster? > Also we currently run spark streaming job (with scala 2.10) against the > cluster. Any known issues of 0.9.0 are you aware of under this scenario? > > Thanks, > Tony > > > On Mon, Nov 23, 2015 at 5:41 PM, Gwen Shapira wrote: > > >

Re: Is re-partition hitless process?

2015-11-24 Thread Gwen Shapira
this to do very few (maybe one) partition at a time. On Tue, Nov 24, 2015 at 4:42 PM, Dillian Murphey wrote: > Not adding. Taking some of the partitions from one kafka server and > spreading them to another. > > On Mon, Nov 23, 2015 at 5:40 PM, Gwen Shapira wrote: > > >

Re: Producer property to set to enable async data transfer in kafka 8.2.2

2015-11-24 Thread Gwen Shapira
The new producer is async by default. You can see few examples of how to use it here: https://github.com/gwenshap/kafka-examples/tree/master/SimpleCounter/src/main/java/com/shapira/examples/producer/simplecounter On Tue, Nov 24, 2015 at 10:40 AM, Amit Karyekar wrote: > Hi folks, > > We are work

Re: Change kafka broker ids dynamically

2015-11-24 Thread Gwen Shapira
You should definitely use the same id if you still have the data - it makes life so much better. There are 3 common ways to do it: 1. Use the last 3 digits of the IP as the broker ID (assuming Docker gives you the same IP when the container relaunches) 2. Use a deployment manager that can register

Re: Fetching Offsets Stored in Kafka in 0.9.0

2015-11-24 Thread Gwen Shapira
Are you using the new consumer API (KafkaConsumer) or the older ZookeeperConnector? KafkaConsumer has seek() API allowing you to replay events from any point. You can also manually commit a specific offset. Gwen On Tue, Nov 24, 2015 at 2:11 PM, Jack Lund wrote: > We’re running Kafka 0.9.0, and

Re: All brokers are running but some partitions' leader is -1

2015-11-23 Thread Gwen Shapira
We fixed many many bugs since August. Since we are about to release 0.9.0 (with SSL!), maybe wait a day and go with a released and tested version. On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu wrote: > Forgot to mention is that the Kafka version we're using is from Aug's > Trunk branch---which has the

Re: Is re-partition hitless process?

2015-11-23 Thread Gwen Shapira
By re-partition you mean adding partitions to an existing topics? There are two things to note in that case: 1. It is "hitless" because all it does is create new partitions where future records can go, it does not actually move data around. 2. You could be "hit" if your consumer code assumes that

Re: Re: Re: Re: Re: Kafka lost data issue

2015-11-16 Thread Gwen Shapira
r your excellent slides >> >> I will test it again based on your suggestions. >> >> >> >> >> Best regards >> Hawin >> >> On Thu, Nov 12, 2015 at 6:35 PM, Gwen Shapira wrote: >> >> > Hi, >> > >> > First, h

Re: Kafka log retention questions

2015-11-13 Thread Gwen Shapira
e again. > > Your help is much appreciated. > Best regards, > Dilpreet > > > > On 11/13/15, 1:24 AM, "Raju Bairishetti" wrote: > > >Adding some more info inline. > > > >On Fri, Nov 13, 2015 at 10:43 AM, Gwen Shapira wrote: > > > >>

Re: Kafka log retention questions

2015-11-12 Thread Gwen Shapira
See answers inline On Thu, Nov 12, 2015 at 2:53 PM, Sandhu, Dilpreet wrote: > Hi all, >I am new to Kafka usage. Here are some questions that I have in > mind. Kindly help me understand it better. If some questions make no sense > feel free to call it out. > 1. Is it possible to prune lo

Re: Re: Re: Re: Re: Kafka lost data issue

2015-11-12 Thread Gwen Shapira
Hi, First, here's a handy slide-deck on avoiding data loss in Kafka: http://www.slideshare.net/gwenshap/kafka-reliability-when-it-absolutely-positively-has-to-be-there Note configuration parameters like the number of retries. Also, it looks like you are sending data to Kafka asynchronously, but

Re: [kafka-clients] Re: [VOTE] 0.9.0.0 Candiate 1

2015-11-10 Thread Gwen Shapira
BTW. I created a Jenkins job for the 0.9 branch: https://builds.apache.org/job/kafka_0.9.0_jdk7/ Right now its pretty much identical to trunk, but since they may diverge, I figured we want to keep an eye on the branch separately. Gwen On Tue, Nov 10, 2015 at 11:39 AM, Jun Rao wrote: > Ewen, >

Re: Debug kafka code in intellij

2015-11-05 Thread Gwen Shapira
7;Kafka-0.8.2.1' tests > Please configure separate output paths to proceed with the compilation. > TIP: you can use Project Artifacts to combine compiled classes if needed. > > Regards, > Prabhjot > > On Fri, Nov 6, 2015 at 10:00 AM, Gwen Shapira wrote: > > > I also

Re: Debug kafka code in intellij

2015-11-05 Thread Gwen Shapira
> de.linkedin.com/in/radgruchalski/ ( > > > http://de.linkedin.com/in/radgruchalski/) > > > > > > Confidentiality: > > > This communication is intended for the above-named person and may be > > > confidential and/or legally privileged. > > > I

Re: Debug kafka code in intellij

2015-11-05 Thread Gwen Shapira
Running tests from intellij is fairly easy - you click on the test name and select "run" or "debug", if you select "debug" it honors breakpoints. Rad, what happens when you try to run a test within Intellij? On Thu, Nov 5, 2015 at 2:55 PM, Dong Lin wrote: > Hi Rad, > > I never use intellij to r

Re: Quick kafka-reassign-partitions.sh script question

2015-11-02 Thread Gwen Shapira
Actually, no. You can move partitions online. The way it works is that: 1. A new replica is created for the partition in the new broker 2. It starts replicating from the leader until it catches up - if you continue producing at this time, it will take longer to catch up. 3. Once the new replica ca

Re: Kafka and Spark Issue

2015-11-02 Thread Gwen Shapira
Since the error is from the HBase client and completely unrelated to Kafka, you will have better luck in the HBase mailing list. On Mon, Nov 2, 2015 at 9:16 AM, Nikhil Gs wrote: > Hello Team, > > My scenario is to load the data from producer topic to Hbase by using Spark > API. Our cluster is Ke

Re: Questions about .9 consumer API

2015-10-23 Thread Gwen Shapira
There are some examples that include error handling. These are to demonstrate the new and awesome seek() method. You don't have to handle errors that way, we are just showing that you can. On Thu, Oct 22, 2015 at 8:34 PM, Mohit Anchlia wrote: > It's in this link. Most of the examples have some ki

Re: Kafka 8.2.2 doesn't want to compress in snappy

2015-10-20 Thread Gwen Shapira
We compress a batch of messages together, but we need to give each message its own offset (and know its key if we want to use topic compaction), so messages are un-compressed and re-compressed. We are working on an improvement to add relative offsets which will allow the broker to skip this re-com

Re: Where is replication factor stored?

2015-10-16 Thread Gwen Shapira
We don't store the replication factor per-se. When the topic is created, we use the replication factor to generate replica-assignment, and the replica assignment gets stored in ZK under: /brokers/topics//partitions/... This is what gets modified when we re-assign replicas. Hope this helps. Gwen

Re: Strange ZK Error precedes frequent rebalances

2015-10-14 Thread Gwen Shapira
e subscribed to? > > On Wed, Oct 14, 2015 at 3:52 PM Gwen Shapira wrote: > > > It is not strange, it means that one of the consumers lost connectivity > to > > Zookeeper, its session timed-out and this caused ephemeral ZK nodes (like > > /consumers/real-time-updates/i

Re: Strange ZK Error precedes frequent rebalances

2015-10-14 Thread Gwen Shapira
It is not strange, it means that one of the consumers lost connectivity to Zookeeper, its session timed-out and this caused ephemeral ZK nodes (like /consumers/real-time-updates/ids/real-time-updates_infra- buildagent-06-1444854764478-4dd4d6af) to be removed and ultimately cause the rebalance. Wha

Re: r/apachekafka subreddit

2015-10-13 Thread Gwen Shapira
Subscribed :) Since the mailing list is rather active, I'm not sure there is significant benefit in a reddit community - but I'll be around to join discussions and see how it turns out. On Tue, Oct 13, 2015 at 1:58 PM, Andrew Pennebaker < andrew.penneba...@gmail.com> wrote: > Any Redditors in th

Re: Partitions - Brokers - Servers

2015-10-13 Thread Gwen Shapira
Hi, We normally run 1 broker per 1 physical server, and up to around 1000 partitions per broker (although that depends on the specific machine the broker is on and specific configuration). In order to enjoy replication, we recommend a minimum of 3 brokers in the cluster, to support 3 replicas per

Re: 0.9.0 Beta Release

2015-10-08 Thread Gwen Shapira
Hi Jarred, At the moment, we still believe that we are planning a release (without a beta, I think) in mid-Nov. On the other hand, we are all engineers, we tend to be optimistic about those things :) Gwen On Thu, Oct 8, 2015 at 10:31 AM, Jarred Ward wrote: > Greetings, > > We were really looki

Re: Datacenter to datacenter over the open internet

2015-10-06 Thread Gwen Shapira
You can configure "advertised.host.name" for each broker, which is the name external consumers and producers will use to refer to the brokers. On Tue, Oct 6, 2015 at 3:31 PM, Tom Brown wrote: > Hello, > > How do you consume a kafka topic from a remote location without a dedicated > connection? H

Re: mapping events to topics

2015-10-06 Thread Gwen Shapira
I usually approach this questions by looking at possible consumers. You usually want each consumer to read from relatively few topics, use most of the messages it receives and have fairly cohesive logic for using these messages. Signs that things went wrong with too few topics: * Consumers that t

Re: Partition ownership with high-level consumer

2015-10-06 Thread Gwen Shapira
ent should know > that in order to acquire the zookeeper locks and could potentially execute > a callback to tell me the partitions I own after a rebalance. > > -Joey > > On Tue, Oct 6, 2015 at 4:08 PM, Gwen Shapira wrote: > > > I don't think so. AFAIK, even the new

Re: Partition ownership with high-level consumer

2015-10-06 Thread Gwen Shapira
I don't think so. AFAIK, even the new API won't send this information to every consumer, because in some cases it can be huge. On Tue, Oct 6, 2015 at 1:44 PM, Joey Echeverria wrote: > But nothing in the API? > > -Joey > > On Tue, Oct 6, 2015 at 3:43 PM, Gwen Shapir

Re: Partition ownership with high-level consumer

2015-10-06 Thread Gwen Shapira
Zookeeper will have this information under /consumers//owners On Tue, Oct 6, 2015 at 12:22 PM, Joey Echeverria wrote: > Hi! > > Is there a way to track current partition ownership when using the > high-level consumer? It looks like the rebalance callback only tells me the > partitions I'm (pot

Re: Dealing with large messages

2015-10-06 Thread Gwen Shapira
Storing large blobs in S3 or HDFS and placing URIs in Kafka is the most common solution I've seen in use. On Tue, Oct 6, 2015 at 8:32 AM, Joel Koshy wrote: > The best practice I think is to just put large objects in a blob store > and have messages embed references to those blobs. Interestingly

Re: which producer should be used

2015-09-28 Thread Gwen Shapira
> http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html > > Is there an equivalent Java API for 0.8.2 yet or is the older one the most > current? > > -- > Sharninder > > > On Mon, Sep 28, 2015 at 9:15 AM, Gwen Shapira wrote: > >

Re: which producer should be used

2015-09-27 Thread Gwen Shapira
KafkaProducer is the most current and full-featured one, and it should be used. The other producers will be deprecated in a release or two, so I recommend not to use them. On Sun, Sep 27, 2015 at 8:40 PM, Li Tao wrote: > Hi there, > I noticed that there are several producers our there: > > **

Re: Frequent Consumer and Producer Disconnects

2015-09-25 Thread Gwen Shapira
How busy are the clients? The brokers occasionally close idle connections, this is normal and typically not something to worry about. However, this shouldn't happen to consumers that are actively reading data. I'm wondering if the "consumers not making any progress" could be due to a different is

Re: log clean up

2015-09-25 Thread Gwen Shapira
Absolutely. You can go into config/log4j.properties and configure the appenders to roll the logs. For example: log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender log4j.appender.stateChangeAppender.DatePattern='.'-MM-dd-HH log4j.appender.stateChangeAppender.File=${ka

Re: Mapping a consumer in a consumer group to a partition in a topic

2015-09-22 Thread Gwen Shapira
Unfortunately, in order to get a specific partition, you will need to use the simple consumer API, which does not have consumer groups. see here for details: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example On Tue, Sep 22, 2015 at 6:08 PM, Spandan Harithas Karamchedu

Re: Tools/recommendations to debug performance issues?

2015-09-14 Thread Gwen Shapira
Kafka also collects very useful metrics on request times and their breakdown. They are under kafka.network. On Mon, Sep 14, 2015 at 6:59 AM, Rahul Jain wrote: > Have you checked the consumer lag? You can use the offset checker tool to > see if there is a lag. > On 14 Sep 2015 18:36, "noah" wr

Re: Auto preferred leader elections cause data loss?

2015-09-14 Thread Gwen Shapira
acks = all should prevent this scenario: If broker 0 is still in ISR, the produce request for 101 will not be "acked" (because 0 is in ISR and not available for acking), and the producer will retry it until all ISR acks. If broker 0 dropped off ISR, it will not be able to rejoin until it has all

Re: 0.9.0.0 remaining jiras

2015-09-14 Thread Gwen Shapira
Agree that these are very nice to have. We've seen many deployments that need to manage these on their own. However, if this is not ready before we are done adding security and the new consumer, it will make sense to still release 0.9.0 and add the broker management improvements in 0.9.1. I'm tryi

Re: 0.9.0.0 remaining jiras

2015-09-14 Thread Gwen Shapira
We decided to rename 0.8.3 to 0.9.0 since it contains few large changes (Security, new consumer, quotas). On Sun, Sep 13, 2015 at 11:56 PM, Jason Rosenberg wrote: > Hi Jun, > > Can you clarify, will there not be a 0.8.3.0 (and instead we move straight > to 0.9.0.0)? > > Also, can you outline t

Re: [VOTE] 0.8.2.2 Candidate 1

2015-09-09 Thread Gwen Shapira
+1 non-binding - verified signatures and build. On Wed, Sep 9, 2015 at 10:28 AM, Ewen Cheslack-Postava wrote: > +1 non-binding. Verified artifacts, unit tests, quick start. > > On Wed, Sep 9, 2015 at 10:09 AM, Guozhang Wang wrote: > > > +1 binding, verified unit tests and quick start. > > > > O

Re: Slow ISR catch-up

2015-09-03 Thread Gwen Shapira
Yes, this should work. Expect lower throughput though. On Thu, Sep 3, 2015 at 12:52 PM, Prabhjot Bharaj wrote: > Hi, > > Can I use sync for acks = -1? > > Regards, > Prabhjot > On Sep 3, 2015 11:49 PM, "Gwen Shapira" wrote: > > > The test uses the old

Re: Slow ISR catch-up

2015-09-03 Thread Gwen Shapira
The test uses the old producer (we should fix that), and since you don't specify --sync, it runs async. The old async producer simply sends data and doesn't wait for acks, so it is possible that the messages were never acked... On Thu, Sep 3, 2015 at 7:56 AM, Prabhjot Bharaj wrote: > Hi Folks, >

Re: API to query cluster metadata on-demand

2015-09-03 Thread Gwen Shapira
Ah, I wish. We are working on it :) On Thu, Sep 3, 2015 at 9:10 AM, Simon Cooper < simon.coo...@featurespace.co.uk> wrote: > Is there a basic interface in the new client APIs to get the list of > topics on a cluster, and get information on the topics (offsets, sizes, > etc), without having to de

Re: Competing customers

2015-09-03 Thread Gwen Shapira
Yeah, scaling through adding partitions ("sharding") is a basic feature of Kafka. We expect topics to have many partitions (at least as many as number of consumers), and each consumer to get a subset of the messages by getting a subset of partitions. This design gives Kafka its two biggest advanta

Re: Kafka loses data after one broker reboot

2015-09-01 Thread Gwen Shapira
There is another edge-case that can lead to this scenario. It is described in detail in KAFKA-2134, and I'll copy Becket's excellent summary here for reference: 1) Broker A (leader) has committed offset up-to 5000 2) Broker B (follower) has committed offset up to 3000 (he is still in ISR because o

Re: 0.8.2 producer and single message requests

2015-09-01 Thread Gwen Shapira
ntially get async model) for every send() and based on that it > > should > > > > respond to its clients whether the call is successful or not. The > > clients > > > > of your webservice should have fault tolerance built on top of your > > > > response co

Re: Get kafka connection status

2015-08-30 Thread Gwen Shapira
Two suggestions: 1. While the consumer is connected, it has one or more threads called "ConsumerFetcherThread---" If you can look at which threads are currently running and check if any called ConsumerFetcherThread-* exist, this is a good indication. The threads are closed when shutdown() is calle

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Gwen Shapira
; > > > We are fine with non SSL consumer as our kafka cluster and spark cluster > > are in the same network > > > > > > Thanks, > > Sourabh > > > > On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira > wrote: > > I can't speak for the Spark Co

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Gwen Shapira
I can't speak for the Spark Community, but checking their code, DirectKafkaStream and KafkaRDD use the SimpleConsumer API: https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala https://github.com/apache/spark/blob/m

Re: Painfully slow kafka recovery

2015-08-21 Thread Gwen Shapira
e been minimal? > > Thanks, > Raja. > > > > On Fri, Aug 21, 2015 at 12:31 PM, Gwen Shapira wrote: > > > By default, num.replica.fetchers = 1. This means only one thread per > broker > > is fetching data from leaders. This means it make take a while for the >

Re: Painfully slow kafka recovery

2015-08-21 Thread Gwen Shapira
By default, num.replica.fetchers = 1. This means only one thread per broker is fetching data from leaders. This means it make take a while for the recovering machine to catch up and rejoin the ISR. If you have bandwidth to spare, try increasing this value. Regarding "no data flowing into kafka" -

Re: Possible Memory Leak in Record Accumulator

2015-08-20 Thread Gwen Shapira
at 5:35 PM, Gwen Shapira wrote: > Hi, > > I didn't see this issue during our network hiccups. You wrote you saw: > > Got error produce response with correlation id 17717 on topic-partition > event.beacon-38, retrying (8 attempts left). Error: NETWORK_EXCEPTION > > What

Re: Possible Memory Leak in Record Accumulator

2015-08-20 Thread Gwen Shapira
Hi, I didn't see this issue during our network hiccups. You wrote you saw: Got error produce response with correlation id 17717 on topic-partition event.beacon-38, retrying (8 attempts left). Error: NETWORK_EXCEPTION What did you see after? Especially once the network issue was resolved? more re

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-18 Thread Gwen Shapira
Any objections if I leave KAFKA-2114 (setting min.insync.replicas default) out? The test code is using changes that were done after 0.8.2.x cut-off, which makes it difficult to cherry-pick. Gwen On Tue, Aug 18, 2015 at 12:16 PM, Gwen Shapira wrote: > Jun, > > KAFKA-2147 doesn'

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-18 Thread Gwen Shapira
xed in trunk and we weren't planning for an 0.8.2.2 release then. > > Thanks, > > Jun > > On Mon, Aug 17, 2015 at 2:56 PM, Gwen Shapira wrote: > > > Thanks for creating a list, Grant! > > > > I placed it on the wiki with a quick evaluation of the con

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-18 Thread Gwen Shapira
corporating that feedback and iterate on it. > > We could absolutely do both 0.8.2.2 and 0.8.3. What I'd ask for is for us > to look at the 0.8.3 timeline too and make a call whether 0.8.2.2 still > makes sense. > > Thanks, > Neha > > On Tue, Aug 18, 2015 at 10:24 AM

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-18 Thread Gwen Shapira
A-2337 & KAFKA-2393 > > KAFKA-1867 > > KAFKA-2407 > > KAFKA-2234 > > KAFKA-1866 > > KAFKA-2345 & KAFKA-2355 > > > > thoughts? > > > > Thank you, > > Grant > > > > On Mon, Aug 17, 2015 at 4:56 PM, Gwen Shapira wrote: >

Re: KafkaConsumer from 0.8.3 trunk hangs indefinitely on poll

2015-08-18 Thread Gwen Shapira
As you can see in the javadoc for KafkaConsumer, you need to call poll() in a loop. Something like: while (true) { * ConsumerRecords records = consumer.poll(100); * records.forEach(c -> queue.add(c.value())); * * } On Tue, Aug 18, 2015 at 2:46 AM, Krogh-Moe, Espen wrote: >

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-17 Thread Gwen Shapira
ZkNodeExistsException > >- KAFKA-2353 <https://issues.apache.org/jira/browse/KAFKA-2353>: > >SocketServer.Processor should catch exception and close the socket > properly > >in configureNewConnections. > >- KAFKA-1836 <https://issues.apache.org/

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-17 Thread Gwen Shapira
but +1 for 0.8.2 patch that marks the new consumer API as unstable (or unimplemented ;) On Mon, Aug 17, 2015 at 9:12 AM, Gwen Shapira wrote: > The network refactoring portion was not tested well enough yet for me to > feel comfortable pushing it into a bugfix release. The new purgato

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-17 Thread Gwen Shapira
a/browse/KAFKA-2345>: > > > Attempt to delete a topic already marked for deletion throws > > >ZkNodeExistsException > > >- KAFKA-2353 <https://issues.apache.org/jira/browse/KAFKA-2353>: > > >SocketServer.Processor should catch exception

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-16 Thread Gwen Shapira
and > > https://issues.apache.org/jira/browse/KAFKA-2120 > > > > On Fri, Aug 14, 2015 at 4:03 PM, Gwen Shapira wrote: > > > > > Will be nice to include Kafka-2308 and fix two critical snappy issues > in > > > the maintenance release. > > > >

Re: 0.8.2 producer and single message requests

2015-08-14 Thread Gwen Shapira
Hi Neelesh :) The new producer has configuration for controlling the batch sizes. By default, it will batch as much as possible without delay (controlled by linger.ms) and without using too much memory (controlled by batch.size). As mentioned in the docs, you can set batch.size to 0 to disable ba

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-14 Thread Gwen Shapira
Will be nice to include Kafka-2308 and fix two critical snappy issues in the maintenance release. Gwen On Aug 14, 2015 6:16 AM, "Grant Henke" wrote: > Just to clarify. Will KAFKA-2189 be the only patch in the release? > > On Fri, Aug 14, 2015 at 7:35 AM, Manikumar Reddy > wrote: > > > +1 for 0

Re: use page cache as much as possiblee

2015-08-13 Thread Gwen Shapira
On Thu, Aug 13, 2015 at 4:10 PM, Kishore Senji wrote: > Consumers can only fetch data up to the committed offset and the reason is > reliability and durability on a broker crash (some consumers might get the > new data and some may not as the data is not yet committed and lost). Data > will be co

Re: Recovering from Kafka NoReplicaOnlineException with one node

2015-08-10 Thread Gwen Shapira
Maybe it is not ZooKeeper itself, but the Broker connection to ZK timed-out and caused the controller to believe that the broker is dead and therefore attempted to elect a new leader (which doesn't exist, since you have just one node). Increasing the zookeeper session timeout value may help. Also,

Re: InvalidMessageException: Message is corrupt, Consumer stuck

2015-08-04 Thread Gwen Shapira
The high level consumer stores its state in ZooKeeper. Theoretically, you should be able to go into ZooKeeper, find the consumer-group, topic and partition, and increment the offset past the "corrupt" point. On Tue, Aug 4, 2015 at 10:23 PM, Henry Cai wrote: > Hi, > > We are using the Kafka high-

Re: message filterin or "selector"

2015-08-04 Thread Gwen Shapira
The way Kafka is currently implemented is that Kafka is not aware of the content of messages, so there is no Selector logic available. The way to go is to implement the Selector in your client - i.e. your consume() loop will get all messages but will throw away those that don't fit your pattern.

Re: How to read in batch using HighLevel Consumer?

2015-08-04 Thread Gwen Shapira
To add some internals, the high level consumer actually does read entire batches from Kafka. It just exposes them to the user in an event loop, because its a very natural API. Users can then batch events the way they prefer. So if you are worried about batches being more efficient than single even

Re: Checkpointing with custom metadata

2015-08-03 Thread Gwen Shapira
how adding metadata to commit message can emulate some light-weight transactions, but I'd be concerned that this capability can get abused... P.S Thanks. I like my new address :) On Mon, Aug 3, 2015 at 6:46 PM, James Cheng wrote: > Nice new email address, Gwen. :) > > On Aug 3, 2015

Re: Checkpointing with custom metadata

2015-08-03 Thread Gwen Shapira
You are correct. You can see that ZookeeperConsumerConnector is hardcoded with null metadata. https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L310 More interesting, it looks like the Metadata is not exposed in the new KafkaConsumer eit

Re: 0.9.0 release

2015-08-03 Thread Gwen Shapira
> Shrikant Patel | 817.246.6760 | ext. 4302 Enterprise Architecture > Team PDX-NHIN-Rx.com > > -Original Message- > From: Gwen Shapira [mailto:g...@confluent.io] > Sent: Monday, August 03, 2015 3:23 PM > To: users@kafka.apache.org > Subject: Re: 0.9.0 release

Re: 0.9.0 release

2015-08-03 Thread Gwen Shapira
According to the plan, never :) Is there a specific feature you are looking forward to? I think the most exciting features are planned for 0.8.3 - which is targeted for Oct. On Mon, Aug 3, 2015 at 1:14 PM, Shrikant Patel wrote: > https://cwiki.apache.org/confluence/display/KAFKA/Future+release+

Re: Consumer limit for pub-sub mode

2015-08-03 Thread Gwen Shapira
I don't know a specific limit for number of consumers, perhaps someone will have better idea. Do note that with a single topic and a single partition, you can't really scale by adding more machines - the way Kafka is currently designed, all consumers will read from the one machine that has the lea

Re: The meaning of the term "group"

2015-07-30 Thread Gwen Shapira
Can you point specifically to which offsets() function you are referring to? (i.e object or file name will help) I didn't find a method that takes group as a parameter in the consumer API... On Wed, Jul 29, 2015 at 2:37 PM, Keith Wiley wrote: > ?My understanding is that the group id indicated to

Re: properducertest on multiple nodes

2015-07-24 Thread Gwen Shapira
Does topic "speedx1" exist? On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du wrote: > Hi, > > I am trying to run 20 performance test on 10 nodes using pbsdsh. > > The messages will send to a 6 brokers cluster. It seems to work for a > while. When I delete the test queue and rerun the test, the broker d

Re: wow--kafka--why? unresolved dependency: com.eed3si9n#sbt-assembly;0.8.8: not found

2015-07-23 Thread Gwen Shapira
Sorry, we don't actually do SBT builds anymore. You can build successfully using Gradle: You need to have [gradle](http://www.gradle.org/installation) installed. ### First bootstrap and download the wrapper ### cd kafka_source_dir gradle Now everything else will work ### Building a jar

Re: kafka-topics.sh - Include topic deletion information?

2015-07-22 Thread Gwen Shapira
actually, I believe kafka-topics --list already shows this information. At least, I remember adding this feature... On Wed, Jul 22, 2015 at 3:11 PM, Ashish Singh wrote: > Hey Jaikiran, I think that is a fair ask. However, I am curious in which > scenario would you want to know the topics that sh

Re: Kafka message queue during broker failure

2015-07-22 Thread Gwen Shapira
If you are using the new Kafka Producer (in org.apache.kafka.clients package), you can configure number of retries. The Producer will queue messages and re-attempt to send them as specified. On Wed, Jul 22, 2015 at 11:54 AM, Jeff Gong wrote: > hi all, > > currently working with a team that is int

Re: parametrized number of partitions for topics

2015-07-22 Thread Gwen Shapira
Edenhill's reply actually covers everything: 1) Right now you'll need to use the kafka-topic.sh tool bundled with Kafka. 2) There are future plans to add this capability : https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations On Wed, Ju

Re: Custom Zookeeper install with kafka

2015-07-22 Thread Gwen Shapira
All Cloudera customers use ZK 3.4.5 with no issues. On Wed, Jul 22, 2015 at 11:05 AM, Todd Palino wrote: > Yes, we use ZK 3.4.6 exclusively at LinkedIn and there's no problem. > > -Todd > >> On Jul 22, 2015, at 9:49 AM, Adam Dubiel wrote: >> >> Hi, >> >> I don't think it matters much which versi

Re: ZK chroot path would be automatically created since Kafka 0.8.2.0?

2015-07-22 Thread Gwen Shapira
You are right, this sounds like a doc bug. Do you mind filing a JIRA ticket (http://issues.apache.org/jira/browse/KAFKA) so we can keep track of this issue? On Tue, Jul 21, 2015 at 7:43 PM, yewton wrote: > Hi, > > The document about zookeeper.connect on Broker Configs says that > "Note that you

Re: Delete topic using Admintools is not working

2015-07-16 Thread Gwen Shapira
Looks like you try to delete a topic that is already in the process of getting deleted: NodeExists for /admin/delete_topics/testTopic17 (We can improve the error messages for sure, or maybe even catch the exception and ignore it) Gwen On Thu, Jul 16, 2015 at 3:40 PM, Sivananda Reddy wrote: > Hi

Re: Consumer that consumes only local partition?

2015-07-15 Thread Gwen Shapira
This is not something you can use the consumer API to simply do easily (consumers don't have locality notion). I can imagine using Kafka's low-level API calls to get a list of partitions and the lead replica, figuring out which are local and using those - but that sounds painful. Are you 100% sure

Re: How to run the three producers test

2015-07-14 Thread Gwen Shapira
Are there any errors on the broker logs? On Tue, Jul 14, 2015 at 11:57 AM, Yuheng Du wrote: > Jiefu, > > Thank you. The three producers can run at the same time. I mean should they > be started at exactly the same time? (I have three consoles from each of > the three machines and I just start the

Re: stunning error - Request of length 1550939497 is not valid, it is larger than the maximum size of 104857600 bytes

2015-07-14 Thread Gwen Shapira
I am not familiar with Apache Bench. Can you share more details on what you are doing? On Tue, Jul 14, 2015 at 11:45 AM, JIEFU GONG wrote: > So I'm trying to make a request with a simple ASCII text file, but what's > strange is even if I change files to send or the contents of the file I get > th

Re: How to run the three producers test

2015-07-14 Thread Gwen Shapira
You need to run 3 of those at the same time. We don't expect any errors, but if you run into anything, let us know and we'll try to help. Gwen On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du wrote: > Hi, > > I am running the performance test for kafka. https://gist.github.com/jkreps > /c7ddb4041ef62

Re: Using Kafka as a persistent store

2015-07-13 Thread Gwen Shapira
Hi, 1. What you described sounds like a reasonable architecture, but may I ask why JSON? Avro seems better supported in the ecosystem (Confluent's tools, Hadoop integration, schema evolution, tools, etc). 1.5 If all you do is convert data into JSON, SparkStreaming sounds like a difficult-to-manag

Re: New producer and ordering of Callbacks when sending to multiple partitions

2015-07-13 Thread Gwen Shapira
James, There are separate queues for each partition, so there are no guarantees on the order of the sends (or callbacks) between partitions. (Actually, IIRC, the code intentionally randomizes the partition order a bit, possibly to avoid starvation) Gwen On Mon, Jul 13, 2015 at 5:41 PM, James Che

Re: stunning error - Request of length 1550939497 is not valid, it is larger than the maximum size of 104857600 bytes

2015-07-11 Thread Gwen Shapira
> # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002". > # You can also append an optional chroot string to the urls to specify the > # root directory for all kafka znodes. > #zookeeper.connect=localhost:2181 > zookeeper.connect=<%=@zookeeper%> > > > # T

<    1   2   3   4   5   6   >