Re: Kafka with Zookeeper behind AWS ELB

2017-07-20 Thread Pradeep Gollakota
Luigi, I strongly urge you to consider a 5 node ZK deployment. I've always done that in the past for resiliency during maintenance. In a 3 node cluster, you can only tolerate one "failure", so if you bring one node down for maintenance and another node crashes during said maintenance, your ZK
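The arithmetic behind the 3-node versus 5-node recommendation can be sketched as follows. This is a minimal illustration that assumes a plain majority quorum; the function names are hypothetical and are not part of any ZooKeeper API.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can be down while a strict majority is still up."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # strict majority of the ensemble
    return ensemble_size - quorum


def crashes_tolerated_during_maintenance(ensemble_size: int) -> int:
    """Unplanned crashes survivable while one node is already down for maintenance."""
    return max(tolerated_failures(ensemble_size) - 1, 0)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, tolerated_failures(n), crashes_tolerated_during_maintenance(n))
```

Under this model a 3-node ensemble has no headroom left once one node is taken down for maintenance, while a 5-node ensemble can still absorb one unplanned crash.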

Re: Scaling up kafka consumers

2017-02-24 Thread Pradeep Gollakota
Within a consumer group, a single partition can be consumed by at most one consumer. Consumers compete to take ownership of a partition. So, in order to gain parallelism you need to add more partitions. There is a library that allows multiple consumers to consume from a single partition
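The ownership rule in this message can be illustrated with a toy model. The round-robin function below is a hypothetical sketch, not Kafka's actual assignor; it only shows that once a group has more consumers than partitions, the extra consumers receive nothing.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: every partition gets exactly one owner."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # Three partitions, four consumers in the same group (names are made up).
    result = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
    idle = [consumer for consumer, owned in result.items() if not owned]
    print(result)
    print("idle consumers:", idle)
```

In this model, adding a fifth consumer changes nothing, whereas adding a fourth partition would give the idle consumer work to do.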

Re: How does one deploy to consumers without causing re-balancing for real time use case?

2017-02-10 Thread Pradeep Gollakota
ed by the > consumer need to be handled by some other group members." > > So does this mean that the consumer should inform the group ahead of > time before it goes down? Currently, I just shutdown the process. > > > On Fri, Feb 10, 2017 at 8:35 AM, Pradeep Gollakota <pr

Re: How does one deploy to consumers without causing re-balancing for real time use case?

2017-02-10 Thread Pradeep Gollakota
I asked a similar question a while ago. There doesn't appear to be a way to avoid triggering the rebalance. But I'm not sure why it would be taking > 1hr in your case. For us it was pretty fast. https://www.mail-archive.com/users@kafka.apache.org/msg23925.html On Fri, Feb 10, 2017 at 4:28 AM,

Re: Consumer Rebalancing Question

2017-01-06 Thread Pradeep Gollakota
and reassigns it to another member of > the group. This happens once and then the "issue" is resolved without any > additional interruptions. > > -Ewen > > On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeep...@gmail.com> > wrote:

Consumer Rebalancing Question

2017-01-04 Thread Pradeep Gollakota
Hi Kafka folks! When a consumer is closed, it will issue a LeaveGroupRequest. Does anyone know how long the coordinator waits before reassigning the partitions that were assigned to the leaving consumer to a new consumer? I ask because I'm trying to understand the behavior of consumers if you're

Re: kafka + autoscaling groups fuckery

2016-06-28 Thread Pradeep Gollakota
Just out of curiosity, if you guys are in AWS for everything, why not use Kinesis? On Tue, Jun 28, 2016 at 3:49 PM, Charity Majors wrote: > Hi there, > > I just finished implementing kafka + autoscaling groups in a way that made > sense to me. I have a _lot_ of experience

Re: Datacenter to datacenter over the open internet

2015-10-06 Thread Pradeep Gollakota
At Lithium, we have multiple datacenters and we distcp our data across our Hadoop clusters. We have 2 DCs in NA and 1 in EU. We have a non-redundant direct connect from our EU cluster to one of our NA DCs. If and when this fails, we have automatic failover to a VPN that goes over the internet. The

Re: Dealing with large messages

2015-10-06 Thread Pradeep Gollakota
t 2015 02:02, "James Cheng" <jch...@tivo.com> wrote: > > > > > >> Here’s an article that Gwen wrote earlier this year on handling large > > >> messages in Kafka. > > >> > > >> http://ingest.tips/2015/01/21/handling-large-message

Dealing with large messages

2015-10-05 Thread Pradeep Gollakota
Fellow Kafkaers, We have a pretty heavyweight legacy event logging system for batch processing. We're now sending the events into Kafka for realtime analytics. But we have some pretty large messages (> 40 MB). I'm wondering if any of you have use cases where you have to send large messages

Re: number of topics given many consumers and groups within the data

2015-09-30 Thread Pradeep Gollakota
To add a little more context to Shaun's question, we have around 400 customers. Each customer has a stream of events. Some customers generate a lot of data while others don't. We need to ensure that each customer's data is sorted globally by timestamp. We have two use cases around consumption:

Re: integrate Camus and Hive?

2015-03-09 Thread Pradeep Gollakota
If I understood your question correctly, you want to be able to read the output of Camus in Hive and be able to know partition values. If my understanding is right, you can do so by using the following. Hive lets you specify custom patterns for partitions. You can use this in

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-03 Thread Pradeep Gollakota
Lithium Technologies would love to host you guys for a release party in SF if you guys want. :) On Tue, Feb 3, 2015 at 11:04 AM, Gwen Shapira gshap...@cloudera.com wrote: When's the party? :) On Mon, Feb 2, 2015 at 8:13 PM, Jay Kreps jay.kr...@gmail.com wrote: Yay! -Jay On Mon,

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Pradeep Gollakota
This is a great question Otis. Like Gwen said, you can accomplish Sync mode by setting the batch size to 1. But this does highlight a shortcoming of the new producer API. I really like the design of the new API and it has really great properties and I'm enjoying working with it. However, once API

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Pradeep Gollakota
it to work? Gwen On Mon, Feb 2, 2015 at 1:38 PM, Pradeep Gollakota pradeep...@gmail.com wrote: This is a great question Otis. Like Gwen said, you can accomplish Sync mode by setting the batch size to 1. But this does highlight a shortcoming of the new producer API. I really like the design

Re: Kafka ETL Camus Question

2015-02-02 Thread Pradeep Gollakota
Hi Bhavesh, At Lithium, we don't run Camus in our pipelines yet, though we plan to. But I just wanted to comment regarding speculative execution. We have it disabled at the cluster level and typically don't need it for most of our jobs. Especially with something like Camus, I don't see any need

Re: Max. storage for Kafka and impact

2014-12-19 Thread Pradeep Gollakota
@Joe, Achanta is using Indian English numerals which is why it's a little confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system 1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The rest of the world :P) On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash
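The equivalence between the two grouping styles can be checked with a short sketch. The helper names are hypothetical, and the formatter assumes non-negative integers.

```python
def parse_grouped(text: str) -> int:
    """Parse a comma-grouped integer regardless of grouping style."""
    return int(text.replace(",", ""))


def format_indian(number: int) -> str:
    """Group digits Indian-style: last three digits, then groups of two."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(number)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


if __name__ == "__main__":
    lakh = parse_grouped("1,00,000")
    print(lakh == parse_grouped("100,000"))  # same value, different grouping
    print(format_indian(lakh), f"{lakh:,}")
```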

Re: [DISCUSS] Kafka Security Specific Features

2014-06-06 Thread Pradeep Gollakota
I'm actually not convinced that encryption needs to be handled server side in Kafka. I think the best solution for encryption is to handle it producer/consumer side just like compression. This will offload key management to the users and we'll still be able to leverage the sendfile optimization

Re: Remote Zookeeper

2014-03-11 Thread Pradeep Gollakota
Is there a firewall that's blocking connections on port 9092? Also, the broker list should be comma separated. On Tue, Mar 11, 2014 at 9:02 AM, A A andthereitg...@hotmail.com wrote: Sorry one of the brokers was down. Brought it back up. Tried the following

Re: New Consumer API discussion

2014-02-13 Thread Pradeep Gollakota
Hi Neha, 6. It seems like #4 can be avoided by using Map<TopicPartition, Long> or Map<TopicPartition, TopicPartitionOffset> as the argument type. How? lastCommittedOffsets() is independent of positions(). I'm not sure I understood your suggestion. I think of subscription as you're subscribing

Re: New Consumer API discussion

2014-02-11 Thread Pradeep Gollakota
do you think? -Jay On Mon, Feb 10, 2014 at 3:37 PM, Pradeep Gollakota pradeep...@gmail.com wrote: WRT hierarchical topics, I'm referring to KAFKA-1175 (https://issues.apache.org/jira/browse/KAFKA-1175). I would just like to think through the implications for the Consumer API

Re: Building a producer/consumer supporting exactly-once messaging

2014-02-10 Thread Pradeep Gollakota
Have you read this part of the documentation? http://kafka.apache.org/documentation.html#semantics Just wondering if that solves your use case. On Mon, Feb 10, 2014 at 9:11 AM, Garry Turkington g.turking...@improvedigital.com wrote: Hi, I've been doing some prototyping on Kafka for a few

Re: New Consumer API discussion

2014-02-10 Thread Pradeep Gollakota
Couple of very quick thoughts. 1. +1 about renaming commit(...) and commitAsync(...) 2. I'd also like to extend the above for the poll() method as well. poll() and pollWithTimeout(long, TimeUnit)? 3. Have you guys given any thought around how this API would be used with hierarchical topics? 4.

Re: Config for new clients (and server)

2014-02-10 Thread Pradeep Gollakota
+1 Jun. On Mon, Feb 10, 2014 at 2:17 PM, Sriram Subramanian srsubraman...@linkedin.com wrote: +1 on Jun's suggestion. On 2/10/14 2:01 PM, Jun Rao jun...@gmail.com wrote: I actually prefer to see those at INFO level. The reason is that the config system in an application can be complex.

Re: New Consumer API discussion

2014-02-10 Thread Pradeep Gollakota
uniquely identifies a partition of a topic Thanks, Neha On Mon, Feb 10, 2014 at 12:36 PM, Pradeep Gollakota pradeep...@gmail.com wrote: Couple of very quick thoughts. 1. +1 about renaming commit(...) and commitAsync(...) 2. I'd also like to extend the above for the poll() method as well