Re: Versioning Schema's

2013-06-13 Thread Shone Sadler
Thanks Jun and Phil! Shone On Thu, Jun 13, 2013 at 12:00 AM, Jun Rao jun...@gmail.com wrote: Yes, we just have customized encoder that encodes the first 4 bytes of md5 of the schema, followed by Avro bytes. Thanks, Jun On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler shone.sad...@gmail.com
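
The framing Jun describes (the first 4 bytes of the schema's md5, followed by the Avro bytes) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the encoder from the thread: the function names, the in-memory registry dict, and the choice to hash the schema's JSON text as UTF-8 are all hypothetical, and the Avro payload is treated as opaque bytes.

```python
import hashlib

FINGERPRINT_LEN = 4  # "first 4 bytes of md5 of the schema", per the thread


def schema_fingerprint(schema_json: str) -> bytes:
    """Return the first 4 bytes of the md5 digest of the schema text.

    Assumption: the schema is hashed as its UTF-8 JSON text.
    """
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:FINGERPRINT_LEN]


def encode(schema_json: str, avro_payload: bytes) -> bytes:
    """Prefix already-serialized Avro bytes with the schema fingerprint."""
    return schema_fingerprint(schema_json) + avro_payload


def decode(message: bytes, schemas_by_fingerprint: dict) -> tuple:
    """Split a message into (schema_json, avro_payload).

    schemas_by_fingerprint is a hypothetical lookup table from fingerprint
    to schema text; a KeyError means the writer's schema is unknown.
    """
    fingerprint = message[:FINGERPRINT_LEN]
    payload = message[FINGERPRINT_LEN:]
    return schemas_by_fingerprint[fingerprint], payload
```

A consumer would keep a table of known schemas keyed by fingerprint and use the looked-up schema to deserialize the remaining bytes. With only 4 bytes of digest, collisions between schemas are unlikely but possible, so the table should be checked for clashes when a schema is registered.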

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
Hi Jun, I was using the 0.8 branch with 2 commits behind but now I am using the latest with the same issue. 3 topics A,B,C, created automatically with replication factor of 2 and partitions 2. 2 brokers (0 and 1). List of topics in zookeeper is the following: topic: A partition: 0 leader: 1

0.8 Durability Question

2013-06-13 Thread Jonathan Hodges
Looking at Jun’s ApacheCon slides ( http://www.slideshare.net/junrao/kafka-replication-apachecon2013) slide 21 titled, ‘Data Flow in Replication’ there are three possible durability configurations which trade off latency for greater persistence guarantees. The third row is the ‘no data loss’

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I've tried the console producer, so I will assume that's not related with the producer. I keep seeing the same entries in the producer from time to time: [2013-06-13 11:04:00,670] WARN Error while fetching metadata [{TopicMetadata for topic C - No partition metadata for topic C due to

Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-13 Thread Philip O'Toole
Jun - thanks again. This is very helpful. Philip On Jun 12, 2013, at 9:50 PM, Jun Rao jun...@gmail.com wrote: Actually, you are right. This can happen on a single topic too, if you have more than one consumer thread. Each consumer thread pulls data from a blocking queue, one or more

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I think I know what's happening: I tried to run both brokers and ZK on the same machine and it worked. I also attempted to do the same but with a ZK node on other machine and it also worked. My guess is something related with ports. All the machines are on EC2 and there might be something

Re: 0.8 Durability Question

2013-06-13 Thread Neha Narkhede
No. It only means that messages are written to all replicas in memory. Data is flushed to disk asynchronously. Thanks, Neha On Jun 13, 2013 3:29 AM, Jonathan Hodges hodg...@gmail.com wrote: Looking at Jun’s ApacheCon slides ( http://www.slideshare.net/junrao/kafka-replication-apachecon2013)

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Jun Rao
Have you looked at #3 in http://kafka.apache.org/faq.html? Thanks, Jun On Thu, Jun 13, 2013 at 6:41 AM, Alexandre Rodrigues alexan...@blismedia.com wrote: I think I know what's happening: I tried to run both brokers and ZK on the same machine and it worked. I also attempted to do the

Re: Arguments for Kafka over RabbitMQ ?

2013-06-13 Thread Alexis Richardson
Hi all, First, thanks to Tim (from Rabbit) and Jonathan for moving this thread along. Jonathan, I hope you found my links to the data model docs, and Tim's replies, helpful. Has everyone got what they wanted from this thread? alexis On Tue, Jun 11, 2013 at 5:49 PM, Jonathan Hodges

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I have but this is a different thing. It's related with ports and security groups and not with the bind addresses. It's solved now. Thanks On 13 June 2013 15:42, Jun Rao jun...@gmail.com wrote: Have you looked at #3 in http://kafka.apache.org/faq.html? Thanks, Jun On Thu, Jun 13, 2013

Using Kafka for data messages

2013-06-13 Thread Josh Foure
  Hi all, my team is proposing a novel way of using Kafka and I am hoping someone can help do a sanity check on this:   1.  When a user logs into our website, we will create a “logged in” event message in Kafka containing the user id.  2.  30+ systems (consumers each in their own consumer groups)

Re: Arguments for Kafka over RabbitMQ ?

2013-06-13 Thread Jonathan Hodges
Hi Alexis, This was very helpful and I also appreciate both yours and Tim's input here. It clears up the cases for when to use Rabbit or Kafka. What is great is they are both open source with vibrant communities behind them. -Jonathan Go On Jun 13, 2013 8:45 AM, Alexis Richardson

RE: shipping logs to s3 or other servers for backups

2013-06-13 Thread S Ahmed
Hi, In my application, I am storing user events, and I want to partition the storage by day. So at the end of a day, I want to take that file and ship it to s3 or another server as a backup. This way I can replay the events for a specific day if needed. These events also have to be in order.

Re: Using Kafka for data messages

2013-06-13 Thread Mahendra M
Hi Josh, The idea looks very interesting. I just had one doubt. 1. A user logs in. His login id is sent on a topic 2. Other systems (consumers on this topic) consume this message and publish their results to another topic This will be happening without any particular order for hundreds of

Re: Producer only finding partition on 1 of 2 Brokers, even though ZK shows 1 partition exists on both Brokers?

2013-06-13 Thread Brett Hoerner
As an update, this continues to affect us. First I'd like to note ways in which my issue seems different from KAFKA-278: * I did not add a new broker or a new topic, this topic has been in use on two existing brokers for months * The topic definitely exists on both brokers. The topic/data

Re: Using Kafka for data messages

2013-06-13 Thread Josh Foure
Hi Mahendra, I think that is where it gets a little tricky.  I think it would work something like this: 1.  Web sends login event for user user123 to topic GUEST_EVENT. 2.  All of the systems consume those messages and publish the data messages to topic GUEST_DATA.user123. 3.  The

Re: Using Kafka for data messages

2013-06-13 Thread Timothy Chen
Also since you're going to be creating a topic per user, the number of concurrent users will also be a concern to Kafka as it doesn't like massive amounts of topics. Tim On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure user...@yahoo.com wrote: Hi Mahendra, I think that is where it gets a little

Re: Using Kafka for data messages

2013-06-13 Thread Josh Foure
Ah yes, I had read that Kafka likes under 1,000 topics but I wasn't sure if that was really a limitation.  In principle I wouldn't mind having all guest events placed on the GUEST_DATA queue but I thought that by having more topics I could minimize having consumers read messages only to discard

Re: Producer only finding partition on 1 of 2 Brokers, even though ZK shows 1 partition exists on both Brokers?

2013-06-13 Thread Brett Hoerner
You know what, it's likely this is all because I'm running a bad fork of Kafka 0.7.2 for Scala 2.10 (on the producers/consumers) since that's the version we've standardized on. Behavior in 2.9.2 with the official Kafka 0.7.2 release seems much more normal -- I'm working on downgrading all our

Re: Using Kafka for data messages

2013-06-13 Thread Taylor Gautier
Spot on. This was one of the areas that we had to work around. Remember that there is a 1:1 relationship of topics to directories and most file systems don't like 10s of thousands of directories. We found in practice that 60k per machine was a practical limit using, I believe, EXT3FS. On

Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Hello -- is it possible for our code to stall a ConsumerConnector from doing any consuming for, say, 30 seconds, until we can be sure that all other ConsumerConnectors are rebalanced? It seems that the first ConsumerConnector to come up is prefetching some data, and we end up with duplicate

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Just to be clear, I'm not asking that we solve duplicate messages on crash before commit to Zookeeper, just an apparent problem where if Kafka has some data, and we start on ConsumerConnectors, we get dupe data since some Consumers come up before others. Any help? Philip On Thu, Jun 13, 2013 at

Re: Kafka 0.8 Maven and IntelliJ

2013-06-13 Thread Jun Rao
Thanks. Which version of Intellij are you using? Jun On Thu, Jun 13, 2013 at 10:20 AM, Dragos Manolescu dragos.manole...@servicenow.com wrote: Hmm, I've just pulled 0.8.0-beta1-candidate1, removed .idea* from my top-level directory, executed gen-idea, and then opened and built the project

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Jun Rao
Are your messages compressed in batches? If so, some dups are expected during rebalance. In 0.8, such dups are eliminated. Other than that, rebalance shouldn't cause dups since we commit consumed offsets to ZK before doing a rebalance. Thanks, Jun On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Jun -- thanks. We're using 0.72. No, the messages are not compressed, and since we do appear to be seeing dupes in our tests, it indicates our own code is buggy. Thanks, Philip On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao jun...@gmail.com wrote: Are your messages compressed in batches? If so, some