Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Josh, NoSQL DBs may actually be easier because they themselves generally don't have a global order. I.e. I believe Mongo has a per-partition oplog, is that right? Their partitions would match our partitions. -Jay On Wed, Mar 4, 2015 at 5:18 AM, Josh Rader jrader...@gmail.com wrote: Thanks

Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Xiao, 1. Nothing prevents applying transactions transactionally on the destination side, though that is obviously more work. But I think the key point here is that much of the time the replication is not Oracle => Oracle, but Oracle => {W, X, Y, Z} where W/X/Y/Z are totally heterogeneous systems

Re: Database Replication Question

2015-03-04 Thread Xiao
Hey Jay, Yeah. I understand that the advantage of Kafka is one-to-many. That is why I am reading the Kafka source code. You guys built a good product! : ) Our major concern is its message persistence. Zero data loss is a must in our applications. Below is what I copied from the Kafka document.

New Errors in 0.8.2 Protocol

2015-03-04 Thread Evan Huus
Hey all, it seems that 0.8.2 has added a handful more errors to the protocol which are not yet reflected on the wiki page [1]. Specifically, [2] seems to indicate that codes 17-20 now have associated meanings. My questions are: - Which of these are exposed publicly? (for example, the existing

Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Xiao, Yeah I agree that without fsync you will not get durability in the case of a power outage or other correlated failure, and likewise without replication you won't get durability in the case of disk failure. If each batch is fsync'd it will definitely be slower, depending on the
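The trade-off Jay describes (fsync per batch buys durability against power loss at the cost of speed) can be illustrated with a minimal sketch. This is not Kafka's log implementation; `append_batch`, the newline-delimited layout, and the file name are hypothetical and exist only for illustration.

```python
import os
import tempfile


def append_batch(path, messages, sync=True):
    """Append newline-delimited messages; optionally force them to disk."""
    with open(path, "ab") as f:
        for m in messages:
            f.write(m + b"\n")
        f.flush()  # move data from Python's buffer to the OS page cache
        if sync:
            # Ask the OS to write the page cache to the storage device.
            # Without this, a power outage can lose the batch.
            os.fsync(f.fileno())
    return len(messages)


log_path = os.path.join(tempfile.mkdtemp(), "segment.log")
written = append_batch(log_path, [b"m1", b"m2"])        # durable, slower
written += append_batch(log_path, [b"m3"], sync=False)  # faster, not yet durable

with open(log_path, "rb") as f:
    lines = f.read().splitlines()
```

Both calls leave the same bytes in the file under normal operation; the difference only shows up if the machine loses power before the OS writes the unsynced batch.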

Re: Database Replication Question

2015-03-04 Thread Jonathan Hodges
Yes you are right on the oplog per partition as well as that mapping well to the Kafka partitions. I think we are making this harder than it is based on previous attempts and trying to leverage something like Databus for propagating log changes from MongoDB and Cassandra since it requires a scn.

Problem deleting topics in 0.8.2?

2015-03-04 Thread Jeff Schroeder
So I've got 3 kafka brokers that were started with delete.topic.enable set to true. When they start, I can see in the logs that the property was successfully set. The dataset in each broker is only approximately 2G (per du). When running kafaka-delete.sh with the correct arguments to delete all of

Re: Problem deleting topics in 0.8.2?

2015-03-04 Thread Harsha
Hi Jeff, Are you seeing any errors in state-change.log or controller.log after issuing the kafka-topics.sh --delete command? Another known issue is that if you have auto.create.topics.enable = true (this is true by default), your consumer or producer can re-create the topic. So try

Re: New Errors in 0.8.2 Protocol

2015-03-04 Thread Evan Huus
Thanks Joe, keeping documentation in sync with KIPs does seem like a reasonable process going forward. And I apologize for the confrontational tone I used to end my original email, that was not called for. In the meantime, where can I find the answers to my two actual questions? I think I've

Re: Problem deleting topics in 0.8.2?

2015-03-04 Thread Timothy Chen
Hi Jeff, The controller should have a Topic deletion thread running coordinating the delete in the cluster, and the progress should be logged to the controller log. Can you look at the controller log to see what's going on? Tim On Wed, Mar 4, 2015 at 10:28 AM, Jeff Schroeder

Re: New Errors in 0.8.2 Protocol

2015-03-04 Thread Joe Stein
Hey Evan, moving forward (so 0.8.3.0 and beyond) the release documentation is going to match up more with specific KIP changes https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals which elaborated on things like breaking changes and major modifications you should adopt

high level consumer rollback

2015-03-04 Thread Luiz Geovani Vier
Hello, I'm using the high level consumer with auto-commit disabled and a single thread per consumer, in order to consume messages in batches. In case of failures on the database, I'd like to stop processing, roll back and restart from the last committed offset. Is there a way to receive the
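The behavior being asked for (process a batch, commit the offset only on success, otherwise rewind to the last committed offset) can be modeled with a small in-memory sketch. Nothing here is the Kafka consumer API: `ToyPartition`, `consume`, and `flaky_write` are hypothetical names, and the replies in this thread say the 0.8.2 high-level consumer cannot rewind without a restart.

```python
class ToyPartition:
    """In-memory stand-in for one partition plus its committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset to resume from after a rollback
        self.position = 0   # next offset to read

    def poll(self, max_records):
        batch = self.messages[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def rollback(self):
        self.position = self.committed


def consume(partition, write_batch, batch_size=2, max_attempts=10):
    """Commit only after a batch is stored; rewind when the store fails."""
    for _ in range(max_attempts):
        batch = partition.poll(batch_size)
        if not batch:
            break
        try:
            write_batch(batch)
            partition.commit()
        except IOError:
            partition.rollback()  # the same batch is read again next time


stored = []
failures = {"left": 1}  # the first database write fails once


def flaky_write(batch):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise IOError("database unavailable")
    stored.extend(batch)


p = ToyPartition(["a", "b", "c", "d", "e"])
consume(p, flaky_write)
```

Because the offset is committed only after the write succeeds, the failed batch is retried rather than skipped, at the cost of possible reprocessing.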

Re: Camus Issue about Output File EOF Issue

2015-03-04 Thread Bhavesh Mistry
Hi Gwen, The root cause of all io related problems seems to be file rename that Camus does and underlying Hadoop MapR FS. We are copying files from user volume to a day volume (rename does copy) when mapper commits file to FS. Please refer to

Re: high level consumer rollback

2015-03-04 Thread Mayuresh Gharat
As far as I know, I don't think you can do that with an online stream. You will have to reset the offsets to a particular offset in the past to start consuming from that. Another way would be to start a separate consumer with a different groupId. In any case you cannot consume from past offset

NodeJS Consumer library for 0.8.2

2015-03-04 Thread Julio Castillo
Looking around the npm repo, it looks like there is no current support for 0.8.2. Is the only alternative to use REST/Proxy? Thanks Julio Castillo

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Joel Koshy
I think the camus mailing list would be more suitable for this question. Thanks, Joel On Wed, Mar 04, 2015 at 11:00:51AM -0500, max square wrote: Hi all, I have browsed through different conversations around Camus, and bring this as a kinda Kafka question. I know is not the most orthodox,

Re: moving replications

2015-03-04 Thread Joel Koshy
I think what you may be looking for is being discussed here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing On Wed, Mar 04, 2015 at 12:34:30PM +0530, sunil kalva wrote: Is there any way to automate On Mar 3, 2015 11:57 AM, sunil kalva

Re: high level consumer rollback

2015-03-04 Thread Joel Koshy
This is not possible with the current high-level consumer without a restart, but the new consumer (under development) does have support for this. On Wed, Mar 04, 2015 at 03:04:57PM -0500, Luiz Geovani Vier wrote: Hello, I'm using the high level consumer with auto-commit disabled and a single

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Jagat Singh
Also see the related tool http://confluent.io/downloads/ Confluent is bringing the glue together for Kafka, Avro, Camus. Though there is no clarity around support (e.g. update of Kafka) around it at this moment. On Thu, Mar 5, 2015 at 8:57 AM, Joel Koshy jjkosh...@gmail.com wrote: I think

Re: Camus reads from multiple offsets in parallel?

2015-03-04 Thread Yang
Thanks for that info Jun. On Tue, Mar 3, 2015 at 3:56 PM, Jun Rao j...@confluent.io wrote: Camus only fetches from different partitions in parallel. Thanks, Jun On Fri, Feb 27, 2015 at 4:24 PM, Yang tedd...@gmail.com wrote: we have a single partition, and the topic contains 300k

Re: high level consumer rollback

2015-03-04 Thread Luiz Geovani Vier
Thanks, Mayuresh and Joel. Reconnecting works just fine, although it's much more complex than just calling rollback(), so I'm looking forward to the new version :) -Geovani On Wed, Mar 4, 2015 at 4:57 PM, Joel Koshy jjkosh...@gmail.com wrote: This is not possible with the current high-level

Re: Topicmetadata response miss some partitions information sometimes

2015-03-04 Thread Mayuresh Gharat
Cool. So then this is a non issue then. To make things better we can expose the availablePartitons() api through Kafka producer. What do you think? Thanks, Mayuresh On Tue, Mar 3, 2015 at 4:56 PM, Guozhang Wang wangg...@gmail.com wrote: Hey Jun, You are right. Previously I thought only in

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Lakshmanan Muthuraman
I think the libjars option is not required. The Maven package command for the camus project builds the uber jar (fat jar), which contains all the dependencies. I generally run camus the following way. hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P

Re: Database Replication Question

2015-03-04 Thread Jonathan Hodges
Thanks James. This is really helpful. Another extreme edge case might be that the single producer is sending the database log changes and the network causes them to reach Kafka out of order. How do you prevent something like this, I guess relying on the scn on the consumer side? On Wed, Mar

Re: Database Replication Question

2015-03-04 Thread James Cheng
Another thing to think about is delivery guarantees. Exactly once, at least once, etc. If you have a publisher that consumes from the database log and pushes out to Kafka, and then the publisher crashes, what happens when it starts back up? Depending on how you keep track of the database's
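The crash scenario James raises can be sketched as follows. This is a toy model under stated assumptions, not his implementation: the publisher saves its database-log position only after sending, so a crash between the two steps causes a resend on restart (at-least-once delivery). `publish_from` and `crash_before_checkpoint_at` are hypothetical names.

```python
def publish_from(db_log, checkpoint, sent, crash_before_checkpoint_at=None):
    """Send entries starting at `checkpoint`; return the saved checkpoint."""
    for pos in range(checkpoint, len(db_log)):
        sent.append(db_log[pos])  # stands in for a send to Kafka
        if pos == crash_before_checkpoint_at:
            # Simulated crash: the entry was sent but its position was not saved.
            return checkpoint
        checkpoint = pos + 1      # position saved only after the send
    return checkpoint


db_log = ["c0", "c1", "c2"]
sent = []

# First run crashes right after sending c1, before saving its position.
cp = publish_from(db_log, 0, sent, crash_before_checkpoint_at=1)
# Restart resumes from the last saved position, so c1 is sent again.
cp = publish_from(db_log, cp, sent)
```

Saving the position before sending would flip the failure mode to at-most-once: a crash would then lose the entry instead of duplicating it.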

RE: Trying to get kafka data to Hadoop

2015-03-04 Thread Thunder Stumpges
What branch of camus are you using? We have our own fork that we updated the camus dependency from the avro snapshot of the REST Schema Repository to the new official one you mention in github.com/schema-repo. I was not aware of a branch on the main linked-in camus repo that has this. That

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Otis Gospodnetic
Hello hello, Results of the poll are here! Any guesses before looking? What % of Kafka users are on 0.8.2.x already? What % of people are still on 0.7.x? http://blog.sematext.com/2015/03/04/poll-results-kafka-version-distribution/ Otis

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Otis Gospodnetic
Hi, You can see the number of voters in the poll itself (view poll results link in the poll widget). Audience details unknown, but the poll was posted on: * twitter - https://twitter.com/sematext/status/57050147435776 * LinkedIn - a few groups - Kafka, DevOps, and I think another larger one *

Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Colin
Hello, We use Docker for Kafka on VMs with both NAS and local disk. We mount the volumes externally. We haven't had many problems at all, and a restart has cleared any issue. We are on 0.8.1. We have also started to deploy to AWS. -- Colin On Mar 4,

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Christian Csar
Do you have anything on the number of voters, or audience breakdown? Christian On Wed, Mar 4, 2015 at 8:08 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hello hello, Results of the poll are here! Any guesses before looking? What % of Kafka users are on 0.8.2.x already? What %

Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-04 Thread Neha Narkhede
+1. Verified quick start, unit tests. On Tue, Mar 3, 2015 at 12:09 PM, Joe Stein joe.st...@stealth.ly wrote: Ok, lets fix the transient test failure on trunk agreed not a blocker. +1 quick start passed, verified artifacts, updates in scala

Re: Best way to show lag?

2015-03-04 Thread Otis Gospodnetic
Hi, On Sat, Feb 28, 2015 at 9:16 AM, Gene Robichaux gene.robich...@match.com wrote: What is the best way to detect consumer lag? We are running each consumer as a separate group and I am running the ConsumerOffsetChecker to assess the partitions and the lag for each group/consumer. I run

Re: Explicit control over flushing the messages

2015-03-04 Thread Ponmani Rayar
Thanks a lot Jeff for redirecting me to the right place. :-) Is there any tentative date when we can get the official release with this patch? On 4 March 2015 at 19:42, Jeff Holoman jholo...@cloudera.com wrote: Take a look here: https://issues.apache.org/jira/browse/KAFKA-1865 On

Mirror maker end to end latency metric

2015-03-04 Thread tao xiao
Hi team, Is there a built-in metric that can measure the end to end latency in MM? -- Regards, Tao

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread max square
Thunder, thanks for your reply. The hadoop job is now correctly configured (the client was not getting the correct jars), however I am getting Avro formatting exceptions due to the format the schema-repo server follows. I think I will do something similar and create our own branch that uses the

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Roger Hoover
Seeing around 5k msgs/s. The messages are small (average 42 bytes after snappy compression) On Wed, Mar 4, 2015 at 11:34 PM, Vineet Mishra clearmido...@gmail.com wrote: Hi Roger, I have already enabled the snappy, the throughput which I have mentioned is after only. Could you mention

Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Otis Gospodnetic
Hi, On Fri, Feb 27, 2015 at 1:36 AM, James Cheng jch...@tivo.com wrote: Hi, I know that Netflix might be talking about Kafka on AWS at the March meetup, but I wanted to bring up the topic anyway. I'm sure that some people are running Kafka in AWS. I'd say most, not some :) Is anyone

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Vineet Mishra
Hi Roger, I have already enabled snappy; the throughput I mentioned is with compression enabled. Could you tell me what throughput you are reaching? Thanks! On Thu, Mar 5, 2015 at 12:56 PM, Roger Hoover roger.hoo...@gmail.com wrote: Hi Vineet, Try enabling compression. That

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Neha Narkhede
Thanks for running the poll and sharing the results! On Wed, Mar 4, 2015 at 8:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You can see the number of voters in the poll itself (view poll results link in the poll widget). Audience details unknown, but the poll was posted on:

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Neha Narkhede
Thanks Jagat for the callout! Confluent Platform 1.0 http://confluent.io/product/ includes Camus and we are happy to address any questions in our community mailing list confluent-platf...@googlegroups.com. On Wed, Mar 4, 2015 at 8:41 PM, max square max2subscr...@gmail.com wrote: Thunder,

please subscribe

2015-03-04 Thread Michael Minar
thank you

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Roger Hoover
Hi Vineet, Try enabling compression. That improves throughput 3-4x usually for me. Also, you can use async mode if you're willing to trade some chance of dropping messages for more throughput. kafka { codec = 'json' broker_list = localhost:9092 topic_id = blah

Re: Database Replication Question

2015-03-04 Thread James Cheng
On Mar 3, 2015, at 4:18 PM, Guozhang Wang wangg...@gmail.com wrote: In addition to Jay's recommendation, you also need to take special care in the producer's error handling in order to preserve ordering, since the producer uses batching and async sending. That is, if you already sent

Explicit control over flushing the messages

2015-03-04 Thread Ponmani Rayar
Hi Group, I have started using Kafka 0.8.2 with the new producer API. Just wanted to know if we can have an explicit control over flushing the messages batch to Kafka cluster. Configuring batch.size will flush the messages when the batch.size is reached for a partition. But is there
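The distinction in this question (a batch goes out when it fills up, versus the caller forcing it out) can be shown with a toy batcher. This is a sketch, not the Kafka producer: `ToyBatcher` is hypothetical, and it counts messages per batch, whereas Kafka's `batch.size` setting is measured in bytes.

```python
class ToyBatcher:
    """Accumulates messages and sends them when the batch fills or on flush()."""

    def __init__(self, batch_size, send):
        self.batch_size = batch_size  # assumption: a message count, not bytes
        self.send = send              # callback that receives a list of messages
        self.pending = []

    def add(self, msg):
        self.pending.append(msg)
        if len(self.pending) >= self.batch_size:
            self.flush()              # size-triggered send

    def flush(self):
        """Explicitly send whatever is pending, even a partial batch."""
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


batches = []
batcher = ToyBatcher(3, batches.append)
for m in ["m1", "m2", "m3", "m4"]:
    batcher.add(m)

before_flush = list(batches)  # only the full batch has been sent so far
batcher.flush()               # caller forces out the partial batch
```

Without the explicit `flush()`, `m4` would wait in the buffer until two more messages arrived.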

RE: Kafka producer failed to send but actually does

2015-03-04 Thread Arunkumar Srambikkal (asrambik)
Thanks for responding. I was creating an instance of kafka.server.KafkaServer in my code for running some tests and this was what I referred to by an embedded broker. The scenario you described was what was happening. In my case when I kill my broker, it fails to send an ack. I added

JSON parsing causing rebalance to fail

2015-03-04 Thread Arunkumar Srambikkal (asrambik)
Hi, When I start a new consumer, it throws a Rebalance exception. However, I hit it only on some machines where the runtime libraries are different. The stack given below is what I encounter - is this a known issue? I saw this Jira but it's not resolved, so I thought I'd confirm -

Re: reassign a topic partition which has no ISR and leader set to -1

2015-03-04 Thread todd
When we ran into this problem we ended up going into ZooKeeper and changing the leader to point to one of the replicas, then did a forced leader election. This got the partition back online. Original Message From: Virendra Pratap Singh Sent: Wednesday, March 4, 2015 2:00 AM To: Gwen

Kafka web console error

2015-03-04 Thread Bhuvana Baskar
Hi, Using kafka-Web-Console: when I run the command play start, it works fine. I tried to register the ZooKeeper, but I get the error below. *java.nio.channels.ClosedChannelException* at

Re: Database Replication Question

2015-03-04 Thread Josh Rader
Thanks everyone for your responses! These are great. It seems our case matches Jay's recommendations most closely. The one part that sounds a little tricky is point #5 'Include in each message the database's transaction id, scn, or other identifier'. This is pretty straightforward with the
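One way to use the identifier from point #5 on the consumer side is sketched below: each change carries an scn, and the consumer skips any change it has already applied, so redelivered messages are harmless. This is an illustration, not the thread's stated design. It assumes the scn increases monotonically and that messages arrive in order within a partition; `apply_changes` and the message layout are hypothetical.

```python
def apply_changes(state, last_scn, changes):
    """Apply changes newer than `last_scn`; return the new high-water mark."""
    for change in changes:
        if change["scn"] <= last_scn:
            continue  # duplicate from a redelivery: already applied
        state[change["key"]] = change["value"]
        last_scn = change["scn"]
    return last_scn


state = {}
last = 0

first_delivery = [
    {"scn": 1, "key": "user:1", "value": "alice"},
    {"scn": 2, "key": "user:2", "value": "bob"},
]
# After a publisher restart, scn 2 is sent again, followed by a new change.
redelivery = [
    {"scn": 2, "key": "user:2", "value": "bob"},
    {"scn": 3, "key": "user:1", "value": "alicia"},
]

last = apply_changes(state, last, first_delivery)
last = apply_changes(state, last, redelivery)
```

In a real consumer the high-water mark would need to be stored atomically with the applied change, otherwise a crash between the two could reintroduce the duplicate.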

Re: Got negative offset lag after restarting brokers

2015-03-04 Thread tao xiao
Thanks guys. With unclean.leader.election.enable set to false, the issue is fixed. On Tue, Mar 3, 2015 at 2:50 PM, Gwen Shapira gshap...@cloudera.com wrote: of course :) unclean.leader.election.enable On Mon, Mar 2, 2015 at 9:10 PM, tao xiao xiaotao...@gmail.com wrote: How do I achieve point

Re: Explicit control over flushing the messages

2015-03-04 Thread Jeff Holoman
Take a look here: https://issues.apache.org/jira/browse/KAFKA-1865 On Wed, Mar 4, 2015 at 4:28 AM, Ponmani Rayar ymmu...@gmail.com wrote: Hi Group, I have started using Kafka 0.8.2 with the new producer API. Just wanted to know if we can have an explicit control over flushing

Re: Database Replication Question

2015-03-04 Thread Xiao
Hi, Josh, That depends on how you implemented it. Basically, Kafka can provide good throughput only when you have multiple partitions. - If you have multiple consumers and multiple partitions, each consumer has a dedicated partition. That means, you need a coordinator to ensure all the