Re: Kafka Monitoring

2017-06-20 Thread David Garcia
If you’re using confluent, you can use the control center. It’s not free however. From: Muhammad Arshad Reply-To: "users@kafka.apache.org" Date: Monday, June 19, 2017 at 5:52 PM To: "users@kafka.apache.org" Subject: Kafka Monitoring Hi, wanted to see if there is Kafka monitoring which is av

Re: Hello, Help!

2017-07-07 Thread David Garcia
“…events so timely that the bearing upon of which is not immediately apparent and are hidden from cognitive regard; the same so tardy, they herald apropos” On 7/7/17, 12:06 PM, "Marcelo Vinicius" wrote: Hello, my name is Marcelo, and I am from Brazil. I'm doing a search on Kafka. I woul

Re: Kafka Connect Embedded API

2017-07-12 Thread David Garcia
I would just look at an example: https://github.com/confluentinc/kafka-connect-jdbc https://github.com/confluentinc/kafka-connect-hdfs On 7/12/17, 8:27 AM, "Debasish Ghosh" wrote: Hi - I would like to use the embedded API of Kafka Connect as per https://cwiki.apache.org/con

Re: DAG processing in Kafka Streams

2017-07-17 Thread David Garcia
I think he means something like Akka Streams: http://doc.akka.io/docs/akka/2.5.2/java/stream/stream-graphs.html Directed Acyclic Graphs are trivial to construct in Akka Streams and use back-pressure to preclude memory issues. -David On 7/17/17, 12:20 PM, "Guozhang Wang" wrote: Sameer,

Re: DAG processing in Kafka Streams

2017-07-17 Thread David Garcia
/home.html -David On 7/17/17, 12:35 PM, "David Garcia" wrote: I think he means something like Akka Streams: http://doc.akka.io/docs/akka/2.5.2/java/stream/stream-graphs.html Directed Acyclic Graphs are trivial to construct in Akka Streams and use back-pressure to preclude mem

Re: Shooting for microsecond latency between a Kafka producer and a Kafka consumer

2017-08-07 Thread David Garcia
You are not going to get that kind of latency (i.e. less than 100 microseconds). In my experience, consumer->producer latency averages around: 20 milliseconds (cluster is in AWS with enhanced networking). On 8/3/17, 2:32 PM, "Chao Wang" wrote: Hi, I observed that it took 2-6 mill

Re: Time based data retrieval from Kafka Topic

2017-09-05 Thread David Garcia
Make a topic and then set the retention to 1 hour. Every 15 minutes, start a consumer that always reads from the beginning. -David On 9/5/17, 9:28 AM, "Tauzell, Dave" wrote: What are you going to do with the messages every 15 minutes? One way I can think of is to have two consume

Re: Identifying Number of Kafka Consumers

2017-09-12 Thread David Garcia
Consumers can be split up based on partitions. So, you can tell a consumer group to listen to several topics and it will divvy up the work. Your use case sounds very canonical. I would take a look at Kafka connect (if you’re using the confluent stack). -Daivd http://docs.confluent.io/curren

Re: To Know about Kafka and it's usage

2017-09-28 Thread David Garcia
If you’re on the AWS bandwagon, you can use Kinesis-Analytics (https://aws.amazon.com/kinesis/analytics/). It’s very easy to use. Kafka is a bit more flexible, but you have to instrument maintain it. You may also want to look at Druid: http://druid.io/ There are some dashboards that you can

Re: Using Kafka on DC/OS + Marathon

2017-10-02 Thread David Garcia
I’m not sure how your requirements of Kafka are related to your requirements for marathon. Kafka is a streaming-log system and marathon is a scheduler. Mesos, as your resource manager, simply “manages” resources. Are you asking about multitenancy? If so, I highly recommend that you separate

Re: Kafka Monitoring..

2017-11-09 Thread David Garcia
JMX was pretty easy to setup for us. Look up the various jmx beans locally for your particular version of kafka (i.e. with jconsole..etc) https://github.com/jmxtrans/jmxtrans On 11/8/17, 7:10 PM, "chidigam ." wrote: Hi All, What is the simplest way of monitoring the metrics in kaka br

Re: Kafka Monitoring..

2017-11-09 Thread David Garcia
<https://github.com/prometheus/jmx_exporter> to get Kafka metrics into prometheus. Here’s our JMX Exporter config: https://github.com/wikimedia/puppet/blob/production/modules/profile/files/kafka/broker_prometheus_jmx_exporter.yaml On Thu, Nov 9, 2017 at 11:42

Null Output topic for Kafka Streams

2016-07-18 Thread David Garcia
I would like to process messages from an input topic, but I don’t want to send messages downstream…with KStreams. (i.e. I would like to read from a topic, do some processing including occasional database updates, and that’s it…no output to a topic). I could fake this by filtering out all my me

Kafka Streams Dynamic Topic consumer

2016-07-18 Thread David Garcia
Is there any way to specify a dynamic topic list (e.g. like a regex whitelist filter…like in the consumer API) with kafka streams? We would like to get the benefit of automatic checkpointing and load balancing if possible. -David

Re: Kafka Consumer stops consuming from a topic

2016-07-19 Thread David Garcia
Is it possible that your app is thrashing (i.e. FullGC’ing too much and not processing messages)? -David On 7/19/16, 9:16 AM, "Abhinav Solan" wrote: Hi Everyone, can anyone help me on this Thanks, Abhinav On Mon, Jul 18, 2016, 6:19 PM Abhinav Solan wrote: >

Re: Kafka Consumer stops consuming from a topic

2016-07-19 Thread David Garcia
n might be Kafka Server has a backup option and it self heals from this behavior ... Just a theory On Tue, Jul 19, 2016, 7:57 AM Abhinav Solan wrote: > No, was monitoring the app at that time .. it was just sitting idle > > On Tue, Jul 19, 2016, 7:32 AM

release of 0.10.1

2016-07-20 Thread David Garcia
Does anyone know when this release will be cut? -David

Re: Kafka Streams Latency

2016-07-22 Thread David Garcia
You should probably just put reporting in your app. Dropwizard, logs…etc. You can also look at Kafka JMX consumer metrics (assuming you don’t have too many consumers). -David On 7/22/16, 9:13 AM, "Adrienne Kole" wrote: Hi, How can I measure the latency and throughput in Kafka

Re: release of 0.10.1

2016-07-24 Thread David Garcia
s a "minor" release. Kafka is a bit odd in that its "major" releases are labeled as a normal "minor" release number because Kafka hasn't decided to make an official 1.0 release yet. What features/fixes are you looking for? -Ewen

Streaming Application Design with Database Lookups

2016-07-26 Thread David Garcia
Hello, we are working on designs for several streaming applications and a common consideration is the need for occasional external database updates/lookups. For example…we would be processing a stream of events with some kind of local-id, and we occasionally need to resolve the local-id to a g

Re: Kafka Streams multi-node

2016-07-26 Thread David Garcia
http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model you shouldn’t have to do anything. Simply starting a new thread will “rebalance” your streaming job. The job coordinates with tasks through kafka itself. On 7/26/16, 12:42 PM, "Davood Rafiei" wrote: Hi,

Re: Problems with replication and performance

2016-07-27 Thread David Garcia
Sounds like you might want to go the partition route: http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ If you lose a broker (and you went the topic route), the probability that an arbitrary topic was on the broker is higher than if you had gone the pa

Reactive Kafka performance

2016-07-28 Thread David Garcia
Our team is evaluating KStreams and Reactive Kafka (version 0.11-M4) on a confluent 3.0 cluster. Our testing is very simple (pulling from one topic, doing a simple transform) and then writing out to another topic. The performance for the two systems is night and day. Both applications were ru

Re: Cluster config

2016-07-28 Thread David Garcia
What is your replication for these topics? On 7/28/16, 3:03 PM, "Kessiler Rodrigues" wrote: Hey guys, I have > 5k topics with 5 partitions each in my cluster today. My actual cluster configuration is: 6 brokers - 16 vCPUs, 14.4 GB Nowadays, I’m having so

Re: Kafka 0.9.0.1 failing on new leader election

2016-07-29 Thread David Garcia
Well, just a dumb question, but did you include all the brokers in your client connection properties? On 7/29/16, 10:48 AM, "Sean Morris (semorris)" wrote: Anyone have any ideas? From: semorris mailto:semor...@cisco.com>> Date: Tuesday, July 26, 2016 at 9:40 AM To: "users@k

Re: Reactive Kafka performance

2016-07-30 Thread David Garcia
treams in order to see if KStreams can improve on that end. Would you like to elaborate a bit more on that end? Guozhang On Thu, Jul 28, 2016 at 12:16 PM, David Garcia wrote: > Our team is evaluating KStreams and Reactive Kafka (version 0.11-M4) on a &

KTable and Rebalance Operations

2016-08-02 Thread David Garcia
Hello, I’ve googled around for this, but haven’t had any luck. Based upon this: http://docs.confluent.io/3.0.0/streams/architecture.html#state KTables are local to instances. An instance will process one or more partitions from one or more topics. How does Kstreams/Ktables handle the followi

Re: Opening up Kafka JMX port for Kafka Consumer in Kafka Streams app

2016-08-02 Thread David Garcia
Have you looked at kafka manager: https://github.com/yahoo/kafka-manager It provides consumer level metrics. -David On 8/2/16, 12:36 PM, "Phillip Mann" wrote: Hello all, This is a bit of a convoluted title but we are trying to set up monitoring on our Kafka Cluster and Kafka Strea

Re: Multiple processors belongs to same GroupId needs to read same message from same topic

2016-08-16 Thread David Garcia
You could create another partition in topic T, and publish the same message to both partitions. You would have to configure P2 to read from the other partition. Or you could have P1 write the message to another topic and configure P2 to listen to that topic. -David On 8/16/16, 11:54 PM, "Dee

Kafka Connect JDBC on Production DB

2016-08-19 Thread David Garcia
My team is considering using either Kafka-connect JDBC or Bottled water to stream DB-changes from several production postgres DB’s. WRT bottled water, this is a little scary: https://github.com/confluentinc/bottledwater-pg/issues/96 But, the Kafka-connect option also seems like it could affect

Re: Questions about Apache Kafka

2016-08-24 Thread David Garcia
Regarding 3 and 4: https://calcite.apache.org/docs/stream.html (i.e. streaming SQL queries) On 8/24/16, 6:29 AM, "Herwerth Karin (CI/OSW3)" wrote: Dear Sir or Madam, I'm a beginner in Apache Kafka and have questions which I don't get it from the documentation. 1.

KStreams regex-topic null pointer Exception with multiple Consumer processes

2016-09-03 Thread David Garcia
This bug may be fixed after commit: 6fb33afff976e467bfa8e0b29eb82770a2a3aaec When you start two consumer processes with a regex topic (with 2 or more partitions for the matching topics), the second (i.e. nonleader) consumer will fail with a null pointer exception. Exception in thread "StreamThr

Re: KStreams regex-topic null pointer Exception with multiple Consumer processes

2016-09-06 Thread David Garcia
ok through the code I think it is indeed a bug, and commit 6fb33afff976e467bfa8e0b29eb82770a2a3aaec will not fix it IMHO. Would you want to create a JIRA for keeping track of this issue? Guozhang On Sat, Sep 3, 2016 at 10:16 AM, David Garcia wrote:

Re: monitor page cache read ratio

2016-09-07 Thread David Garcia
Cachestat https://www.datadoghq.com/blog/collecting-kafka-performance-metrics/ On 9/7/16, 8:31 AM, "Peter Sinoros Szabo" wrote: Hi, As I read more and more about kafka monitoring it seems that monitoring the linux page cache hit ration is important, but I do not really find a

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
The “simplest” way to solve this is to “repartition” your data (i.e. the streams you wish to join) with the partition key you wish to join on. This obviously introduces redundancy, but it will solve your problem. For example.. suppose you want to join topic T1 and topic T2…but they aren’t part

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
system? -David On 9/7/16, 3:41 PM, "David Garcia" wrote: The “simplest” way to solve this is to “repartition” your data (i.e. the streams you wish to join) with the partition key you wish to join on. This obviously introduces redundancy, but it will solve your problem. F

Re: What's the relationship between Kafka and Zookeeper ?

2016-09-10 Thread David Garcia
Data is always provided by the leader of a topic-partition (i.e. a broker). Here is a summary of how zookeeper is used: https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka -David On 9/10/16, 3:47 PM, "Eric Ho" wrote: I notice that some Spark programs would contact someth

Re: Slow machine disrupting the cluster

2016-09-16 Thread David Garcia
To remediate, you could start another broker, rebalance, and then shut down the busted broker. But, you really should put some monitoring on your system (to help diagnose the actual problem). Datadog has a pretty good set of articles for using jmx to do this: https://www.datadoghq.com/blog/mo

Re: Mirroring from 0.8.2.1 to 0.10.0.0

2016-09-26 Thread David Garcia
Try running mirror maker from the other direction (i.e. from 0.8.2.1 ). I had a similar issue, and that seemed to work. -David On 9/26/16, 5:19 PM, "Xavier Lange" wrote: I'm using bin/kafka-mirror-maker.sh for the first time and I need to take my "aws-cloudtrail" topic from a 0.8.2.1

Re: Kafka 10 Consumer Reading from Kafka 8 Cluster?

2016-10-06 Thread David Garcia
Any reason you can’t use mirror maker? https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 -David On 10/6/16, 1:32 PM, "Craig Swift" wrote: Hello, We're in the process of upgrading several of our clusters to Kafka 10. I was wondering if it's possible to us

puncutuate() never called

2016-10-07 Thread David Garcia
Hello, I’m sure this question has been asked many times. We have a test-cluster (confluent 3.0.0 release) of 3 aws m4.xlarges. We have an application that needs to use the punctuate() function to do some work on a regular interval. We are using the WallClock extractor. Unfortunately, the meth

Re: Same leader for all partitions for topics

2016-10-07 Thread David Garcia
Maybe someone already answered this…but you can use the repartitioner to fix that (it’s included with Kafka) As far as root cause, you probably had a few leader elections due to excessive latency. There is a cascading scenario that I noticed Kafka is vulnerable to. The events transpire as fol

Re: puncutuate() never called

2016-10-07 Thread David Garcia
uld this be the problem you're seeing? See also the related discussion at http://stackoverflow.com/questions/39535201/kafka-problems-with-timestampextractor . On Fri, Oct 7, 2016 at 6:07 PM, David Garcia wrote: > Hello, I’m sure t

puncutuate() bug

2016-10-07 Thread David Garcia
topics. Punctuate will never be called. -David On 10/7/16, 1:11 PM, "David Garcia" wrote: Yeah, this is possible. We have run the application (and have confirmed data is being received) for over 30 mins…with a 60-second timer. So, do we need to just rebuild our cluster w

Re: puncutuate() bug

2016-10-08 Thread David Garcia
Actually, I think the bug is more subtle. What happens when a consumed topic stops receiving messages? The smallest timestamp will always be the static timestamp of this topic. -David On 10/7/16, 5:03 PM, "David Garcia" wrote: Ok I found the bug. Basically, if there is an e

Re: puncutuate() never called

2016-10-10 Thread David Garcia
of this topic. -David On 10/7/16, 5:03 PM, "David Garcia" wrote: Ok I found the bug. Basically, if there is an empty topic (in the list of topics being consumed), any partition-group with partitions from the topic will always return -1 as the smallest timestamp (see Partition

broker upgrade

2016-10-11 Thread David Garcia
Hello, we are going to be upgrading the instance types of our brokers. We will shut them down, upgrade, and the restart them. All told, they will be down for about 15 minutes. Upon restart, is there anything we need to do other than run preferred leader election? The brokers will start to ca

Re: Doubt on max.request.size

2016-10-14 Thread David Garcia
Hello Daniccan. I apologize for the dumb question, but did you also check “message.max.bytes” on the broker? Default is about 1meg (112 bytes) for kafka 0.10.0. if you need to publish larger messages, you will need to adjust that on the brokers and then restart them. -David On 10/14/16,

Re: Doubt on max.request.size

2016-10-14 Thread David Garcia
WRT performance, yes, changing message size will affect the performance of producers and consumers. Please study the following to understand the relationship between message size and performance (graphs at the bottom visualize the relationship nicely): https://engineering.linkedin.com/kafka/ben

Re: Query on kafka queue message purge.

2016-10-14 Thread David Garcia
Using the kafka-topics.sh script, simply set the retention in a way to remove the message: Kafka-topics.sh –zookeeper --alter –config retention.ms= --topic This is actually deprecated, but still works in newer kafka 0.10.0. Note: cleanup=delete is required for this. This policy will only e

Please help with AWS configuration

2016-10-19 Thread David Garcia
Hello everyone. I’m having a hell of a time figuring out a Kafka performance issue in AWS. Any help is greatly appreciated! Here is our AWS configuration: - Zookeeper Cluster (3.4.6): 3-nodes on m4.xlarges (default configuration) EBS volumes (sd1) - Kafka Cluster (0.10.0):

Re: Please help with AWS configuration

2016-10-19 Thread David Garcia
Sorry, had a typo in my gist. Here is the correct location: https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440 On 10/19/16, 4:33 PM, "David Garcia" wrote: Hello everyone. I’m having a hell of a time figuring out a Kafka performance issue in AWS. Any help

Invalid IP addresses sent from Consumers

2019-12-05 Thread David Garcia
Hello, my consumers are reporting invalid IP address. When running kafka-consumer-groups –describe… I see the following: TOPICPARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID Topic1 12 1319627

Static Membership AND Invalid IP addresses forConsumers

2019-12-06 Thread David Garcia
ip-address in the host-field. Is there a way to make it replace the host-info AND use static membership? David Garcia Staff Big Data Engineer david.gar...@bazaarvoice.com O: M: 512.576.5864 Site <https://www.bazaarvoice.com/> | Blog <https://www.bazaarvoice.com/blog> | Tw

Corrupt Kafka file

2019-12-19 Thread David Garcia
Hello, we are getting the following error: server.log:[2019-12-17 15:05:28,757] ERROR [ReplicaManager broker=5] Error processing fetch with max size 1048576 from consumer on partition my-topic-2: (fetchOffset=312239, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka

Re: KafkaStream: puncutuate() never called even when data is received by process()

2016-11-23 Thread David Garcia
If you are consuming from more than one topic/partition, punctuate is triggered when the “smallest” time-value changes. So, if there is a partition that doesn’t have any more messages on it, it will always have the smallest time-value and that time value won’t change…hence punctuate never gets

Re: KafkaStream: puncutuate() never called even when data is received by process()

2016-11-26 Thread David Garcia
ocess1") >.addStateStore(profiles, "deltaProcess2", "deltaProcess1") >.addStateStore(company_bucket, "deltaProcess2", "deltaProcess1"); > > KafkaStreams streams = new KafkaStreams(builder, config); > > streams.setUncaug

Re: Interactive Queries

2016-11-26 Thread David Garcia
I would start here: http://docs.confluent.io/3.1.0/streams/index.html On 11/26/16, 8:27 PM, "Alan Kash" wrote: Hi, New to Kafka land. I am looking into Interactive queries feature, which transforms Topics into Tables with history, neat ! 1. What kind of querie

Re: HOW TO GET KAFKA CURRENT TOPIC GROUP OFFSET

2016-12-02 Thread David Garcia
https://kafka.apache.org/documentation#operations It’s in there somewhere… ;-) On 11/28/16, 7:34 PM, "西风瘦" wrote: HI! WAIT YOUR ANSWER

Re: min hardware requirement

2017-01-24 Thread David Garcia
This should give you an idea: https://www.confluent.io/blog/design-and-deployment-considerations-for-deploying-apache-kafka-on-aws/ On 1/23/17, 10:25 PM, "Ewen Cheslack-Postava" wrote: Smaller servers/instances work fine for tests, as long as the workload is scaled down as well. Most me

Re: min hardware requirement

2017-01-24 Thread David Garcia
Sorry, wrong link: http://docs.confluent.io/2.0.1/kafka/deployment.html On 1/24/17, 2:13 PM, "David Garcia" wrote: This should give you an idea: https://www.confluent.io/blog/design-and-deployment-considerations-for-deploying-apache-kafka-on-aws/ On 1/23/17, 10:25

Re: How to measure the load capacity of kafka cluster

2017-02-09 Thread David Garcia
From my experience, the bottle neck of a kafka cluster is the writing. (i.e. the producers) The most direct way to measure how “stressed” the writing threads are is to directly observe the producer purgatory buffer of your brokers. The larger it gets the more likely a leader will report an ou

TimeBasePartitioner for confluent

2017-03-03 Thread David Garcia
Trying to user s3-loader and am getting this error: org.apache.kafka.common.config.ConfigException: Invalid generator class: class io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator at io.confluent.connect.storage.partitioner.TimeBasedPartitioner.newSchemaGenerator(T

Re: TimeBasePartitioner for confluent

2017-03-03 Thread David Garcia
Gah…nm…looked at source code…use this: schema.generator.class=io.confluent.connect.storage.hive.schema.TimeBasedSchemaGenerator On 3/3/17, 5:36 PM, "David Garcia" wrote: Trying to user s3-loader and am getting this error: org.apache.kafka.common.config.ConfigExceptio

Re: ISR churn

2017-03-22 Thread David Garcia
Look at producer purgatory size. Anything greater than 10 is bad (from my experience). Keeping it under 4 seemed to help us. (i.e. if a broker is getting slammed with write, use rebalance tools or add a new broker). Also check network latency and/or adjust timeout for ISR checking. If on AW

Re: ISR churn

2017-03-22 Thread David Garcia
Thanks, Jun > On Mar 22, 2017, at 3:07 PM, David Garcia wrote: > > producer purgatory size

Re: Benchmarking streaming frameworks

2017-03-23 Thread David Garcia
I don’t think “benchmarking” frameworks WRT Kafka is a particularly informative. The various frameworks available are better compared WRT their features and processing limitations. For example, Akka-streams for kafka effects a more intuitive way to express asynchronous operations. If you were

Re: Kafka on windows

2017-04-11 Thread David Garcia
One issue is that Kafka leverage some very specific features of the linux kernel that are probably far different from Windows, so I imagine the performance profile is likewise much different. On 4/11/17, 8:52 AM, "Tomasz Rojek" wrote: Hi All, We want to choose provider of messagin

Re: Kafka on windows

2017-04-14 Thread David Garcia
On Tue, Apr 11, 2017 at 9:41 PM, David Garcia wrote: > One issue is that Kafka leverage some very specific features of the linux > kernel that are probably far different from Windows, so I imagine the > performance profile is likewise much different. > > On

Re: possible kafka bug, maybe in console producer/consumer utils

2017-04-18 Thread David Garcia
Kafka is very reliable when the broker actually gets the message and replies back to the producer that it got the message (i.e. it won’t “lie”). Basically, your producer tried to put too many bananas into the Bananer’s basket. And yes, Windows is not supported. You will get much better perfor

Re: possible kafka bug, maybe in console producer/consumer utils

2017-04-18 Thread David Garcia
The console producer in the 0.10.0.0 release uses the old producer which doesn’t have “backoff”…it’s really just for testing simple producing: object ConsoleProducer { def main(args: Array[String]) { try { val config = new ProducerConfig(args) val reader = Class.forName(c

Re: possible kafka bug, maybe in console producer/consumer utils

2017-04-18 Thread David Garcia
The “NewShinyProducer” is also deprecated. On 4/18/17, 5:41 PM, "David Garcia" wrote: The console producer in the 0.10.0.0 release uses the old producer which doesn’t have “backoff”…it’s really just for testing simple producing: object ConsoleProducer { def

Re: Kafka Producer - Multiple broker - Data sent to buffer but not in Queue

2017-04-18 Thread David Garcia
What do broker logs say around the time you send your messages? On 4/18/17, 3:21 AM, "Ranjith Anbazhakan" wrote: Hi, I have been testing behavior of multiple broker instances of kafka in same machine and facing inconsistent behavior of producer sent records to buffer not being av

Re: Streaming to cloud platform

2017-04-20 Thread David Garcia
You can use kafka connect and something like bottled water if you want to be fancy. On 4/20/17, 8:46 AM, "arkaprova.s...@cognizant.com" wrote: Hi, I would like to ingest data from RDBMS to CLOUD platform like Azure HDInsight BLOB using Kafka . What will be the best practice in te

Re: the problems of parititons assigned to consumers

2017-04-25 Thread David Garcia
For Problem 1, you will probably have to either use the low-level API, and/or do manual partition assignment. For problem 2, you can simply re-publish the messages to a new topic with more partitions…or, as in the first problem, just use the low level API. You can also create more consumer gro

Re: Kafka Stream vs Spark

2017-04-27 Thread David Garcia
Unlike spark, you don’t need an entire framework to deploy your job. With Kstreams, you just start up an application and go. You don’t need docker either…although containerizing your stuff is probably a good strategy for the purposes of deployment management (something you get with Yarn or a s

Re: Loss of Messages

2017-05-24 Thread David Garcia
What is your in-sync timeout set to? -David On 5/24/17, 5:57 AM, "Vinayak Sharma" wrote: Hi, I am running Kafka as a 2 node cluster. When I am scaling up and down 1 kafka broker I am experiencing loss of messages at consumer end during reassignment of partitions. Do you