Re: Timestamps unique?

2022-01-13 Thread Svante Karlsson
No guarantee. /svante On Thu, 13 Jan 2022 at 20:21, Chad Preisler wrote: > > Hello, > > For ConsumerRecord.timestamp() is the timestamp guaranteed to be > unique within the topic's partition, or can there be records inside the > topic's partition that have the same timestamp? > > Thanks. > Chad
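Since timestamps are not guaranteed to be unique, a record is better identified by its position. The following is a minimal sketch in plain Python, assuming a hypothetical `Record` tuple as a stand-in for a consumed record (it is not the Kafka client class); it keys on (topic, partition, offset), which is unique within a partition.

```python
from collections import namedtuple

# Hypothetical stand-in for a consumed record; not the Kafka client class.
Record = namedtuple("Record", "topic partition offset timestamp value")

def dedupe(records):
    """Drop repeats keyed on (topic, partition, offset), never on timestamp."""
    seen = set()
    unique = []
    for r in records:
        key = (r.topic, r.partition, r.offset)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

records = [
    Record("t", 0, 10, 1642101660000, "a"),
    Record("t", 0, 11, 1642101660000, "b"),  # same millisecond, different record
    Record("t", 0, 11, 1642101660000, "b"),  # redelivery of offset 11
]
print([r.value for r in dedupe(records)])  # ['a', 'b']
```

Keying on the timestamp instead would have merged "a" and "b", which are distinct records.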

Re: guidelines for replacing a lost Kafka Broker

2019-09-13 Thread Svante Karlsson
Just bring a new broker up and give it the id of the lost one. It will sync itself. /svante On Fri, 13 Sep 2019 at 13:51, saurav suman wrote: > Hi, > > When the old data is lost and another broker is added to the cluster then > it is a new fresh broker with no data. You can reassign the partitio

Re: Same key found on 2 different partitions of compacted topic (kafka-streams)

2019-05-17 Thread Svante Karlsson
Yes, that sounds likely. If you changed the number of partitions, then the hashing of the keys will change destination. You need to either clear the data (i.e. change retention to very small and roll the logs) or recreate the topic. /svante On Fri, 17 May 2019 at 12:32, Nitay Kufert wrote: > I would
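The effect of changing the partition count can be sketched in a few lines of Python. This is an illustration only: `partition_for` is a hypothetical helper, and `zlib.crc32` is used as a stand-in hash (the Java client's default partitioner hashes keys with murmur2, which is not reproduced here). The point is just that hash modulo partition count gives a different destination once the count changes.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: crc32, NOT the murmur2 hash used by the Java client.
    return zlib.crc32(key) % num_partitions

keys = [f"user-{i}".encode() for i in range(1000)]

# Keys whose destination differs between a 3-partition and a 4-partition topic.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
print(f"{len(moved)} of {len(keys)} keys change partition when going from 3 to 4")
```

Any key in `moved` that was written both before and after the change ends up on two partitions, which is the symptom described in the thread.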

Re: Streaming Data

2019-04-09 Thread Svante Karlsson
I would stream to influxdb and visualize with grafana. Works great for dashboards. But I would rethink your line format. It's very convenient to have tags (or labels) that are key/value pairs on each metric if you ever want to aggregate over a group of similar metrics. Svante
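As an illustration of tagged metrics, here is a small sketch that formats one point in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). The helper `to_line`, the metric name, and the tag names are all made up for the example; escaping of special characters and non-float field types are deliberately left out.

```python
def to_line(measurement, tags, fields, ts_ns):
    """Format one point in InfluxDB line protocol (no escaping, float fields only)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {ts_ns}"

# Hypothetical metric: the tags make it possible to group by host or datacenter later.
line = to_line(
    "cpu_load",
    {"host": "web01", "dc": "eu-west"},
    {"value": 0.64},
    1554800000000000000,
)
print(line)  # cpu_load,dc=eu-west,host=web01 value=0.64 1554800000000000000
```

With the tags in place, a dashboard query can aggregate over all hosts in one datacenter without parsing metric names.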

Re: Kafka Deployment Using Kubernetes (on Cloud) - settings for log.dirs

2018-10-22 Thread Svante Karlsson
Different directories, they cannot share a path. A broker will delete everything under the log directory that it does not know about. On Mon, 22 Oct 2018 at 17:47, M. Manna wrote: > Hello, > > We are thinking of rolling out Kafka on Kubernetes deployed on public cloud > (AWS or GCP, or other). We w

Re: Have connector be paused from start

2018-09-28 Thread Svante Karlsson
Sounds like a workflow/pipeline thing in Jenkins (or equivalent) to me. On Wed, 26 Sep 2018 at 17:27, Rickard Cardell wrote: > Hi > Is there a way to have a Kafka Connect connector begin in state 'PAUSED'? > I.e I would like to have the connector set to paused before it can process > any data f

Re: Low level kafka consumer API to KafkaStreams App.

2018-09-13 Thread Svante Karlsson
You are doing something wrong if you need 10k threads to produce 800k messages per second. It feels like you are a factor of 1000 off. What size are your messages? On Thu, Sep 13, 2018, 21:04 Praveen wrote: > Hi there, > > I have a kafka application that uses kafka consumer low-level api to help > us

Re: Reliability against rack failure

2018-08-05 Thread Svante Karlsson
tion, but for this specific deployment adding a rack > is out of question. > Is there a way to resolve this with 2 racks ? > > Regards, > Sanjay > > On 05/08/18, 11:57 PM, "Svante Karlsson" wrote: > > >3 racks, Replication Factor = 3, min.insync.replicas=2,

Re: Reliability against rack failure

2018-08-05 Thread Svante Karlsson
3 racks, Replication Factor = 3, min.insync.replicas=2, acks=all 2018-08-05 20:21 GMT+02:00 Sanjay Awatramani : > Hi, > > I have done some experiments and gone through kafka documentation, which > makes me conclude that there is a small chance of data loss or availability > in a rack scenario. Ca
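The reasoning behind these settings can be sketched with a deliberately simplified model: with acks=all, a write is accepted only while the number of in-sync replicas is at least min.insync.replicas. The helper `can_write` is hypothetical, and the sketch assumes rack-aware placement puts exactly one replica in each of the three racks and that replicas in a failed rack drop out of the ISR.

```python
def can_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """With acks=all, a produce request needs at least min.insync.replicas in the ISR."""
    return in_sync_replicas >= min_insync_replicas

REPLICATION_FACTOR = 3  # assumption: one replica per rack
MIN_INSYNC = 2

for racks_down in range(REPLICATION_FACTOR + 1):
    isr = REPLICATION_FACTOR - racks_down
    status = "writes ok" if can_write(isr, MIN_INSYNC) else "writes rejected"
    print(f"{racks_down} rack(s) down -> {status}")
```

Under these assumptions, losing one rack leaves two in-sync replicas, so writes continue and every acknowledged write still exists on two racks. With only two racks, one rack necessarily holds two of the three replicas, so the same guarantee cannot be met when that rack fails.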

Re: log compaction v log rotation - best of the two worlds

2018-03-21 Thread Svante Karlsson
alt1) If you can store a generation counter in the value of the "latest value" topic, you could do as follows: topic latest_value key [id]; topic full_history key [id, generation]. On delete, get the latest_value.generation_counter and issue deletes on full_history key [id, 0..generation_counter]. alt2

Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
ing > message to our infrastructure side, but the webapp is unaware if it allowed > or not ... > > > > thank for your reply 😊 > > Adrien > > > From: Svante Karlsson > Sent: Saturday, 10 March 2018 19:13:04 > To: users@kafka.apac

Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
You do not want to expose the kafka instance to your different clients. Put some API endpoint in between: REST/gRPC or whatever. 2018-03-10 19:01 GMT+01:00 Nick Vasilyev : > Hard to say without more info, but why not just deploy something like a > REST api and expose it to your clients, they will se

Re: Consultant Help

2018-03-02 Thread Svante Karlsson
try https://www.confluent.io/ - that's what they do /svante 2018-03-02 21:21 GMT+01:00 Matt Stone : > We are looking for a consultant or contractor that can come onsite to our > Ogden, Utah location in the US, to help with a Kafka set up and maintenance > project. What we need is someone with t

Re: Hardware Guidance

2018-03-01 Thread Svante Karlsson
It's per broker. Usually you run with 4-6GB of java heap. The rest is used as disk cache; it's more that 64GB seems like a sweet spot between memory cost and performance. /svante 2018-03-01 18:30 GMT+01:00 Michal Michalski : > I'm quite sure it's per broker (it's a standard way to provide > r

Re: Regarding : Store stream for infinite time

2018-01-23 Thread Svante Karlsson
Yes, it will store the last value for each key 2018-01-23 18:30 GMT+01:00 Aman Rastogi : > Hi All, > > We have a use case to store stream for infinite time (given we have enough > storage). > > We are planning to solve this by Log Compaction. If each message key is > unique and Log compaction is
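What "the last value for each key" means can be shown with a toy model of a compacted log. The function `compact` is a hypothetical sketch: it ignores segments, cleaner timing, and how long delete markers are retained, and only shows the end state, including the convention that a null value (a tombstone) removes the key.

```python
def compact(log):
    """Keep only the latest value per key; a None value (tombstone) removes the key."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest

log = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", None)]
print(compact(log))  # {'k1': 'c'}
```

If every message key is unique, nothing is ever superseded, so under this model the full stream is retained.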

Re: Kafka Replication Factor

2018-01-17 Thread Svante Karlsson
What's your config for min.insync.replicas? 2018-01-17 13:37 GMT+01:00 Sameer Kumar : > Hi, > > I have a cluster of 3 Kafka brokers, and replication factor is 2. This > means I can tolerate failure of 1 node without data loss. > > Recently, one of my node crashed and some of my partitions went off

Re: one machine that have four network.....

2018-01-16 Thread Svante Karlsson
Even if you bind your socket to an ip of a specific card, when the packet is about to leave your host it hits the routing table and gets routed through the interface with least cost (arbitrary but static since all interfaces have same cost since they are on the same subnet) thus you will not reach

Re: Broker won't exit...

2018-01-10 Thread Svante Karlsson
If you really want all the brokers to die, try changing server.properties: controlled.shutdown.enable=false. I had a similar problem on a dev laptop with a single broker. It refused to die on system shutdowns (or took a very long time). 2018-01-10 12:57 GMT+01:00 Ted Yu : > Skip:Can you pastebin the

Re: Multiple brokers - do they share the load?

2017-11-28 Thread Svante Karlsson
You are connecting to a single seed node - your kafka library will then under the hood connect to the partition leaders for each partition you subscribe or post to. The load is not different compared to if you gave all nodes as connect parameter. However if your seed node crashes then your client

Re: Building news feed of social app using kafka

2017-11-01 Thread Svante Karlsson
Nope, that's the wrong design. It does not scale. You would end up in a wide and shallow thing. Too few messages per partition to make sense. You want many thousands per partition per second to amortize the consumer to broker round-trip. On Nov 1, 2017 21:12, "Anshuman Ghosh" wrote: > Hello! > >

Re: Kafka Streams Avro SerDe version/id caching

2017-10-03 Thread Svante Karlsson
I've implemented the same logic for a C++ client - caching is the only way to go since the performance impact of not doing it would be too big. So bet on caching on all clients. 2017-10-03 18:12 GMT+02:00 Damian Guy : > If you are using the confluent schema registry then the will be cached by > th

Re: Is there a way in increase number of partitions

2017-08-21 Thread Svante Karlsson
Short answer - you cannot. The existing data is not reprocessed since kafka itself has no knowledge of how you did your partitioning. The normal workaround is that you stop producers and consumers. Create a new topic with the desired number of partitions. Consume the old topic from beginning and w

Re: Kafka rack-id and min in-sync replicas

2017-08-20 Thread Svante Karlsson
I think you are right. The rack awareness is used to spread the partitions on creation, assignment, etc., so get as many racks as your replication count. /svante 2017-08-20 13:33 GMT+02:00 Carl Samuelson : > Hi > > I asked this question on SO here: > https://stackoverflow.com/questions/45778455/k

Re: Different Schemas on same Kafka Topic

2017-08-17 Thread Svante Karlsson
Well, the purpose of the schema registry is to map a 32 bit id to an avro schema, with or without rules on how you may update a schema with a given name. To decode avro you need a schema. Either you "know" what's in a given topic and then you can hardcode it, or you prepend it with something, ie the

Re: Limit of simultaneous consumers/clients?

2017-07-31 Thread Svante Karlsson
It feels like the wrong use case for kafka. It's not meant as something you connect your end users to. Maybe MQTT would be a better fit as the serving layer to end users or just poll as you said. 2017-07-31 17:10 GMT+02:00 Thakrar, Jayesh : > You may want to look at the Kafka REST API instead of ha

Re: Using JMXMP to access Kafka metrics

2017-07-19 Thread Svante Karlsson
I've used jolokia which gets JMX metrics without RMI (actually json over http) https://jolokia.org/ Integrates nicely with telegraf (and influxdb) 2017-07-19 20:47 GMT+02:00 Vijay Prakash < vijay.prak...@microsoft.com.invalid>: > Hey, > > Is there a way to use JMXMP instead of RMI to access Kafk

Re: Issue in Kafka running for few days

2017-04-30 Thread Svante Karlsson
else in the community with more experience can recognize > the symptoms but in the meantime, if you haven't already done so, you > may want to search for similar issues: > https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22ZK%20expired%3B%20shut%20dow

Re: Issue in Kafka running for few days

2017-04-26 Thread Svante Karlsson
You are not supposed to run an even number of zookeepers. Fix that first. On Apr 26, 2017 20:59, "Abhit Kalsotra" wrote: > Any pointers please > > > Abhi > > On Wed, Apr 26, 2017 at 11:03 PM, Abhit Kalsotra > wrote: > > > Hi * > > > > My kafka setup > > > > > > **OS: Windows Machine*6 broker

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Svante Karlsson
What kind of disk are you using for the rocksdb store? ie spinning or ssd? 2016-11-25 12:51 GMT+01:00 Damian Guy : > Hi Frank, > > Is this on a restart of the application? > > Thanks, > Damian > > On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: > > > Hi y'all, > > > > I have a reasonably simple

Re: kafka connect(copycat) question

2015-12-03 Thread Svante Karlsson
Hi, I tried building this today and the problem seems to remain. /svante [INFO] Building kafka-connect-hdfs 2.0.0-SNAPSHOT [INFO] Downloading: http://packages.confluent.io/maven/io/confluent/kafka-connect-avro-converter/2.

Re: Locality question

2015-11-12 Thread Svante Karlsson
If you have a kafka partition that is replicated to 3 nodes, the partition leader varies (over time), thus making the colocation pointless. You can only produce and consume to/from the leader. /svante 2015-11-12 9:00 GMT+01:00 Young, Ben : > Hi, > > Any thoughts on this? Perhaps Kafka is not the best way

Re: How to correctly handle offsets?

2015-06-01 Thread svante karlsson
1) correlationId is just a number that you get back in your reply. You can safely set it to anything. If you have some kind of call identification in your system that you want to trace through logs - this is what you would use. 2) You can safely use any external offset management you like. just st

Re: Kafka still aware of old zookeeper nodes

2015-04-30 Thread svante karlsson
Have you changed zookeeper.connect= in server.properties? A better procedure for replacing zookeeper nodes would be to shut down one and install the new one with the same ip. This can easily be done to a running cluster. /svante 2015-04-30 20:08 GMT+02:00 Dillian Murphey : > I had 3 zookeeper

hive output to kafka

2015-04-28 Thread Svante Karlsson
What's the best way of exporting contents (avro encoded) from hive queries to kafka? Kind of like Camus, the other way around. Best regards, svante

Re: Kafka Consumer

2015-03-31 Thread svante karlsson
Your consumer "might" belong to a consumer group. Just commit offsets to that consumer group/topic/partition and it will work. That said - if you want to figure out the consumer groups that exist you have to look in zookeeper. There is no kafka API to get or create them. In the java client it i

Re: Producer Behavior When one or more Brokers' Disk is Full.

2015-03-26 Thread svante karlsson
>4. As for recovering broker from disk full, if replication is enabled one >can just bring it down (the leader of the partition will then migrate to >other brokers), clear the disk space, and bring it up again; if replication >is not enabled then you can first move the partitions away from this bro

Re: Broker shutdown, Can't restart

2015-03-21 Thread svante karlsson
>Is there a specific reason for the collocation of all partitions of a topic? Not all partitions - any partition of a topic is kept in a separate dir. (hopefully not all on the same server) >This means, the capacity of required volume is to be determined by the retention size of the topic with l

Re: Broker shutdown, Can't restart

2015-03-21 Thread svante karlsson
The shutdown is expected. All data in a partition is kept in a single directory (=> single disk). I would move some topics/partitions from a full disk to a disk (on the same broker) with more space. If you have very unbalanced topics this might be hard. You could get a bigger disk and copy the da

Re: Kafka 0.8.2 log cleaner

2015-03-02 Thread svante karlsson
Wouldn't it be rather simple to add a retention time on "deleted" items ie keys with null value for topics that are compacted? The retention time would then be set to some "large" time to allow all consumers to understand that a previous k/v is being deleted. 2015-03-02 17:30 GMT+01:00 Ivan Bal

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread svante karlsson
Do you have to separate the snapshot from the "normal" update flow? I've used a compacting kafka topic as the source of truth to a Solr database and fed the topic both with real time updates and "snapshots" from a hive job. This worked very well. The nice point is that there is a seamless transiti

Re: kafka.server.ReplicaManager error

2015-02-05 Thread svante karlsson
/a Error > Path:/brokers/topics/mytopic/partitions/143/state Error:KeeperErrorCode = > BadVersion for /brokers/topics/mytopic/partitions/143/state > > It's probably worthwhile to note that we've disabled unclean leader > election. > > > > On Thu, Feb 5, 2015 at 2:01 PM, svant

Re: kafka.server.ReplicaManager error

2015-02-05 Thread svante karlsson
I believe I've had the same problem on the 0.8.2 rc2. We had an idle test cluster with unknown health status and I applied rc3 without checking if everything was ok before. Since that cluster had been doing nothing for a couple of days and the retention time was 48 hours it's reasonable to assume th

Re: kafka sending duplicate content to consumer

2015-01-23 Thread svante karlsson
A kafka broker never pushes data to a consumer. It's the consumer that does a long fetch and it provides the offset to read from. The problem lies in how your consumer handles the, for example, 1000 messages that it just got. If you handle 500 of them and crash without committing the offsets somewhe
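The duplicate delivery described here can be reproduced with a toy simulation, with no broker involved. `run_consumer` is a hypothetical function: a Python list plays the partition log, and the consumer commits its offset only after a whole batch is handled. A crash halfway through means the restart fetches from the old committed offset again.

```python
def run_consumer(log, committed, crash_after=None):
    """Fetch from the committed offset; commit only after the whole batch is handled."""
    processed = []
    for n, offset in enumerate(range(committed, len(log))):
        if crash_after is not None and n == crash_after:
            return processed, committed  # crashed: nothing was committed
        processed.append(log[offset])
    return processed, len(log)  # batch done: commit the end offset

log = list(range(1000))

# First run handles 500 messages, then crashes before committing.
first, committed = run_consumer(log, committed=0, crash_after=500)
# The restart fetches from the still-unchanged committed offset.
second, committed = run_consumer(log, committed=committed)

print(len(first), len(second), committed)  # 500 1000 1000
```

The first 500 messages are handled twice, so processing has to be idempotent, or the offset has to be stored together with the results.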

Re: Isr difference between Metadata Response vs /kafka-topics.sh --describe

2015-01-21 Thread svante karlsson
thanks, svante 2015-01-21 16:30 GMT+01:00 Joe Stein : > Sounds like you are bumping into this > https://issues.apache.org/jira/browse/KAFKA-1367 > >

Isr difference between Metadata Response vs /kafka-topics.sh --describe

2015-01-21 Thread svante karlsson
We are running an external (as in unsupported) C++ client library against 0.8.2-rc2 and see differences in the Isr vector in Metadata Response compared to what ./kafka-topics.sh --describe returns. We have a triple replicated topic that is not updated during the test. kafka-topics.sh returns

Re: How to handle broker disk failure

2015-01-21 Thread svante karlsson
sing > data over automatically. > > Thanks, > > Jun > > On Tue, Jan 20, 2015 at 1:02 AM, svante karlsson wrote: > > > I'm trying to figure out the best way to handle a disk failure in a live > > environment. > > > > The obvious (and naive) solution i

typo in wiki

2015-01-20 Thread svante karlsson
In the wiki there is a statement that a partition must fit on a single machine. While technically true, isn't it so that a partition must fit on a single disk on that machine? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowmanytopicscanIhave >A partition is basically a director

How to handle broker disk failure

2015-01-20 Thread svante karlsson
I'm trying to figure out the best way to handle a disk failure in a live environment. The obvious (and naive) solution is to decommission the broker and let other brokers take over and create new followers. Then replace the disk and clean the remaining log directories and add the broker again. T

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
wer" also looks strange > > > > I can't file it as a bug report as I can't reproduce it but I have a > > distinct feeling that I can't trust the new mbeans or have to find > another > > explanation. > > > > regard it as an observation

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
bug report as I can't reproduce it but I have a distinct feeling that I can't trust the new mbeans or have to find another explanation. regard it as an observation if someone else reports issues. thanks, svante 2015-01-16 20:56 GMT+01:00 svante karlsson : > Jun, > > I don

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
startup. Can you reproduce the issue reliably? Also, is what you saw an > issue with the mbean itself or graphite? > > Thanks, > > Jun > > On Fri, Jan 16, 2015 at 4:38 AM, svante karlsson wrote: > > > I upgrade two small test cluster and I had two small issues but

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-16 Thread svante karlsson
I upgraded two small test clusters and I had two small issues, but I'm not clear yet as to whether those were due to us using ansible to configure and deploy the cluster. The first issue could be us doing something bad when distributing the update (I updated, not reinstalled) but it should be ea

Re: how to order message between different partition

2015-01-08 Thread svante karlsson
The messages are ordered per partition. No order between partitions. If you really need ordering use one partition. 2015-01-08 9:44 GMT+01:00 YuanJia Li : > Hi all, > I have a topic with 3 partitions, and each partion has its sequency in > kafka. > How to order message between different partio

Re: mirrormaker tool in 0.82beta

2015-01-07 Thread svante karlsson
No, I missed that. thanks, svante 2015-01-07 6:44 GMT+01:00 Jun Rao : > Did you set offsets.storage to kafka in the consumer of mirror maker? > > Thanks, > > Jun > > On Mon, Jan 5, 2015 at 3:49 PM, svante karlsson wrote: > > > I'm using 0.82beta a

mirrormaker tool in 0.82beta

2015-01-05 Thread svante karlsson
I'm using 0.82beta and I'm trying to push data with the mirrormaker tool from several remote sites to two datacenters. I'm testing this from a node containing zk, broker and mirrormaker and the data is pushed to a "normal" cluster. 3 zk and 4 brokers with replication. While the configuration seems

Re: Increase in Kafka replication fetcher thread not reducing log replication

2014-12-22 Thread svante karlsson
What kind of network do you have? Gigabit? If so, 90 MB/s would make sense. Also, since you have one partition, what's your raw transfer speed to the disk? 90 MB/s makes sense here as well... If I were looking for rapid replica catch up I'd have at least 2x Gbit and partitioned topics spread out o

Re: How do I create a consumer group

2014-12-16 Thread svante karlsson
>Yes - see the offsets.topic.num.partitions and >offsets.topic.replication.factor broker configs. Joel, that's exactly what I was looking for. I'll look into that and the source for OffsetsMessageFormatter later today! thanks svante >

Re: How do I create a consumer group

2014-12-15 Thread svante karlsson
Thanks, > > Jun > > On Fri, Dec 12, 2014 at 2:45 AM, svante karlsson wrote: > > > Disregard the creation question - we must have done something wrong > because > > now our code is working without obvious changes (on another set of > > brokers). > > > &

Re: How do I create a consumer group

2014-12-12 Thread svante karlsson
If I understand KAFKA-1476 it is only a command line tool that gives access by using ZKUtils not an API to Kafka. We're looking for a Kafka API so I guess that this functionality is missing. thanks for the pointer Svante Karlsson 2014-12-12 19:03 GMT+01:00 Jiangjie Qin : > > KA

Re: How do I create a consumer group

2014-12-12 Thread svante karlsson
/stable in any way or is there a better way of listing the existing group names? svante 2014-12-11 20:59 GMT+01:00 svante karlsson : > > We're using 0.82 beta and a homegrown c++ async library based on boost > asio that has support for the offset api. > (apike

How do I create a consumer group

2014-12-11 Thread svante karlsson
We're using 0.82 beta and a homegrown c++ async library based on boost asio that has support for the offset api. (apikeys OffsetCommitRequest = 8, OffsetFetchRequest = 9, ConsumerMetadataRequest = 10) If we use a java client and commit an offset then the consumer group shows up in the response f

Re: Producer connection unsucessfull

2014-12-05 Thread svante karlsson
I haven't run the sandbox but check if the kafka server is started at all. ps -ef | grep kafka 2014-12-05 14:34 GMT+01:00 Marco : > Hi, > > I've installed the Hortonworks Sandbox and try to get into Kafka. > > Unfortunately, even the simple tutorial does not work :( > http://kafka.apache.org/d

Re: KafkaException: Should not set log end offset on partition

2014-12-04 Thread svante karlsson
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper > ) > and make sure the 3 registered hosts are unique? > > Thanks, > > Jun > > On Wed, Dec 3, 2014 at 5:54 AM, svante karlsson wrote: > > > I've installed (for ansible scriptin

Re: KafkaException: Should not set log end offset on partition

2014-12-03 Thread svante karlsson
I found some logs like this before everything started to go wrong ... [2014-12-02 07:08:11,722] WARN Partition [test3,13] on broker 2: No checkpointed highwatermark is found for partition [test3,7] (kafka.cluster.Partition) [2014-12-02 07:08:11,722] WARN Partition [test3,7] on broker 2: No checkpo

KafkaException: Should not set log end offset on partition

2014-12-03 Thread svante karlsson
I've installed (for ansible scripting testing purposes) 3 VM's each containing kafka & zookeeper clustered together. Ubuntu 14.04. Zookeepers are 3.4.6 and kafka 2.11-0.8.2-beta. The kafka servers have broker id's 2, 4, 6. The zookeepers seem happy. The kafka servers start up and seem happy. I can

Re: Partition key not working properly

2014-11-25 Thread svante karlsson
By default, the partition key is used for hashing then it's placed in a partition that has the appropriate hashed keyspace. If you have three physical partitions and then give the partition key "5" it has nothing to do with physical partition 5 (that does not exist) , similar to physical: partitio

Re: Using Kafka for ETL from DW to Hadoop

2014-10-23 Thread svante karlsson
Both variants will work well (if your kafka cluster can handle the full volume of the transmitted data for the duration of the ttl on each topic). I would run the whole thing through kafka since you will be "stress testing" your production flow - consider if you at some later time lost your destina

Re: C/C++ kafka client API's

2014-10-14 Thread svante karlsson
Magnus, Do you have any plans to update the protocol to 0.9? I built a boost asio based version half a year ago but that did only implement v0.8 and I have not found time to upgrade it. It is a quite big job to have something equal to java high and low level API. /svante > >

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
er Message Broker.* > 1. We have to handle 30,000 TPS. > 2. We need to prioritize the requests. > 3. Request Data should not be lost. > > > Thanks > > Regards > Lavish Goel > > > > On Mon, Sep 22, 2014 at 4:20 PM, svante karlsson wrote: > > &

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
at case should we move to some other message broker? If > yes, Can you please tell me the name which is best for this use case and > can handle large amount of requests? > Is there any workaround in Kafka? If Yes, Please tell me. > > Thanks > > Warm Regards > Lavish Goel

Re: How to use RPC mechanism in Kafka?

2014-09-22 Thread svante karlsson
Wrong use-case. Kafka is a queue (in the normal case with a TTL (time to live) on messages). There is no correlation between producers and consumers. There is no concept of a consumed message. There is no "request" and no "response". You can produce messages (in another topic) as result of your processing

Re: Reply: kafka performance question

2014-05-26 Thread svante karlsson
Do you read from the file in the callback from kafka? I just implemented C++ bindings and in one of the tests I did I got the following results: 1000 messages per batch (fairly small messages ~150 bytes) and then wait for the network layer to ack the send (not server acks) before putting another

Re: Adding partitions...

2014-05-23 Thread svante karlsson
No reshuffling will take place. And reading messages and putting them back in again will not remove the messages from their "old" partition, so the same message will then exist in more than one partition - eventually to get aged out of the oldest partition. If you use partitioning to distribute the load