Re: Trigger topic compaction before uploading to S3

2020-09-22 Thread Ricardo Ferreira
These properties can't be triggered programatically. Kafka uses an internal thread pool called "Log Cleaner Thread" that does the job asynchronously of deleting old segments ("delete") and deleting repeated records ("compact"). Whatever the S3 connector picks up is already compacted and/or

Re: Kafka - producer writing to a certain broker...

2020-07-27 Thread Ricardo Ferreira
I am not a huge fan of criticizing without making myself useful first; so here is what you can do in order to have a producer writing records to a specific broker: 1. Create a topic with the # of partitions equal to the # of brokers. By default Kafka will try to evenly distributed the

Re: Enabling Unclean leader election

2020-07-22 Thread Ricardo Ferreira
You should be able to use the command `kafka-leader-election` to accomplish this. This command has an option called "--election-type" that you can use to specify whether the election is preferred or unclean. -- Ricardo On 7/22/20 11:31 AM, nitin agarwal wrote: Hi, Is there a way to enable

Re: Priority queues using kafka topics and a consumer group?

2020-07-21 Thread Ricardo Ferreira
Richard, There is; but it doesn't look like what you would do in a typical messaging technology. As you may know Apache Kafka® is not messaging but a distributed streaming platform based on the concept of commit log. The main characteristic of a commit log is that they keep records in the

Re: Problem while sending data

2020-07-20 Thread Ricardo Ferreira
Here is the problem: 20:02:14.451 [kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Error while fetching metadata with correlation id 3 : {t-ord3=LEADER_NOT_AVAILABLE} 20:02:14.451 [kafka-producer-network-thread |

Re: Is it possible to succeed message delivery in case TimeoutException thrown (by delivery.timeout.ms)

2020-07-20 Thread Ricardo Ferreira
This possibility highly depends of the option used for the `ack` property. If you set to `0` then there is high probability. If you set to `1` then probability decreases, and so forth so on. But in general, you can minimize the possibility of this to happen using the following timing

Re: Kafka SFTP connector

2020-07-15 Thread Ricardo Ferreira
, 2020, 02:06 Ricardo Ferreira <mailto:rifer...@riferrei.com>> wrote: Vishnu, A public key file can be specified via the property `tls.public.key`. Thanks, -- Ricardo On 7/14/20 6:09 AM, vishnu murali wrote: Hi all, I am using SFTP connector which that SFTP c

Re: Kafka SFTP connector

2020-07-14 Thread Ricardo Ferreira
Vishnu, A public key file can be specified via the property `tls.public.key`. Thanks, -- Ricardo On 7/14/20 6:09 AM, vishnu murali wrote: Hi all, I am using SFTP connector which that SFTP connection can be accessed by using public key file. How can I give this configuration in postman to

Re: Consumer Groups Describe is not working

2020-07-08 Thread Ricardo Ferreira
Ann, You can try execute the CLI `kafka-consumer-groups` with TRACE enabled to dig a little deeper in the problem. In order to do this you need to: 1. Make a copy of your `$KAFKA_HOME/etc/kafka/tools-log4j.properties` file 2. Set `root.logger=TRACE,console` 3. Run `export

Re: AW: Problem with replication?!

2020-07-08 Thread Ricardo Ferreira
:27,246] DEBUG Got ping response for sessionid: 0x13f85c80023 after 1ms (org.apache.zookeeper.ClientCnxn) [2020-07-08 13:13:33,251] DEBUG Got ping response for sessionid: 0x13f85c80023 after 1ms (org.apache.zookeeper.ClientCnxn) Kind regards, Sebastian *Von:*Ricardo Ferreira *Gesendet

Re: Problem with replication?!

2020-07-07 Thread Ricardo Ferreira
Given the stack trace you've shared below I can tell that this *is not a replication issue* but rather -- your producer is not being able to write records into the partitions because the brokers that host them are unavailable. Now, I know that they are indeed running so "unavailable" here

Re: destination topics in mm2 larger than source topic

2020-07-07 Thread Ricardo Ferreira
but it looks the same - and in any case I expect idempotency to maybe decrease the topic size, not increase it... Thanks, Iftach On Thu, Jul 2, 2020 at 5:30 PM Ricardo Ferreira mailto:rifer...@riferrei.com>> wrote: Ifta

Re: Keys and partitions

2020-07-07 Thread Ricardo Ferreira
It is also important to note that since the release 2.4 of Apache Kafka the DefaultPartitioner now implements a sticky partitioning strategy rather than round-robin based on the key. This means that if you need fine control over which partition records will end up given the key -- you ought to

Re: destination topics in mm2 larger than source topic

2020-07-02 Thread Ricardo Ferreira
Iftach, I think you should try observe if this happens with other topics. Maybe something unrelated might have happened already in the case of the topic that currently has ~3TB of data -- making things even harder to troubleshoot. I would recommend creating a new topic with few partitions

Re: Problem in reading From JDBC SOURCE

2020-07-02 Thread Ricardo Ferreira
Vishnu, I think is hard to troubleshoot things without the proper context. In your case, could you please share an example of the rows contained in the table `sample`? As well as its DDL? -- Ricardo On 7/2/20 9:29 AM, vishnu murali wrote: I go through that documentation Where it described

Re: Is SinkContext thread safe?

2020-06-26 Thread Ricardo Ferreira
Ben, My understanding is that you don't need to worry about any thread synchronization. Each task has their own instance of the `SinkTaskContext` and given Kafka Connect's behavior of spreading the tasks over the cluster -- by definition it won't be the same instance. This means that even if

Re: monitoring topic subscriptions etc

2020-06-23 Thread Ricardo Ferreira
Joris, I think the best strategy here depends on how fast you want to get access to the user events. If latency is a thing then just read de data from the topic along with the other applications. Kafka follows the /write-once-read-many-times/ pattern which encourage developers to reuse the

Re: Memory for a broker

2020-06-20 Thread Ricardo Ferreira
Sunil, This has to do with Kafka's behavior of being persistent and using the broker's filesystem as the storage mechanism for the commit log. In modern operating systems a watermark of *85%* of the available RAM is dedicated to page cache and therefore, with Kafka running in a machine with

Re: Uneven distribution of messages in topic's partitions

2020-06-20 Thread Ricardo Ferreira
the latency. Thanks, On Fri, Jun 19, 2020 at 11:36 PM Ricardo Ferreira wrote: Hi Hemant, Being able to lookup specific records by key is not possible in Kafka. As a distributed streaming platform based on the concept of a commit log Kafka organizes data sequentially where each record has

Re: Uneven distribution of messages in topic's partitions

2020-06-19 Thread Ricardo Ferreira
this requirement? 2. Or do I need to use KSql DB for meeting this requirement? I did some research around it but I don't want to run separate KSql DB server. 3. Any other suggestions? Regards, On Thu, 18 Jun 2020, 6:51 pm Ricardo Ferreira, <mailto:rifer...@riferrei.com>> wrote:

Re: Duplicate records on consumer side.

2020-06-19 Thread Ricardo Ferreira
ow I dont know what is default value in cluster for above 2 parameters and what value should I set in logstash kafka input? Sorry to mixup so many things in one mail Regards, Sunil. On Fri, 19 Jun 2020 at 7:59 PM, Ricardo Ferreira mailto:rifer...@riferrei.com>> wrote: Sunil,

Re: Broker thread pool sizing

2020-06-19 Thread Ricardo Ferreira
Gérald, Typically you should set the `num.io.threads` to something greater than the # of disks since data hits the page cache and the disk. Using the default of 8 when you have a JBOD of 12 attached volumes would cause an increase of CPU context switching, for example. `num.network.threads`

Re: Duplicate records on consumer side.

2020-06-19 Thread Ricardo Ferreira
Sunil, Kafka ensures that each partition is read by one given thread only from a consumer group. Since your topic has three partitions, the rationale is that at least three threads from the consumer group will be properly served. However, though your calculation is correct (3 instances,

Re: Frequent consumer offset commit failures

2020-06-19 Thread Ricardo Ferreira
James, If I were you I would start investigating what is causing this network drops between your cluster and your consumers. The following messages are some indications of this: * "Offset commit failed on partition MyTopic-53 at offset 957: The request *timed out*." * "Caused by:

Re: kafka consumer thread crashes and doesn't consume any events without service restart

2020-06-18 Thread Ricardo Ferreira
n has occurred during Consumer.poll itself, so there is no ConsumerRecord for the application to process and hence the application doesn't know the offset of record where the poll has failed. On Thu, Jun 18, 2020 at 7:03 PM Ricardo Ferreira mailto:rifer...@riferrei.com>> wrote: Pus

Re: kafka consumer thread crashes and doesn't consume any events without service restart

2020-06-18 Thread Ricardo Ferreira
Pushkar, Kafka uses the concept of offsets to identify the order of each record within the log. But this concept is more powerful than it looks like. Committed offsets are also used to keep track of which records has been successfully read and which ones are not. When you commit a offset in

Re: Uneven distribution of messages in topic's partitions

2020-06-18 Thread Ricardo Ferreira
Hemant, This behavior might be the result of the version of AK (Apache Kafka) that you are using. Before AK 2.4 the default behavior for the DefaultPartitioner was to load balance data production across the partitions as you described. But it was found that this behavior would cause

Re: kafka log compaction

2020-06-18 Thread Ricardo Ferreira
Pushkar, "1. Would setting the cleanup policy to compact (and No delete) would always retain the latest value for a key?" -- Yes. This is the purpose of this setting. "2. Does parameters like segment.bytes, retention.ms also play any role in compaction?" -- They don't play any role in

Re: NPE in kafka-streams with Avro input

2020-06-17 Thread Ricardo Ferreira
Hi Dumitru, According to the stack trace that you've shared the NPE is being thrown by this framework called *Avro4S* that you're using. This is important to isolate the problem because it means that it is not Kafka Streams the problem but rather, your serialization framework. Nevertheless,

Re: Kafka partitions replication issue

2020-06-17 Thread Ricardo Ferreira
Karnam, I think the combination of the setting preferred leader and auto leader rebalance enable along with the hardware issue in broker-3 might be giving you the opposite effect that you are expecting. If the broker-3 happens to be the preferred leader for a given partition (because it

Re: kafka log retention problem

2020-06-15 Thread Ricardo Ferreira
It sounds like you are trying to forcibly delete the files that build up the segments used by the partitions. If that is the case then I would recommend not using external tools and leave to Kafka manage its filesystem. If you set the retention policy (either by size or time) in your topics

Re: request-reply model

2020-06-12 Thread Ricardo Ferreira
Roman, If you are implementing this in Java and use the Spring Framework then there is a good support in there that abstracts away much of the complexity related to correlate requests and responses. Here is an article that gives an example of how to implement this:

Re: handle multiple requests

2020-06-10 Thread Ricardo Ferreira
Hi there, Unless you are dealing with a low volume scenario, you should avoid tie each message/record to a specific thread. It will limit your ability to scale the processing out as CPU is a scarce resource. Alternatively, you should write your code to fetch multiple records at once (like a

Re: Unused options in ConsumerPerformance

2020-06-09 Thread Ricardo Ferreira
Hi Jiamei, Changes in Apache Kafka need to be handled via JIRA tickets . The best way to get started with this is joining the developer mailing list . Thanks, -- Ricardo On 6/9/20 4:00 AM, Jiamei Xie wrote: Hi It seems that Option

Re: count total & percentage in ksqldb

2020-06-05 Thread Ricardo Ferreira
| ++---++ |cart| 3 | 0.25 | ++---++ | purchase | 3 | 0.25 | ++---++ Percentage or total is the same thing for me :) Thank you On Fri, Jun 5, 2020 at 3:13 PM Ricardo Ferreira <mailto:rifer...@riferrei.com>> wrote:

Re: count total & percentage in ksqldb

2020-06-05 Thread Ricardo Ferreira
Mohammed, The first thing you need to do is making sure to set a key for this stream. This can be accomplished either in the creation statement or creating a new stream and using the *PARTITION BY* clause. For the sake of simplicity; the example below uses the creation statement strategy:

Re: How to resolve Kafka within docker environment?

2020-06-04 Thread Ricardo Ferreira
Anto, I am not 100% familiar with this image `confluentinc/cp-kafka` but there is a few things that you should try: 1) Make sure your `kafka` containers has this name set appropriately ```   kafka:     image: confluentinc/cp-enterprise-kafka:5.5.0     container_name: *kafka*     depends_on:

Apache Kafka 0.8.2 Consumer Example

2015-02-08 Thread Ricardo Ferreira
, only my Java program that not. Did I miss something? I heard that the API changed, so I'd like to know if someone can share a simple client with me. Please, respond directly to me or just reply all because I am not currently subscribed to the group. Thanks, Ricardo Ferreira

Re: Apache Kafka 0.8.2 Consumer Example

2015-02-08 Thread Ricardo Ferreira
of your application. Can you use the offset checker tool on that? Gwen On Sun, Feb 8, 2015 at 9:01 AM, Ricardo Ferreira jricardoferre...@gmail.com wrote: Hi Gwen, Thanks for the response. In my case, I have both consumer application and the server versions in 0.8.2, Scala 2.10

Proper Relationship Between Partition and Threads

2015-01-28 Thread Ricardo Ferreira
consumer applications with 4 threads each? It would break any load-balancing made by Kafka brokers? Anyway, I'd like to understand if the proper number of threads that should match the number of partitions is per application or if there is some other best practice. Thanks in advance, Ricardo

Re: Proper Relationship Between Partition and Threads

2015-01-28 Thread Ricardo Ferreira
Thank you very much Christian. That's what I concluded too, I wanted just to double check. Best regards, Ricardo Ferreira On Wed, Jan 28, 2015 at 4:44 PM, Christian Csar christ...@csar.us wrote: Ricardo, The parallelism of each logical consumer (consumer group) is the number