Dear Israel,

Thank you so much for your support. I will check the links you sent in your 
email before starting work on my service. 

As for your question: yes, the events generated by the devices have similar 
data structures. My service will be written in either Java or C#. Would using 
C# be an issue? Also, is there a link you would recommend I read before 
writing my code?

I also have one more question. In your mail you mentioned using one topic with 
many partitions, but the number of devices I'm using is dynamic. Are you 
suggesting that I create a partition for each device, and would that be 
possible if I don't know the exact number of devices I have? Or should I 
create multiple partitions only for the purpose of parallel processing?

Thank you,

Best Regards
Ola Bissani
Developer Manager
Easysoft
Mobile Lebanon   : +961       3 61 16 90
Office Lebanon      :+961       1 33 55 15/17
E mail:     ola.biss...@easysoft.com.lb   
web site:www.easysoft.com.lb
"Tailored to Perfection"                                                        
                         
   
The information transmitted is intended only for the person or entity to which 
it is addressed and it may contain proprietary, business-confidential, and/or 
legally privileged information. If you are not the intended recipient of this 
email you are hereby notified that any use, review, retransmission, 
dissemination, distribution, reproduction or any other action taken in reliance 
upon this email is strictly prohibited. If you have received this email in 
error, please contact the sender and delete this email and its contents from 
any computer. Any views expressed in this email are those of the individual 
sender and may not necessarily reflect the views of the company.

Please consider the environment before printing this email.

-----Original Message-----
From: Israel Ekpo <israele...@gmail.com> 
Sent: Thursday, December 30, 2021 3:47 PM
To: Users <users@kafka.apache.org>
Subject: Re: Kafka-Real Time Update

Ola,

Let's review the Apache Kafka ecosystem briefly, and then I will make an 
attempt to address your concerns:

In the Kafka Ecosystem, we have the following components:

- Brokers (store events in logical containers called Topics. Topics are 
analogous to Tables in relational databases like MySQL or PostgreSQL)
- Producers (generate events and send them to the brokers for storage)
- Consumers (pick up the events from the Topics and process or consume
them)
- Streams (at a high level, combines the Consumer and Producer mechanisms to 
process events in near real time and send the results back to the Topics)
- Schema Registry (keeps track of the data structures in the topics. Can be 
used for Avro, JSON and Protobuf formats)

https://kafka.apache.org/documentation/#api

https://github.com/confluentinc/schema-registry

There are two main things to consider here in your scenario.

Each of the devices is a prospective Producer of events that will be sent to 
the topic.

You don't necessarily need to dedicate a topic to each producer, just as you 
would not create a table for each customer record that you need to store.
Events sent to a topic are generally grouped together because they have a 
similar data structure, so if your devices are generating messages with the 
same data structure, then regardless of the number of devices, you should 
still be able to send them to the same topic. Just make sure that you have 
enough partitions and you should be able to consume them in parallel. The 
partition count is important because each partition is read by at most one 
consumer within a Consumer Group, so the number of consumers that can 
actively process events in parallel is limited by the number of partitions in 
the topic. If you are looking to have up to, let's say, 50 parallel processors 
in your Consumer Group, then you need to specify at least 50 partitions when 
creating the topic.
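
To make the point about partitions concrete, here is a minimal sketch in plain 
Java (JDK only, no Kafka client). It assumes each event is keyed by a device 
ID. The class DevicePartitioning, its methods and the "device-N" IDs are 
hypothetical names made up for illustration. The Java producer's default 
partitioner hashes the serialized key with murmur2; String.hashCode() is only 
a simplified stand-in for that, and the round-robin assignment below is 
likewise a simplification of how a Consumer Group divides partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DevicePartitioning {

    // Hypothetical helper: pick a partition for a record keyed by device ID.
    // Simplified stand-in: String.hashCode() instead of Kafka's murmur2 hash.
    public static int partitionFor(String deviceId, int numPartitions) {
        return Math.floorMod(deviceId.hashCode(), numPartitions);
    }

    // Hypothetical helper: spread partitions round-robin over the consumers
    // of one group. Each partition ends up with exactly one consumer.
    public static Map<Integer, List<Integer>> assign(int numPartitions, int numConsumers) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p);
        }
        return assignment;
    }

    // Counts the consumers that received at least one partition.
    public static long activeConsumers(Map<Integer, List<Integer>> assignment) {
        return assignment.values().stream().filter(ps -> !ps.isEmpty()).count();
    }

    public static void main(String[] args) {
        int numPartitions = 50;

        // The device count can grow freely; the partition count stays fixed.
        for (int i : new int[] {1, 4999, 5000, 123456}) {
            String deviceId = "device-" + i;
            System.out.println(deviceId + " -> partition "
                    + partitionFor(deviceId, numPartitions));
        }

        // 60 consumers on 50 partitions: only 50 of them get any work.
        System.out.println("active consumers: "
                + activeConsumers(assign(numPartitions, 60)));
    }
}
```

In other words, the partitions are there for parallelism, not one per device. 
As long as the partition count does not change, all events from a given 
device land in the same partition, which also keeps that device's events in 
order.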

Nevertheless, you can mitigate this partition limitation by using the parallel 
consumer by Confluent to process your events with key-based ordering.

https://github.com/confluentinc/parallel-consumer

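With key-based ordering, events that share a key are still processed in 
order, while the achievable concurrency is bounded by the number of distinct 
keys rather than by the number of partitions:
https://github.com/confluentinc/parallel-consumer#ordered-by-key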
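
The idea behind key-based ordering can be sketched in plain Java (JDK only). 
This is not how the parallel-consumer library is implemented, and it does not 
use its API; KeyOrderedProcessor and the device IDs are hypothetical names 
for illustration. The sketch chains the tasks for each key one after another, 
while tasks for different keys share a thread pool and may run at the same 
time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class KeyOrderedProcessor {
    private final ExecutorService pool;
    // Tail of the task chain for each key; new tasks are appended to the tail.
    // (A long-running service would also need to evict finished chains.)
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    public KeyOrderedProcessor(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Tasks with the same key run one after another in submission order;
    // tasks with different keys may run concurrently on the pool.
    // Note: if a task throws, later tasks for that key are skipped.
    public void submit(String key, Runnable task) {
        tails.compute(key, (k, tail) ->
                (tail == null ? CompletableFuture.<Void>completedFuture(null) : tail)
                        .thenRunAsync(task, pool));
    }

    // Waits for everything submitted so far, then stops the pool.
    public void shutdownAndWait() {
        CompletableFuture.allOf(tails.values().toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }

    public static void main(String[] args) {
        KeyOrderedProcessor processor = new KeyOrderedProcessor(8);
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();

        // Five events per device, interleaved across three devices.
        for (int seq = 0; seq < 5; seq++) {
            for (String device : new String[] {"device-1", "device-2", "device-3"}) {
                final int n = seq;
                processor.submit(device, () ->
                        seen.computeIfAbsent(device, d -> new CopyOnWriteArrayList<>()).add(n));
            }
        }

        processor.shutdownAndWait();
        // Each device's list holds its sequence numbers in submission order.
        System.out.println(seen);
    }
}
```

With more threads than partitions, the work is still spread across devices, 
but no device ever has two of its events processed out of order.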

The second item of consideration is that you wanted to "loop" to process the 
events. I don't think you need to do this. You can consider the Streams API to 
process your events as they arrive, without writing that loop yourself:

https://kafka.apache.org/30/documentation/streams/

The Streams API has many built-in mechanisms that allow you to just focus on 
how to process, join and aggregate your events as they arrive at the topics, 
without the need to loop.

I definitely would not recommend having a topic (table) for each device.
Find a way to group the data structures that are similar into a particular 
topic, then you can use the Consumer API or Streams API to process the events 
in near-real time.

If you are not really comfortable with writing Java code for the stream 
processing, you can also take a look at ksqlDB, which allows you to use 
SQL-like syntax to process streams arriving in Kafka brokers:

https://ksqldb.io/

These systems are capable of handling a very large number of events per 
second at scale, so I have no doubt that you will be able to figure out how 
to implement an architecture that meets your needs.

When you have a moment, could you confirm if your events generated by the 
devices are similar in data structures?

I hope this message gives you enough information to get started.

Sincerely,

Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://izzyacademy.com/
https://www.youtube.com/c/izzyacademy


On Thu, Dec 30, 2021 at 5:13 AM Ola Bissani <ola.biss...@easysoft.com.lb>
wrote:

> Dears,
>
>
>
> I'm looking for a way to get real-time updates using my service. I 
> believe Kafka is the way to go, but I am still unsure how to use it.
>
>
>
> My system gets data from devices using GPRS. I then read this data and 
> analyze it to decide what action to take next. I need the 
> analysis step to be as fast as possible. I was thinking of two options:
>
>
>
> The first option is to gather all the data sent from all the devices 
> into one huge topic and then read all the data from this topic and 
> analyze it. The downside of this option is that the data analysis 
> step delays my work, since I have to loop through the topic data. 
> On the other hand, the advantage is that I have a manageable number of 
> topics (only 1 topic).
>
>
>
> The other option is to divide the data I'm gathering into several 
> small topics by giving each device its own topic. Take into 
> consideration that the number of devices is large: I'm talking about 
> more than 5000 devices. The downside of this option is that I have 
> thousands of topics, whereas the advantage is that each topic will have 
> a manageable amount of data, allowing me to get my analysis done in 
> a much more reasonable time.
>
>
>
> Can you advise which option is better, and whether there is a third 
> option that I'm not considering?
>
> *Best Regards*
>
> *Ola Bissani*
