Re: Collocated storage of data that is distributed by Apache Kafka

Mikhail Pochatkin Mon, 13 Feb 2023 06:14:09 -0800

Hi, Mikhail.

Thanks for your questions! First of all, I found some links in the
documentation that may help you:
Apache Kafka Streamer | Ignite Documentation
<https://ignite.apache.org/docs/latest/extensions-and-integrations/streaming/kafka-streamer>
Apache Cassandra Acceleration With Apache Ignite | Ignite Documentation
<https://ignite.apache.org/docs/latest/extensions-and-integrations/cassandra/overview>
Ignite for Spark | Ignite Documentation (apache.org)
<https://ignite.apache.org/docs/latest/extensions-and-integrations/ignite-for-spark/overview>


To address your questions:
1. You could try implementing a custom Ignite AffinityFunction that
delegates to Kafka's partitioning algorithm. It should work.
2. As I understand it, this will be hard to achieve unless the balancer
keeps its own cache of which node holds the data for which sensor. Without
that information, the balancer would at least have to make REST calls to
the nodes to check whether they hold the data. This link may be helpful:
Near Caches | Ignite Documentation (apache.org)
<https://ignite.apache.org/docs/latest/configuring-caches/near-cache>
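
To make point 1 a bit more concrete, here is a minimal sketch of the
key-to-partition computation that a custom AffinityFunction could return
from its partition(Object key) method, with partitions() returning the
partition count of the Kafka topic. It assumes the producer uses Kafka's
default partitioner with non-null String keys serialized as UTF-8. The
murmur2 routine is re-implemented here so that the example is
self-contained; please verify it against the Utils.murmur2 of your Kafka
client version. The class name, method names and node names are
hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Sketch only: mirrors what Kafka's default partitioner is understood to do
 * for keyed records, i.e. toPositive(murmur2(keyBytes)) % numPartitions.
 * Not taken from the Ignite or Kafka code base; names are hypothetical.
 */
final class KafkaAlignedPartitioner {
    private KafkaAlignedPartitioner() {}

    /** 32-bit murmur2 with the seed used by Kafka (assumption: 0x9747b28c). */
    static int murmur2(byte[] data) {
        int length = data.length;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = 0x9747b28c ^ length;

        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i < length / 4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff)
                  + ((data[i4 + 1] & 0xff) << 8)
                  + ((data[i4 + 2] & 0xff) << 16)
                  + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1..3 bytes; the fall-through is intentional.
        int tail = length & ~3;
        switch (length % 4) {
            case 3: h ^= (data[tail + 2] & 0xff) << 16;
            case 2: h ^= (data[tail + 1] & 0xff) << 8;
            case 1: h ^= data[tail] & 0xff;
                    h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    /** Partition for a sensor id, assuming UTF-8 String keys. */
    static int partitionFor(String sensorId, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = sensorId.getBytes(StandardCharsets.UTF_8);
        // Clear the sign bit so that the modulo result is never negative.
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    /**
     * Hypothetical routing helper: partitionOwners.get(p) names the node
     * that currently owns partition p (how this list is kept up to date
     * is outside the scope of this sketch).
     */
    static String ownerOf(String sensorId, List<String> partitionOwners) {
        return partitionOwners.get(partitionFor(sensorId, partitionOwners.size()));
    }
}
```

Note that matching partition numbers is only half of the job: the
assignPartitions method of the AffinityFunction (or manual partition
assignment on the Kafka consumer side) would still have to make sure that
Ignite partition p is owned by the node whose consumer reads Kafka
partition p. For point 2, a balancer that is given the partition-to-node
mapping could use the same computation, as in ownerOf, to pick the target
node without asking the cluster on every request.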

On Mon, 13 Feb 2023 at 15:53, Mikhail via user <user@ignite.apache.org> wrote:

> Hi guys,
>
> I’d like to implement a high-load, linearly scalable IoT application
> using Apache Ignite. Sensor data comes from an Apache Kafka topic that is
> partitioned by sensor_id. Cassandra is the primary storage for the data.
> The application should allow the following:
>
>    - Streaming calculations and saving results to Cassandra
>    - Rapidly return the latest value for each sensor
>    - Getting the values of sensors that are changed during some time
>    period
>    - Creation of data marts based on analytic queries (maybe via Spark
>    queries over Cassandra)
>
> Data stream comes from Kafka and therefore it is automatically linearly
> distributed between consumers in consumer group.
>
> Here I have two questions:
>
>    1. Is there a way to process and store the incoming (from Kafka) data
>    on the Ignite node, that receives the data? In other words, can I somehow
>    synchronize the partitioning of Kafka (which is done by message id) and
>    partitioning of Ignite (which is done by affinity functions), so that other
>    nodes don’t bother about sensor data that is being processed by that
>    Ignite node?
>    2. As I need to rapidly return the latest value for some sensor via
>    REST call, what is the appropriate way of implementation of the HTTP load
>    balancer, so that the REST call is automatically balanced to the node,
>    which contains the data for the sensor?
>
>
> --
> Best Regards,
> Mikhail
>


-- 
best regards,
Pochatkin Mikhail.
