ETL real-time features using Flink with application-level metrics

Dong Lin Sun, 13 Aug 2023 18:03:23 -0700

Hi all,

I am writing this email to promote our open-source feature store project,
FeatHub <https://github.com/alibaba/feathub>, which supports using Flink
(production-ready) and Spark (not production-ready) to compute real-time and
offline features from Pythonic declarative feature specifications.


To the best of my knowledge, this is the most mature open-source feature
store project that supports using Flink as the compute engine. It is also
the only project that supports multiple compute engines (e.g. Flink, Spark)
with an engine-agnostic feature definition SDK, so you can choose the
compute engine that best meets your needs (e.g. throughput vs. latency)
without changing your code. This is a design goal similar to that of Apache
Beam.

As another killer feature, we recently added support for application-level
metrics, so you can define metrics (e.g. the ratio of values that are null
in the last 10 minutes) together with your features, and FeatHub will
automatically compile, compute, and export these metrics to Prometheus.
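
To make the metric example concrete, here is a rough sketch in plain Python
of what "ratio of null values in the last 10 minutes" means. It does not use
FeatHub's API: the class NullRatioWindow and its methods are hypothetical
names made up for illustration, and it assumes events arrive in
non-decreasing timestamp order. With FeatHub you would declare the metric
and let the engine compute it for you.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class NullRatioWindow:
    """Hypothetical helper: fraction of null values in a sliding time window."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds
        # Each entry is (timestamp, is_null), oldest first.
        self._events: Deque[Tuple[float, bool]] = deque()
        self._null_count = 0

    def add(self, timestamp: float, value: Optional[object]) -> None:
        """Record one value; assumes timestamps are non-decreasing."""
        is_null = value is None
        self._events.append((timestamp, is_null))
        self._null_count += int(is_null)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events that are at least window_seconds older than now."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, was_null = self._events.popleft()
            self._null_count -= int(was_null)

    def ratio(self, now: float) -> float:
        """Return nulls / total inside the window, or 0.0 if it is empty."""
        self._evict(now)
        if not self._events:
            return 0.0
        return self._null_count / len(self._events)


window = NullRatioWindow(window_seconds=600.0)
window.add(0.0, 1.0)
window.add(100.0, None)
window.add(200.0, None)
window.add(300.0, 2.0)
print(window.ratio(now=300.0))
```

A value like this is what would end up exported as a gauge to Prometheus.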

Please feel free to learn more about FeatHub by reading its main README on
GitHub and its docs (https://github.com/alibaba/feathub/tree/master/docs/content).
We have also provided multiple demos at
https://github.com/flink-extended/feathub-examples so that you can easily
try out FeatHub using docker-compose.

Cheers,
Dong
