Hello All,

I want to do some data quality analysis on streaming data, for example:

1. Fill rate in a particular column
2. How many events are going to the error queue because Avro schema
validation failed
3. Various statistical measures of a column
4. Alert if a particular threshold is breached (for example, if the fill
rate of a column is less than 90%)
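
To make the fill rate and threshold checks concrete, here is a rough sketch in plain Python of what I mean. This is not Flink code; the function names, the sample events, and treating None or an empty string as "not filled" are all assumptions for illustration only.

```python
from typing import Any, Iterable, Mapping, Optional


def fill_rate(events: Iterable[Mapping[str, Any]], column: str) -> Optional[float]:
    """Fraction of events in which `column` is present and non-empty.

    Returns None for an empty window, so that "no data" is not
    confused with "0% filled". (Hypothetical helper, illustration only.)
    """
    total = 0
    filled = 0
    for event in events:
        total += 1
        value = event.get(column)
        # Assumption: None and "" both count as missing.
        if value is not None and value != "":
            filled += 1
    return filled / total if total else None


def check_fill_rate(
    events: Iterable[Mapping[str, Any]], column: str, threshold: float = 0.9
) -> Optional[str]:
    """Return an alert message if the fill rate is below `threshold`, else None."""
    rate = fill_rate(events, column)
    if rate is not None and rate < threshold:
        return f"ALERT: fill rate for '{column}' is {rate:.0%}, below {threshold:.0%}"
    return None


# Hypothetical sample: one window of events from the stream.
window = [
    {"user_id": "a1", "country": "IN"},
    {"user_id": "a2", "country": None},
    {"user_id": "a3"},
    {"user_id": "a4", "country": "US"},
]

print(check_fill_rate(window, "user_id"))
print(check_fill_rate(window, "country"))
```

In a streaming job I would expect the same computation to run once per window (for example, per minute) instead of over a fixed list.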

Is there any library built on top of Flink for data quality? While
searching, I found that there is such a library on top of Spark:
https://github.com/awslabs/deequ

It provides everything that I am looking for.

-- 
Thanks & Regards,
Anuj Jain



<http://www.cse.iitm.ac.in/%7Eanujjain/>
