Hi people,

I would like to share some of my experience with data processing using
stateful Structured Streaming in Apache Spark, especially for cases where
OutOfMemory errors occur because the built-in state store provider tries
to keep all of the state data in memory. To address this, I've written a
custom state store provider based on the RocksDB key-value store, and
anybody can use it in their own projects. Here is the link to the
repository: https://github.com/chermenin/spark-states. If you have any
thoughts on how to improve it, I'll be glad to see your contributions to
the project!
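
For readers who have not replaced the state store before, here is a rough
sketch of how a custom provider is usually plugged in: Spark reads the
provider class from the spark.sql.streaming.stateStore.providerClass
setting. The class name below is a hypothetical placeholder, not the real
one from the repository, so please take the actual fully qualified name
from the repository's README.

```scala
import org.apache.spark.sql.SparkSession

object CustomStateStoreExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical placeholder: replace with the provider class name
    // documented in the spark-states repository.
    val providerClass = "com.example.CustomRocksDbStateStoreProvider"

    val spark = SparkSession.builder()
      .appName("custom-state-store-example")
      .master("local[*]") // local mode, only for trying things out
      // Tells Structured Streaming which state store provider to use.
      .config("spark.sql.streaming.stateStore.providerClass", providerClass)
      .getOrCreate()

    // The class is only loaded once a stateful streaming query starts,
    // so here we just confirm that the setting was picked up.
    println(spark.conf.get("spark.sql.streaming.stateStore.providerClass"))

    spark.stop()
  }
}
```

The same setting can also be passed with --conf on spark-submit, as long as
the jar containing the provider is on the application classpath.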

Thanks for your attention :)

Regards,
Alex
