Hey Mich,

Thanks for the introduction to your forthcoming proposal, "Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics". I recently came across a Databricks article titled "Scalable Spark Structured Streaming for REST API Destinations". Their use case is similar to yours, but in their case the data arrives as a stream from sources such as Kafka, AWS Kinesis, or Azure Event Hub. In other words, it is a continuous flow in which messages are sent to a REST API as soon as they are available in the streaming source. Their approach is practical, but I wanted to get your thoughts on the article so that I can better understand your proposal and how the two differ.

Thanks
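To make the direction of flow in that article concrete, here is a minimal sketch, assuming each micro-batch of the stream is handed to a function that POSTs its rows to an endpoint. The names `http_post`, `post_rows` and `handle_batch`, the `send` hook and the URL are hypothetical and are not taken from the article; the Spark wiring is shown only as a comment and is one possible way to call it.

```python
import json
import urllib.request
from typing import Callable, Iterable


def http_post(url: str, payload: bytes) -> int:
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def post_rows(
    rows: Iterable[dict],
    url: str,
    send: Callable[[str, bytes], int] = http_post,
) -> int:
    """Send each row as its own JSON request; return how many got a 2xx reply."""
    accepted = 0
    for row in rows:
        status = send(url, json.dumps(row).encode("utf-8"))
        if 200 <= status < 300:
            accepted += 1
    return accepted


# Hypothetical wiring in Spark (assumption, not executed here):
#   def handle_batch(batch_df, batch_id):
#       rows = (r.asDict() for r in batch_df.toLocalIterator())
#       post_rows(rows, "https://example.invalid/ingest")
#   stream_df.writeStream.foreachBatch(handle_batch).start()
```

The `send` parameter is there only so the posting logic can be exercised without a live endpoint. Retries, batching of several rows per request, and error handling are left out.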
On Tuesday, 9 January 2024 at 00:24:19 GMT, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Please also note that Flask's built-in development server is not designed for production use. While it is suitable for development and small-scale applications, it may not handle concurrent requests efficiently in a production environment.

In production, one can use Gunicorn (Green Unicorn), a WSGI (Web Server Gateway Interface) HTTP server that is commonly used to serve Flask applications. It runs multiple worker processes, each capable of handling a single request at a time. This makes Gunicorn suitable for handling multiple simultaneous requests and improves the concurrency and performance of your Flask application.

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom

view my Linkedin profile
https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Mon, 8 Jan 2024 at 19:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Thought it might be useful to share my idea with fellow forum members. During the breaks, I worked on the seamless integration of Spark Structured Streaming with a Flask REST API for real-time data ingestion and analytics. The use case revolves around a scenario where data is generated through REST API requests in real time. The Flask REST API captures and processes this data, saving it to a Spark Structured Streaming DataFrame. The processed data can then be channelled into any sink of your choice, including a Kafka pipeline, giving a robust end-to-end solution for dynamic and responsive data streaming.
I will delve into the architecture, implementation, and benefits of this combination, enabling one to build an agile and efficient real-time data application. I will put the code in GitHub for everyone's benefit. Hopefully your comments will help me to improve it.

Cheers
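Until the GitHub code is available, the ingestion side described in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not the author's implementation: a plain WSGI callable stands in for the Flask application so the example needs only the standard library, and the hand-off to Spark is assumed to be a landing directory of JSON files. `LANDING_DIR` and `make_app` are hypothetical names.

```python
import json
import os
import tempfile
import uuid

# Assumption: records are handed to Spark through a directory of JSON files.
LANDING_DIR = os.environ.get(
    "LANDING_DIR", os.path.join(tempfile.gettempdir(), "rest_landing")
)


def make_app(landing_dir: str = LANDING_DIR):
    """Build a WSGI app that stores each POSTed JSON document as one file."""
    os.makedirs(landing_dir, exist_ok=True)

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST only\n"]

        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        try:
            record = json.loads(body)
        except ValueError:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"invalid JSON\n"]

        # Write under a temporary name, then rename, so that a reader of the
        # directory does not see a partially written record.
        name = f"{uuid.uuid4().hex}.json"
        tmp_path = os.path.join(landing_dir, f".{name}.tmp")
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(record, fh)
        os.replace(tmp_path, os.path.join(landing_dir, name))

        start_response("201 Created", [("Content-Type", "application/json")])
        return [json.dumps({"stored": name}).encode("utf-8")]

    return app


# Module-level WSGI callable, the object a WSGI server such as Gunicorn loads.
app = make_app()
```

If this were saved as `ingest_api.py` (a hypothetical file name), it could be served with something like `gunicorn --workers 4 ingest_api:app`, and a Flask application object could be served the same way. On the Spark side, one option would be the file source, for example `spark.readStream.schema(schema).json(landing_dir)`, with `schema` defined to match the posted records.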