Hi Anastasiia,

My take is that, in its current form, Spark Connect is not suitable for running long-lived Structured Streaming queries in Standalone mode, especially with long trigger intervals. The lack of support for detached streaming queries makes it problematic for this particular use case.

To make Structured Streaming work in Standalone mode, you could:
1. Use spark-submit in cluster mode instead of Spark Connect.
2. Consider alternative cluster managers like YARN or k8s for better driver management.

Specific answers

q) Is Spark Connect intended to support “detached” Streaming Queries?
No, currently Spark Connect ties queries to the client session. Streaming queries stop when the session ends. Detached queries are not yet supported.

q) Could Streaming Queries be detached from the session, as they are continuous?
This is a valid request. Detaching streaming queries would allow them to run independently, so that long-running jobs don’t stop when the session ends. This would require changes in Spark’s session management.

q) Would you extend control options in Spark Connect UI (start, stop, reset checkpoints)?
Yes, adding controls to start, stop, or reset streaming queries would improve usability, especially for production systems. This feature would give users more dynamic management of long-running streaming jobs.

Have a look at this article of mine:
Building an Event-Driven Real-Time Data Processor with Spark Structured Streaming and API Integration
<https://www.linkedin.com/pulse/building-event-driven-real-time-data-processor-spark-mich-zy3ef/?trackingId=RIwY%2FePi0jslLiXqOP8mxQ%3D%3D>

HTH,

Mich Talebzadeh
Architect | Data Engineer | Data Science | Financial Crime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
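
P.S. To make option 1 above (spark-submit in cluster mode) a bit more concrete, here is a minimal sketch of how the submit command could be assembled. It is an illustration under assumptions, not a tested deployment recipe. The helper name build_submit_command, the master URL, the main class and the jar path are all hypothetical placeholders. As far as I know, Standalone does not support cluster deploy mode for Python applications, so the sketch assumes the streaming job is packaged as a JVM (Scala/Java) jar. The --supervise flag asks the Standalone master to restart the driver if it exits abnormally.

```python
import shlex


def build_submit_command(master_url, main_class, app_jar, app_args=()):
    """Return the argv list for a Standalone cluster-mode spark-submit.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    return [
        "spark-submit",
        "--master", master_url,      # e.g. spark://<master-host>:7077
        "--deploy-mode", "cluster",  # driver runs inside the cluster, not in the client
        "--supervise",               # Standalone: restart the driver on abnormal exit
        "--class", main_class,       # entry point of the JVM streaming application
        app_jar,                     # must be reachable from the worker nodes
        *app_args,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_submit_command(
        "spark://master-host:7077",
        "com.example.StreamingJob",
        "/opt/jobs/streaming-job.jar",
    )
    print(shlex.join(cmd))
```

With the driver running inside the cluster, the streaming query lives as long as the driver does, so the application itself would typically block on the query's awaitTermination() rather than depend on a Connect session staying open.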
On Mon, 23 Sept 2024 at 17:41, Anastasiia Sokhova <anastasiia.sokh...@honic.eu.invalid> wrote:

> Dear Spark Team,
>
> I am working with a standalone cluster, and I am using Spark Connect to
> submit my applications.
>
> My current version is 3.5.1.
>
> I am trying to run Structured Streaming Queries with relatively long
> trigger intervals (2 hours, 1 day).
>
> The first issue I encountered was “Streaming query has been idle and
> waiting for new data more than 10000ms”. I solved it by increasing the
> value in the internal config property
> ‘spark.sql.streaming.noDataProgressEventInterval’.
>
> Now my query is not considered idle anymore but Connect expires the
> session after ~1 hour, and the query is killed with it.
>
> I believe, I have studied everything I could find online, but I could not
> find the answers.
>
> I would really appreciate if you provided some 😊
>
> Is it not intended for Spark Connect to support “detached” Streaming
> Queries?
>
> Would you consider detaching StreamingQueries from the sessions that start
> them, as they are meant to run continuously?
>
> Would you consider extending control options in Spark Connect UI (start,
> stop, reset checkpoints)?
>
> It will help the users like me, who want to use Spark’s Structured
> Streaming and Connect without running additional applications just to keep
> the session alive.
>
> I will be happy to answer any question from your side or provide more
> details.
>
> Best regards,
>
> Anastasiia