[GitHub] [spark] rangadi opened a new pull request, #40586: [SPARK-42939] Core streaming Python API for Spark Connect

via GitHub Tue, 28 Mar 2023 23:43:38 -0700


rangadi opened a new pull request, #40586:
URL: https://github.com/apache/spark/pull/40586


   ### What changes were proposed in this pull request?
   This adds core streaming API support for Spark Connect. With this, we can 
run the majority of streaming queries. All the sources and sinks are supported. 
Most of the aggregations are supported. 
   Examples of features that are not yet supported: APIs that run user code, 
such as the streaming listener and `foreachBatch()`. 
   
   The remaining missing APIs will be added soon. 
   
   How to try it in local mode (`./bin/pyspark --remote "local[*]"`):
   
   ```
   >>> from pyspark.sql.functions import window
   >>> query = ( 
   ...   spark
   ...     .readStream
   ...     .format("rate")
   ...     .option("numPartitions", "1")
   ...     .load()
   ...     .withWatermark("timestamp", "1 minute")
   ...     .groupBy(window("timestamp", "10 seconds"))
   ...     .count() # count for each 10 seconds.
   ...     .writeStream
   ...     .format("memory")
   ...     .queryName("rate_table")
   ...     .trigger(processingTime="10 seconds")
   ...     .start()
   ... )
   >>>
   >>> query.isActive
   True
   >>> 
   >>> spark.sql("select window.start, count from rate_table").show()
   +-------------------+-----+
   |              start|count|
   +-------------------+-----+
   |2023-03-11 22:45:40|    6|
   |2023-03-11 22:45:50|   10|
   +-------------------+-----+
   ```
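
To make the counts in the output above easier to read, here is a minimal plain-Python sketch of how rows fall into 10-second tumbling windows. It is only an illustration, not how Spark implements `window()`. It assumes the rate source emits one row per second, and the first row's timestamp (22:45:44) is a hypothetical value chosen so that the first window is only partially filled, as in the output above. The names `window_start` and `count_per_window` are made up for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
WINDOW = timedelta(seconds=10)  # same width as window("timestamp", "10 seconds")


def window_start(ts, width=WINDOW):
    """Return the start of the epoch-aligned tumbling window containing ts."""
    return ts - (ts - EPOCH) % width


def count_per_window(timestamps):
    """Count rows per window start, like groupBy(window(...)).count()."""
    return Counter(window_start(ts) for ts in timestamps)


# Hypothetical input: one row per second for 16 seconds, starting mid-window.
first = datetime(2023, 3, 11, 22, 45, 44)
rows = [first + timedelta(seconds=i) for i in range(16)]

counts = count_per_window(rows)
for start in sorted(counts):
    print(start, counts[start])
# 2023-03-11 22:45:40 6
# 2023-03-11 22:45:50 10
```

The first window holds only 6 rows because the stream started 4 seconds into it; every later complete window holds 10.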
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   This is needed to run streaming queries over Spark Connect. 
   
   ### How was this patch tested?
    - Manually tested. 
    - We will enable most of the streaming Python tests in follow-up PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


