anew commented on code in PR #55493:
URL: https://github.com/apache/spark/pull/55493#discussion_r3270618887

##########
docs/declarative-pipelines-programming-guide.md:
##########
@@ -180,6 +180,20 @@ Your pipelines implemented with the Python API must import this module. It's rec
 from pyspark import pipelines as dp
 ```
 
+### The Spark Session in Python Pipelines
+
+The Spark session is automatically injected by the pipeline framework and is available as `spark` in every Python pipeline file — no initialization code is required. You can use `spark` directly without importing or constructing a `SparkSession`:
+
+```python
+from pyspark import pipelines as dp
+
[email protected]_view
+def my_view():
+    return spark.range(10)
+```
+
+Previous versions of Declarative Pipelines required explicitly assigning the session with `spark = SparkSession.active()` at the top of each pipeline file. This is still allowed and continues to work correctly. However, if you do assign the session explicitly, `SparkSession.active()` is the only supported way to do so — any other method of obtaining or constructing a `SparkSession` is unsupported and may lead to unexpected behavior.

Review Comment:
Good suggestion, I made the change. However, I was hoping that this could go into 4.2, and worded it accordingly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
