Re: [PR] [SPARK-43664][CONNECT][PS] Fix `ps.sql` for remote session [spark]

via GitHub Thu, 05 Oct 2023 23:41:13 -0700


itholic commented on code in PR #43237:
URL: https://github.com/apache/spark/pull/43237#discussion_r1348301213



##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -200,7 +201,8 @@ def sql(
     try:
         sdf = session.sql(formatter.format(query, **kwargs), args)
     finally:
-        formatter.clear()
+        if not is_remote():
+            formatter.clear()

Review Comment:
   Thanks for checking!
   
   Then I think we can choose one of the following:
   1. Support it, with a proper warning that the catalog could be polluted.
   e.g.
   ```
   A temp view `_pandas_api_3ee629ad38024b64bb9301b24315fd36` is created when performing `ps.sql`, and it could pollute the catalog.
   Please remove the temp view manually by running `spark.catalog.dropTempView('_pandas_api_3ee629ad38024b64bb9301b24315fd36')` once the resulting DataFrame is no longer used.
   ```
   2. Do not support it, with a proper note explaining why it is currently unsupported.
   e.g.
   ```
   `ps.sql` currently does not work with pandas-on-Spark objects on Spark Connect because it creates a randomly named temp view that could pollute the catalog.
   ```
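A minimal sketch of what each option could look like. All helper names here (`make_temp_view_name`, `warn_temp_views_left_behind`, `reject_pandas_on_spark_args`) are hypothetical and do not exist in PySpark; the `_pandas_api_` + 32-hex-character naming only mirrors the example view name in the messages above, not the actual code in `sql_formatter.py`. The sketch uses only the standard library, so it runs without a Spark session.

```python
import uuid
import warnings
from typing import List

# Hypothetical prefix, mirroring the example view name quoted in the review comment.
_PREFIX = "_pandas_api_"


def make_temp_view_name() -> str:
    """Build a random temp view name shaped like `_pandas_api_<32 hex chars>`."""
    return _PREFIX + uuid.uuid4().hex


def warn_temp_views_left_behind(view_names: List[str]) -> str:
    """Option 1 (hypothetical): keep supporting `ps.sql` on Spark Connect,
    but warn that the temp views it registered are not dropped automatically."""
    quoted = ", ".join(f"`{name}`" for name in view_names)
    drops = "; ".join(f"spark.catalog.dropTempView('{name}')" for name in view_names)
    message = (
        f"Temp view(s) {quoted} were created when performing `ps.sql` and could "
        f"pollute the catalog. Please remove them manually by running {drops} "
        "once the resulting DataFrame is no longer used."
    )
    warnings.warn(message, UserWarning)
    return message


def reject_pandas_on_spark_args(has_pandas_on_spark_args: bool) -> None:
    """Option 2 (hypothetical): refuse pandas-on-Spark arguments on Spark Connect up front."""
    if has_pandas_on_spark_args:
        raise NotImplementedError(
            "`ps.sql` currently does not work with pandas-on-Spark objects on "
            "Spark Connect because it creates a randomly named temp view that "
            "could pollute the catalog."
        )
```

Either helper would presumably be called in the `is_remote()` branch that the diff introduces: option 1 in place of the skipped `formatter.clear()` call, option 2 before the query is formatted, so no temp view is ever registered.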
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
