itholic commented on code in PR #43237:
URL: https://github.com/apache/spark/pull/43237#discussion_r1348301213
##########
python/pyspark/pandas/sql_formatter.py:
##########
@@ -200,7 +201,8 @@ def sql(
     try:
         sdf = session.sql(formatter.format(query, **kwargs), args)
     finally:
-        formatter.clear()
+        if not is_remote():
+            formatter.clear()
Review Comment:
Thanks for checking!
Then I think maybe we can choose one of:
1. Support it, with a proper warning that the catalog could be polluted.
e.g.
```
Temp view `_pandas_api_3ee629ad38024b64bb9301b24315fd36` is created when
performing `ps.sql` and it could pollute the catalog list. Please manually
remove the temp view by running
`spark.catalog.dropTempView('_pandas_api_3ee629ad38024b64bb9301b24315fd36')`
when the resulting DataFrame is no longer used.
```
2. Don't support it, with a proper note explaining why we currently don't.
e.g.
```
`ps.sql` currently does not work with pandas-on-Spark objects on Spark
Connect because it creates a random temp view that could pollute the catalog
list.
```
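For option 1, here is a rough sketch of how the warning could be emitted. It assumes the formatter can hand us the names of the temp views it created; `warn_temp_views` is a hypothetical helper for illustration, not something that exists in pyspark today.

```python
import warnings


def warn_temp_views(view_names):
    """Emit one UserWarning per temp view left behind in the catalog.

    Hypothetical helper: `view_names` is assumed to be the list of temp
    view names that `ps.sql` registered for this query.
    """
    messages = []
    for name in view_names:
        msg = (
            f"Temp view `{name}` is created when performing `ps.sql` and it "
            "could pollute the catalog list. Please manually remove the temp "
            f"view by running `spark.catalog.dropTempView('{name}')` when the "
            "resulting DataFrame is no longer used."
        )
        # stacklevel=2 attributes the warning to the caller of this helper.
        warnings.warn(msg, UserWarning, stacklevel=2)
        messages.append(msg)
    return messages
```

The helper returns the messages as well, mostly so the wording is easy to check in a unit test without a Spark session.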
WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]