HyukjinKwon commented on a change in pull request #35861:
URL: https://github.com/apache/spark/pull/35861#discussion_r830716928



##########
File path: python/pyspark/instrumentation_utils.py
##########
@@ -44,6 +63,17 @@ def wrapper(*args: Any, **kwargs: Any) -> Any:
             start = time.perf_counter()
             try:
                 res = func(*args, **kwargs)
+                if isinstance(res, AbstractContextManager):
+                    # Wrap AbstractContextManager's subclasses returned by 
@contextmanager decorator
+                    # function so that wrapped function calls inside __enter__ 
and __exit__
+                    # are not recorded by usage logger.
+                    #
+                    # The reason to add a wrapped class after function calls 
instead of
+                    # wrapping __enter__ and __exit__ methods of 
_GeneratorContextManager class is
+                    # because usage logging should be disabled for functions 
with @contextmanager
+                    # decorator in PySpark only.
+                    res = _WrappedAbstractContextManager(res, class_name, 
function_name, logger)

Review comment:
       Oops, sorry I missed this. This breaks few things: 
`pyspark.pandas.frame.CachedDataFrame` with `__enter__` and `__exit__` (and 
potentially `pyspark.SparkContext` and `pyspark.sql.SparkSession`). Note that 
`isinstance(obj, AbstractContextManager)` works for the classes that do not 
inherit `AbstractContextManager` (see also 
https://docs.python.org/3/glossary.html#term-abstract-base-class).
   
   For example, if a notebook has a feature to automatically represent these 
instances above:
   
   ```bash
   KOALAS_USAGE_LOGGER=pyspark.pandas.usage_logger ./bin/pyspark
   ```
   
   This does not work anymore because:
   
   ```python
   >>> import pyspark.pandas as ps
   >>> type(ps.range(1).spark.cache())
   <class 'pyspark.instrumentation_utils._WrappedAbstractContextManager'>
   ```
   
   and ,`_WrappedAbstractContextManager` is not the instance of `DataFrame` 
which `CachedDataFrame` inherits.
   
   We would probably have to monkey-patch each method of `__enter__` and 
`__exit__`. Since the release is getting close, I will revert this one first to 
be more conservative.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to