dongjoon-hyun commented on code in PR #52853:
URL: https://github.com/apache/spark/pull/52853#discussion_r2551119055


##########
python/pyspark/sql/tests/test_udf_profiler.py:
##########
@@ -575,6 +585,34 @@ def summarize(left, right):
         for id in self.profile_results:
             self.assert_udf_profile_present(udf_id=id, expected_line_count_prefix=2)
 
+    def test_perf_profiler_data_source(self):

Review Comment:
   Unfortunately, it turns out that we need to skip this test when `pyarrow` is not installed.
   - https://github.com/apache/spark/actions/runs/19574648782/job/56056836234
   
   ```
   ======================================================================
   ERROR: test_perf_profiler_data_source (pyspark.sql.tests.test_udf_profiler.UDFProfiler2Tests)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/__w/spark/spark/python/pyspark/sql/tests/test_udf_profiler.py", line 609, in test_perf_profiler_data_source
       self.spark.read.format("TestDataSource").load().collect()
     File "/__w/spark/spark/python/pyspark/sql/classic/dataframe.py", line 469, in collect
       sock_info = self._jdf.collectToPython()
     File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1362, in __call__
       return_value = get_return_value(
     File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
       return f(*a, **kw)
     File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
       raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o235.collectToPython.
   : org.apache.spark.SparkException: 
   Error from python worker:
     Traceback (most recent call last):
       File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 199, in _run_module_as_main
         return _run_code(code, main_globals, None,
       File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 86, in _run_code
         exec(code, run_globals)
       File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 37, in <module>
       File "/usr/local/pypy/pypy3.10/lib/pypy3.10/importlib/__init__.py", line 126, in import_module
         return _bootstrap._gcd_import(name[level:], package, level)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
       File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
       File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
       File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
       File "<builtin>/frozen importlib._bootstrap_external", line 897, in exec_module
       File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
       File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 21, in <module>
         import pyarrow as pa
     ModuleNotFoundError: No module named 'pyarrow'
   ```
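
One way to guard the test could look like the sketch below. It is not taken from the PR: `HAVE_PYARROW`, `SKIP_REASON`, `ProfilerDataSourceSketch`, and `run_sketch` are hypothetical names used only for illustration, and the test body is a placeholder for the real data source read. Spark's own test utilities may already provide an equivalent flag, which would be preferable in the actual fix.

```python
import importlib.util
import unittest

# Hypothetical flag: True only when pyarrow can be located on this interpreter.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None
# Hypothetical skip reason shown in the test report when pyarrow is missing.
SKIP_REASON = "pyarrow is required for Python data source tests"


class ProfilerDataSourceSketch(unittest.TestCase):
    @unittest.skipIf(not HAVE_PYARROW, SKIP_REASON)
    def test_perf_profiler_data_source(self):
        # Placeholder body: the real test would register the data source
        # and call collect(). This line is reached only when the guard
        # above found pyarrow.
        import pyarrow as pa

        self.assertEqual(pa.__name__, "pyarrow")


def run_sketch():
    """Run the sketch test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProfilerDataSourceSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With a guard like this, an environment without `pyarrow` (such as the PyPy job above) would report the test as skipped instead of failing in the Python worker.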



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


