dongjoon-hyun commented on code in PR #52853:
URL: https://github.com/apache/spark/pull/52853#discussion_r2551119055
##########
python/pyspark/sql/tests/test_udf_profiler.py:
##########
@@ -575,6 +585,34 @@ def summarize(left, right):
            for id in self.profile_results:
                self.assert_udf_profile_present(udf_id=id, expected_line_count_prefix=2)

+    def test_perf_profiler_data_source(self):
Review Comment:
Unfortunately, it turns out that we need to skip this test when `pyarrow` is not installed.
- https://github.com/apache/spark/actions/runs/19574648782/job/56056836234
```
======================================================================
ERROR: test_perf_profiler_data_source (pyspark.sql.tests.test_udf_profiler.UDFProfiler2Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_udf_profiler.py", line 609, in test_perf_profiler_data_source
    self.spark.read.format("TestDataSource").load().collect()
  File "/__w/spark/spark/python/pyspark/sql/classic/dataframe.py", line 469, in collect
    sock_info = self._jdf.collectToPython()
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1362, in __call__
    return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
    return f(*a, **kw)
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o235.collectToPython.
: org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 199, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 37, in <module>
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/importlib/__init__.py", line 126, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    File "<builtin>/frozen importlib._bootstrap_external", line 897, in exec_module
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 21, in <module>
      import pyarrow as pa
  ModuleNotFoundError: No module named 'pyarrow'
```
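One way to guard the test is a plain `unittest.skipIf` on the new method. A minimal sketch, assuming the `have_pyarrow` / `pyarrow_requirement_message` helpers in `pyspark.testing.utils` are available here as they are in other Arrow-dependent PySpark test suites (the class below is a simplified stand-in, not the actual patch):
```
import unittest

# Assumed helpers: other PySpark test modules import these from pyspark.testing.utils.
from pyspark.testing.utils import have_pyarrow, pyarrow_requirement_message


class UDFProfiler2Tests(unittest.TestCase):  # simplified stand-in for the real test class
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_perf_profiler_data_source(self):
        # body from the PR; it reaches pyarrow via the Python data source read worker
        pass
```
With the decorator in place, environments without `pyarrow` (such as the PyPy job linked above) report the test as skipped with the requirement message instead of failing.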