cloud-fan commented on a change in pull request #25594:
[SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas with Arrow
optimization throws an exception per maxResultSize
URL: https://github.com/apache/spark/pull/25594#discussion_r317944103
##########
File path: python/pyspark/sql/tests/test_arrow.py
##########
@@ -421,6 +421,35 @@ def run_test(num_records, num_parts, max_records, use_delay=False):
         run_test(*case)
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow,
+    pandas_requirement_message or pyarrow_requirement_message)
+class MaxResultArrowTests(unittest.TestCase):
+    # These tests are separate as 'spark.driver.maxResultSize' configuration
+    # is a static configuration to Spark context.
+
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config("spark.driver.maxResultSize", "10k") \
+            .getOrCreate()
+
+        # Explicitly enable Arrow and disable fallback.
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled",
+                           "false")
Review comment:
I think it's better to have a test that fails with the default settings, for
branch-2.4.
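The diff above gates the whole test class on optional dependencies with `unittest.skipIf`. A minimal stand-alone sketch of that gating pattern follows; the flag and message here are placeholders, not the real pyspark helpers (`have_pandas` in the actual suite is set by probing the pandas import):

```python
import unittest

# Placeholder flag and message; the real pyspark test suite computes these
# by attempting to import pandas/pyarrow and recording why the import failed.
have_pandas = False
pandas_requirement_message = "Pandas must be installed"


# With the condition true, every test in the class is recorded as skipped
# rather than failed, which is exactly how MaxResultArrowTests is gated.
@unittest.skipIf(not have_pandas, pandas_requirement_message)
class SkipGateSketchTests(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipGateSketchTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # each gated test shows up in result.skipped
```

Class-level `skipIf` is preferable to per-test guards here because `setUpClass` itself would fail without a working SparkSession, and a skip keeps the suite green on machines without the optional packages.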
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]