Yicong-Huang opened a new pull request, #54356:
URL: https://github.com/apache/spark/pull/54356
### What changes were proposed in this pull request?
This PR refactors the `SQL_SCALAR_ARROW_ITER_UDF` processing logic as part of
the SPARK-55388 umbrella task. It consolidates the wrapper, mapper, and
serializer logic into a single location.
**Key changes:**
1. Added three reusable verification helper functions (without default arguments):
   - `verify_iterator_consumed()` - Verifies iterator is fully consumed
   - `verify_row_limit_iter()` - Fail-fast row limit check with deferred evaluation support
   - `verify_row_count_match_iter()` - Final row count match verification
2. Implemented dedicated mapper function for `SQL_SCALAR_ARROW_ITER_UDF`:
   - Streaming argument extraction with nonlocal row counting
   - Type verification with `verify_result(pa.Array)`
   - Schema enforcement with `ArrowBatchTransformer.enforce_schema`
   - Proper iterator validation
3. Unified serializer usage:
   - Now uses `ArrowStreamSerializer(write_start_stream=True)`
   - Removed direct usage of `ArrowStreamArrowUDFSerializer` from imports
4. Added TODO(SPARK-55579) comments for future Arrow-specific error classes
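
For reviewers who want a mental model of how the three verification helpers could compose, here is a minimal, illustrative sketch. It is not the code in this PR: the signatures, the error messages, and the `run_scalar_iter_udf` driver are assumptions made for the example, and plain Python lists stand in for Arrow batches so the snippet runs without `pyarrow`. Only the helper names and the ideas of nonlocal row counting, fail-fast checking, and deferred evaluation come from the description above.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[int]  # stand-in for an Arrow batch; len(batch) is its row count


def verify_iterator_consumed(input_iter: Iterator[Batch]) -> None:
    # Hypothetical signature: raise if the UDF left input batches unread.
    sentinel = object()
    if next(input_iter, sentinel) is not sentinel:
        raise RuntimeError("UDF did not consume all input batches")


def verify_row_limit_iter(
    output_iter: Iterator[Batch], get_limit: Callable[[], int]
) -> Iterator[Batch]:
    # Fail fast: check after every output batch. The limit is a callable so it
    # is evaluated lazily, after the input side has updated its running count.
    produced = 0
    for batch in output_iter:
        produced += len(batch)
        if produced > get_limit():
            raise RuntimeError("UDF produced more rows than it has read")
        yield batch


def verify_row_count_match_iter(
    output_iter: Iterator[Batch], get_expected: Callable[[], int]
) -> Iterator[Batch]:
    # Final check: once the output is exhausted, totals must agree.
    produced = 0
    for batch in output_iter:
        produced += len(batch)
        yield batch
    if produced != get_expected():
        raise RuntimeError("UDF output row count does not match input")


def run_scalar_iter_udf(
    udf: Callable[[Iterator[Batch]], Iterator[Batch]], batches: Iterable[Batch]
) -> Iterator[Batch]:
    # Hypothetical driver showing streaming extraction with a nonlocal counter.
    rows_in = 0

    def counted_input() -> Iterator[Batch]:
        nonlocal rows_in
        for batch in batches:
            rows_in += len(batch)
            yield batch

    input_iter = counted_input()
    out = verify_row_limit_iter(udf(input_iter), lambda: rows_in)
    out = verify_row_count_match_iter(out, lambda: rows_in)
    yield from out
    verify_iterator_consumed(input_iter)
```

Because every helper is a generator wrapper, nothing is materialized: each output batch is checked and passed on as soon as the UDF yields it.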
### Why are the changes needed?
This refactor is part of SPARK-55388 to consolidate PythonEvalType
processing logic. It follows the same pattern as SPARK-55389
(SQL_MAP_ARROW_ITER_UDF), ensuring consistency across Arrow iterator UDF
implementations. The changes make the verification functions reusable by other
eval types and eliminate the need for specialized serializers.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Existing Arrow UDF tests pass: `pyspark.sql.tests.arrow.test_arrow_udf`
- Black formatting and Ruff linting passed
- Verified streaming behavior is maintained (no materialization)
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]