mzhang opened a new pull request, #55844:
URL: https://github.com/apache/spark/pull/55844
### What changes were proposed in this pull request?
Lift the restriction on complex types in file source constant metadata
attributes. Before this change, `FileSourceMetadataAttribute.isSupportedType`
rejected `ArrayType`, `MapType`, and `StructType` outright (citing
`ColumnVectorUtils.populate` as the limiting factor), even though the
underlying machinery (`RowToColumnConverter`, the array/map layout in
`OffHeapColumnVector`, and the struct-child scaffolding in
`ConstantColumnVector`) already supports them.
Concretely:
- `FileSourceMetadataAttribute.isSupportedType` now allows complex types
  recursively, contingent on their element types being supported.
- `ColumnVectorUtils.populate` gains struct/array/map branches:
  - Struct: recurse into pre-allocated child `ConstantColumnVector`s.
  - Array/map: allocate a one-row `OffHeapColumnVector` backing and reuse
    the existing `RowToColumnConverter` (wrapped in a single-field struct
    schema) to write the constant value. The resulting view is handed to the
    constant vector along with ownership of the backing.
- `ConstantColumnVector` gains optional ownership of a backing
  `WritableColumnVector` (closed by `close()`), exposed via new
  `setArrayWithBacking` / `setMapWithBacking` methods. The original
  `setArray` / `setMap` are unchanged (caller retains ownership).
- `ConstantColumnVector`'s constructor now pre-allocates struct children so
  `populate`'s struct recursion has a target. `setChild` closes any
  previously-set child to avoid leaking the auto-allocated one.
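
For reviewers who want the two core ideas in isolation, here is a minimal, Spark-free sketch of (1) the recursive type-support check and (2) the borrowed-versus-owned backing semantics. Every type in it (`DataType`, `ArrayT`, `MapT`, `StructT`, `Backing`, `ConstantVector`) is a hypothetical stand-in invented for illustration, not a Spark class, and the method bodies are assumptions about the shape of the logic rather than the code in this patch. In particular, releasing a previously owned backing when a new value is set is a choice made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
final class ConstantMetadataSketch {

    // --- 1. Recursive "is this type supported?" check ---
    interface DataType {}
    enum Atomic implements DataType { INT, STRING, UNSUPPORTED }

    static final class ArrayT implements DataType {
        final DataType element;
        ArrayT(DataType element) { this.element = element; }
    }

    static final class MapT implements DataType {
        final DataType key;
        final DataType value;
        MapT(DataType key, DataType value) { this.key = key; this.value = value; }
    }

    static final class StructT implements DataType {
        final List<DataType> fields;
        StructT(DataType... fields) { this.fields = Arrays.asList(fields); }
    }

    // A complex type is supported only if everything nested inside it is.
    static boolean isSupported(DataType t) {
        if (t instanceof ArrayT) {
            return isSupported(((ArrayT) t).element);
        }
        if (t instanceof MapT) {
            MapT m = (MapT) t;
            return isSupported(m.key) && isSupported(m.value);
        }
        if (t instanceof StructT) {
            for (DataType f : ((StructT) t).fields) {
                if (!isSupported(f)) return false;
            }
            return true;
        }
        return t != Atomic.UNSUPPORTED;
    }

    // --- 2. Optional ownership of a backing resource ---
    static final class Backing implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static final class ConstantVector implements AutoCloseable {
        private Backing value;
        private Backing owned; // non-null only when ownership was transferred

        // Caller retains ownership: close() leaves the backing untouched.
        void setArray(Backing b) { releaseOwned(); value = b; }

        // Ownership moves to this vector: close() also closes the backing.
        void setArrayWithBacking(Backing b) { releaseOwned(); value = b; owned = b; }

        Backing getArray() { return value; }

        // Sketch-level choice: drop any previously owned backing before replacing it.
        private void releaseOwned() {
            if (owned != null) { owned.close(); owned = null; }
        }

        @Override public void close() { releaseOwned(); value = null; }
    }

    public static void main(String[] args) {
        DataType nested = new ArrayT(new StructT(Atomic.STRING, Atomic.INT));
        System.out.println(isSupported(nested));                                      // true
        System.out.println(isSupported(new MapT(Atomic.STRING, Atomic.UNSUPPORTED))); // false

        Backing borrowed = new Backing();
        Backing transferred = new Backing();
        ConstantVector a = new ConstantVector();
        a.setArray(borrowed);
        a.close();
        ConstantVector b = new ConstantVector();
        b.setArrayWithBacking(transferred);
        b.close();
        System.out.println(borrowed.closed);    // false
        System.out.println(transferred.closed); // true
    }
}
```

Keeping `setArray` as a borrow and adding a separate `...WithBacking` entry point means existing callers see no behavior change, while `populate` can hand off the one-row backing it allocated without tracking it separately.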
### Why are the changes needed?
A natural use case is a file source with metadata such as access control
lists, tags, or per-file annotations whose values are best expressed as arrays
or structs. Today these must be encoded as variants or strings, even though the
column vector implementation can handle the native types.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Updated `ColumnVectorUtilsSuite` to replace the "not supported" cases for
array / map / struct with positive cases that populate the corresponding
constants from `InternalRow` and assert the resulting values, including a
nested `ARRAY<STRUCT<...>>` and a null-array case. Existing
`ConstantColumnVectorSuite` cases continue to exercise the same paths.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)