Seems fine to me - as good a placeholder as anything.
Would that also be about the time to call 2.x end-of-life?
On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon wrote:
> Hi all,
>
> Spark 3.2 is out. Shall we update the release window
> https://spark.apache.org/versioning-policy.html?
> I am thinking of
Hi all,
Spark 3.2 is out. Shall we update the release window
https://spark.apache.org/versioning-policy.html?
I am thinking of mid-March 2022 (5 months after the 3.2 release) for the
code freeze and onward.
The transform expressions in v2 are logical, not concrete implementations.
Even days may have different implementations -- the only expectation is
that the partitions are day-sized. For example, you could use a transform
that splits days at UTC 00:00, or uses some other day boundary.
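To illustrate how two day transforms can both produce day-sized partitions yet disagree physically, here is a hypothetical sketch in plain Java (not Spark's actual transform API; the offset parameterization is an assumption for illustration):

```java
import java.time.Instant;

// Hypothetical sketch: two "days" transforms that both yield day-sized
// partitions but split days at different boundaries, showing why the
// logical transform alone does not pin down a concrete implementation.
public class DaysTransformDemo {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Map epoch seconds to a day-partition ID, with the day boundary
    // shifted by boundaryOffsetSeconds from UTC 00:00.
    static long partitionId(long epochSeconds, long boundaryOffsetSeconds) {
        return Math.floorDiv(epochSeconds - boundaryOffsetSeconds, SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        long t = Instant.parse("2021-10-27T03:00:00Z").getEpochSecond();
        long utcDay  = partitionId(t, 0);         // boundary at UTC 00:00
        long shifted = partitionId(t, 5 * 3600);  // boundary at UTC 05:00
        // Both are day-sized partitionings, yet they disagree on which
        // day this 03:00 UTC timestamp belongs to.
        System.out.println(utcDay == shifted); // prints "false"
    }
}
```

Both transforms satisfy the "partitions are day-sized" expectation; only the boundary differs.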
Because the
Thanks Wenchen, this is a good question. `BucketTransform` and others
currently have no semantic meaning, and I think we should bind them to v2
functions as part of the SPIP. My current proposal is:
During query analysis, Spark will try to resolve `XXXTransform`s (in
`V2ExpressionUtils`) into
Thanks for the initial feedback.
I think the community was previously busy with work related to the Spark
3.2 release.
Now that the 3.2 release is done, I'd like to bring this up to the surface
again and seek more discussion and feedback.
Thanks.
On 2021/06/25 15:49:49, huaxin gao wrote:
> I
+1 for the SPIP. This is a great improvement and optimization!
On 2021/10/26 19:01:03, Erik Krogen wrote:
> It's great to see this SPIP going live. Once this is complete, it will
> really help Spark to play nicely with a broader data ecosystem (Hive,
> Iceberg, Trino, etc.), and it's great to
`BucketTransform` is a builtin partition transform in Spark, not a UDF from
`FunctionCatalog`. Will Iceberg use a UDF from `FunctionCatalog` to
represent its bucket transform, or the Spark builtin `BucketTransform`?
I'm asking this because other v2 sources may also use the builtin
Two v2 sources may return different bucket IDs for the same value, and this
breaks the phase 1 split-wise join.
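A hypothetical sketch of that mismatch: two sources that both advertise a 4-way bucket transform but use different hash functions, so the same value lands in different buckets (both hash choices here are illustrative, not any source's real bucketing):

```java
// Hypothetical sketch: two sources both claim "bucket(4, col)" but hash
// differently, so identical values can land in different buckets.
// Matching splits purely by the logical transform would then pair up
// splits whose rows do not actually co-locate.
public class BucketMismatchDemo {
    // Source A: buckets by Java's String.hashCode.
    static int bucketA(String value, int numBuckets) {
        return Math.floorMod(value.hashCode(), numBuckets);
    }

    // Source B: buckets by a different hash (a toy FNV-1a variant).
    static int bucketB(String value, int numBuckets) {
        int h = 0x811c9dc5;
        for (int i = 0; i < value.length(); i++) {
            h = (h ^ value.charAt(i)) * 0x01000193;
        }
        return Math.floorMod(h, numBuckets);
    }

    public static void main(String[] args) {
        String v = "spark";
        // Same logical transform, different physical bucket IDs.
        System.out.println(bucketA(v, 4));
        System.out.println(bucketB(v, 4));
    }
}
```

Joining split 1 of source A with split 1 of source B would silently drop matching rows, which is exactly why the physical function identity matters.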
This is why the FunctionCatalog included a canonicalName method (docs
IIUC, the general idea is to let each input split report its partition
value, and Spark can perform the join in two phases:
1. join the input splits from the left and right tables according to their
partition values and join keys, on the driver side.
2. for each joined input splits pair (or a group
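The driver-side matching in phase 1 can be sketched roughly as follows (class and method names here are illustrative assumptions, not Spark's actual interfaces):

```java
import java.util.*;

// Hypothetical sketch of phase 1: on the driver, group each side's
// input splits by their reported partition value, then pair up the
// groups that share a partition value.
public class SplitMatchingDemo {
    record Split(String name, String partitionValue) {}

    static Map<String, List<Split>> byPartition(List<Split> splits) {
        Map<String, List<Split>> grouped = new HashMap<>();
        for (Split s : splits) {
            grouped.computeIfAbsent(s.partitionValue(), k -> new ArrayList<>()).add(s);
        }
        return grouped;
    }

    // Pair splits from the left and right tables whose partition values
    // match; each pair would become one join task in phase 2.
    static List<List<Split>> matchSplits(List<Split> left, List<Split> right) {
        Map<String, List<Split>> rightByPart = byPartition(right);
        List<List<Split>> matched = new ArrayList<>();
        for (Map.Entry<String, List<Split>> e : byPartition(left).entrySet()) {
            List<Split> rs = rightByPart.get(e.getKey());
            if (rs != null) {
                for (Split l : e.getValue())
                    for (Split r : rs)
                        matched.add(List.of(l, r));
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Split> left  = List.of(new Split("l0", "2021-10-26"), new Split("l1", "2021-10-27"));
        List<Split> right = List.of(new Split("r0", "2021-10-27"), new Split("r1", "2021-10-28"));
        // Only the 2021-10-27 splits share a partition value, so phase 1
        // produces a single joined pair.
        System.out.println(matchSplits(left, right).size()); // prints "1"
    }
}
```

This only works if both sides report comparable partition values, which is where the earlier concern about bucket transform implementations comes in.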