vinodkc commented on code in PR #53376:
URL: https://github.com/apache/spark/pull/53376#discussion_r2654350819
##########
python/pyspark/sql/functions/builtin.py:
##########
@@ -13091,6 +13091,147 @@ def timestamp_add(unit: str, quantity: "ColumnOrName", ts: "ColumnOrName") -> Co
)
+@_try_remote_functions
+def timestamp_bucket(
+ bucket_width: "ColumnOrName", timestamp: "ColumnOrName", origin: Optional["ColumnOrName"] = None
+) -> Column:
+ """
+ Returns the start of the timestamp bucket containing the input timestamp.
+
+ Buckets are fixed-width intervals aligned to a specified origin (default: Unix epoch).
+ This function supports arbitrary interval bucketing of dates and timestamps.
+
+ .. versionadded:: 4.2.0
+
+ Parameters
+ ----------
+ bucket_width : :class:`~pyspark.sql.Column` or column name
+ A day-time interval expression specifying the width of each bucket.
+ Use ``sf.expr("INTERVAL '1' HOUR")`` for interval literals.
+ Must be a constant/foldable positive interval.
+ timestamp : :class:`~pyspark.sql.Column` or column name
+ The temporal value to bucket. Accepts:
+
+ - DATE: Implicitly cast to TIMESTAMP at midnight UTC
+ - TIMESTAMP: Used directly (timezone-aware)
+ - TIMESTAMP_NTZ: Used directly (no timezone)
+ origin : :class:`~pyspark.sql.Column` or column name, optional
+ The timestamp to align buckets to. Defaults to Unix epoch
+ (1970-01-01 00:00:00 UTC). Use this to customize bucket alignment:
+
+ - Monday weeks: ``sf.expr("TIMESTAMP'1970-01-05 00:00:00'")``
+ - Sunday weeks: ``sf.expr("TIMESTAMP'1970-01-04 00:00:00'")``
+ - Fiscal year starts: ``sf.expr("TIMESTAMP'2024-04-01 00:00:00'")``
+
+ Must be a constant TIMESTAMP expression.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ The start timestamp of the bucket (TIMESTAMP type).
+ Always returns TIMESTAMP regardless of input types.
+
+ Notes
+ -----
+ - Bucket boundaries are aligned to the specified origin
+ - When origin is not specified, defaults to Unix epoch (1970-01-01 00:00:00 UTC)
+ - The return type is always TIMESTAMP (not TIMESTAMP_NTZ)
Review Comment:
There was no reason for it. I have now modified the code to preserve TIMESTAMP_NTZ.
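
For reference, the bucket alignment described in the docstring above can be sketched in plain Python. This is only an illustration of the arithmetic (floor the offset from the origin to a whole number of bucket widths), not the PR's implementation; `bucket_start` and `EPOCH` are hypothetical names, and the sketch assumes naive datetimes, so it ignores the time zone handling that TIMESTAMP involves.

```python
from datetime import datetime, timedelta

# Assumed default origin: the Unix epoch as a naive datetime (no time zone).
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime, width: timedelta, origin: datetime = EPOCH) -> datetime:
    """Return the start of the fixed-width bucket that contains ts.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    if width <= timedelta(0):
        raise ValueError("bucket width must be a positive interval")
    # timedelta // timedelta yields an int and floors toward negative infinity,
    # so timestamps before the origin still map to the bucket that contains them.
    n_buckets = (ts - origin) // width
    return origin + n_buckets * width


# Hourly buckets aligned to the epoch.
hourly = bucket_start(datetime(2024, 1, 1, 10, 37), timedelta(hours=1))

# Weekly buckets starting on Monday: 1970-01-05 was a Monday.
weekly = bucket_start(
    datetime(2024, 1, 3, 12, 0), timedelta(weeks=1), origin=datetime(1970, 1, 5)
)
```

Because the result is built from the input's own datetime type, a sketch like this naturally keeps a zone-less input zone-less, which is the behaviour the TIMESTAMP_NTZ change aims for.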
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]