Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12008#discussion_r57637956
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2551,6 +2551,134 @@ object functions {
         ToUTCTimestamp(ts.expr, Literal(tz))
       }
     
    +  /**
    +   * Bucketize rows into one or more time windows given a timestamp 
specifying column. Window
    +   * starts are inclusive but the window ends are exclusive, e.g. 12:05 
will be in the window
    +   * [12:05,12:10) but not in [12:00,12:05). The following example takes 
the average stock price
    +   * for a one minute window every 10 seconds starting 5 seconds after the 
hour:
    +   *
    +   * {{{
    +   *   val df = ... // schema => timestamp: TimestampType, stockId: 
StringType, price: DoubleType
    +   *   df.groupBy(window($"time", "1 minute", "10 seconds", "5 seconds"), 
$"stockId")
    +   *     .agg(mean("price"))
    +   * }}}
    +   *
    +   * The windows will look like:
    +   *
    +   * {{{
    +   *   09:00:05-09:01:05
    +   *   09:00:15-09:01:15
    +   *   09:00:25-09:01:25 ...
    +   * }}}
    +   *
    +   * For a continuous query, you may use the function `current_timestamp` 
to generate windows on
    +   * processing time.
    +   *
    +   * @param timeColumn The column or the expression to use as the 
timestamp for windowing by time.
    +   *                   The time can be as TimestampType or LongType, 
however when using LongType,
    +   *                   the time must be given in seconds.
    +   * @param windowDuration A string specifying the width of the window, 
e.g. `10 minutes`,
    +   *                       `1 second`. Check 
[[org.apache.spark.unsafe.types.CalendarInterval]] for
    +   *                       valid duration identifiers.
    +   * @param slideDuration A string specifying the sliding interval of the 
window, e.g. `1 minute`.
    +   *                      A new window will be generated every 
`slideDuration`. Must be less than
    +   *                      or equal to the `windowDuration`. Check
    +   *                      
[[org.apache.spark.unsafe.types.CalendarInterval]] for valid duration
    +   *                      identifiers.
    +   * @param startTime The offset with respect to 1970-01-01 00:00:00 UTC 
with which to start
    +   *                  window intervals. For example, in order to have 
hourly tumbling windows that
    +   *                  start 15 minutes past the hour, e.g. 12:15-13:15, 
13:15-14:15... provide
    +   *                  `startTime` as `15 minutes`.
    +   *
    +   * @group datetime_funcs
    +   * @since 2.0.0
    +   */
    +  def window(
    +      timeColumn: Column,
    +      windowDuration: String,
    +      slideDuration: String,
    +      startTime: String): Column = withExpr {
    +    TimeWindow(timeColumn.expr, windowDuration, slideDuration, startTime)
    --- End diff --
    
    +1 on parsing it here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to