viirya commented on a change in pull request #33691:
URL: https://github.com/apache/spark/pull/33691#discussion_r687063401



##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1134,6 +1134,73 @@ sessionizedCounts = events \
 </div>
 </div>
 
+Instead of static value, we can also provide an expression to specify gap 
duration dynamically
+based on the input row. Note that the rows with negative or zero gap duration 
will be filtered
+out from the aggregation.
+
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
+
+{% highlight scala %}
+import spark.implicits._
+
+val events = ... // streaming DataFrame of schema { timestamp: Timestamp, 
userId: String }
+
+val sessionWindow = SessionWindow($"timestamp".expr, when($"userId" === 
"user1", "5 seconds")
+  .when($"userId" === "user2", "20 seconds")
+  .otherwise("5 minutes")
+
+// Group the data by session window and userId, and compute the count of each 
group
+val sessionizedCounts = events
+    .withWatermark("timestamp", "10 minutes")
+    .groupBy(
+        Column(sessionWindow),
+        $"userId")
+    .count()
+{% endhighlight %}
+
+</div>
+<div data-lang="java"  markdown="1">
+
+{% highlight java %}
+Dataset<Row> events = ... // streaming DataFrame of schema { timestamp: 
Timestamp, userId: String }
+
+SessionWindow sessionWindow = new SessionWindow(col("timestamp").expr, 
when(col("userId").equalTo("user1"), "5 seconds")
+  .when(col("userId").equalTo("user2"), "20 seconds")
+  .otherwise("5 minutes"))
+
+// Group the data by session window and userId, and compute the count of each 
group
+Dataset<Row> sessionizedCounts = events
+    .withWatermark("timestamp", "10 minutes")
+    .groupBy(
+        new Column(sessionWindow),
+        col("userId"))
+    .count();
+{% endhighlight %}
+
+</div>
+<div data-lang="python"  markdown="1">
+{% highlight python %}
+from pyspark.sql import functions as F
+
+events = ...  # streaming DataFrame of schema { timestamp: Timestamp, userId: 
String }
+
+session_window = session_window(events.timestamp, \

Review comment:
       Oh, I don't realize python `session_window` doesn't accept an 
expression...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to