Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/12008#discussion_r57620239
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1585,3 +1586,110 @@ object ResolveUpCast extends Rule[LogicalPlan] {
}
}
}
+
+/**
+ * Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to
+ * figure out how many windows a time column can map to, we over-estimate the number of windows and
+ * filter out the rows where the time column is not inside the time window.
+ */
+object TimeWindowing extends Rule[LogicalPlan] {
+
+ /**
+ * Depending on the operation, the TimeWindow expression may be wrapped in an Alias (in case of
+ * projections) or be simply by itself (in case of groupBy).
+ * @param f The function that we want to apply on the TimeWindow expression
+ * @return The user defined function applied on the TimeWindow expression
+ */
+ private def getWindowExpr[E](f: TimeWindow => E): PartialFunction[Expression, E] = {
--- End diff --
Usually we would do this as an extractor (an unapply method). I think it
looks a little less magical that way.
However, I'm worried that we are losing the alias. Can you add a test like:
```scala
df.groupBy(window(...).as("time")).agg(count("*")).select($"time")
```
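For illustration, here is a minimal, self-contained sketch of the extractor approach being suggested. The class and object names are hypothetical stand-ins, not Spark's actual Catalyst classes; the point is that an `unapply` can match a `TimeWindow` whether or not it is wrapped in an `Alias`, and surface the alias so the rewrite rule does not drop it:

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (illustration only).
sealed trait Expression
case class TimeWindow(column: String, windowDuration: Long) extends Expression
case class Alias(child: Expression, name: String) extends Expression
case class Literal(value: Any) extends Expression

object WindowExpression {
  // Extractor: returns the TimeWindow together with the optional enclosing
  // alias name, so a rule can re-attach the alias instead of losing it.
  def unapply(e: Expression): Option[(TimeWindow, Option[String])] = e match {
    case tw: TimeWindow              => Some((tw, None))
    case Alias(tw: TimeWindow, name) => Some((tw, Some(name)))
    case _                           => None
  }
}

// Usage: pattern matching reads declaratively, instead of threading a
// higher-order function through a PartialFunction.
def describe(e: Expression): String = e match {
  case WindowExpression(tw, Some(name)) => s"window on ${tw.column} aliased as $name"
  case WindowExpression(tw, None)       => s"bare window on ${tw.column}"
  case _                                => "not a window"
}
```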