Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144182958
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
---
@@ -46,14 +47,20 @@ object ExpressionSet {
* set.contains(1 + a) => true
* set.contains(a + 2) => false
* }}}
+ *
+ * For non-deterministic expressions, they are always considered as not
contained in the [[Set]].
+ * On adding a non-deterministic expression, simply append it to the
original expressions.
+ * This is consistent with how we define `semanticEquals` between two
expressions.
*/
class ExpressionSet protected(
protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
extends Set[Expression] {
protected def add(e: Expression): Unit = {
- if (!baseSet.contains(e.canonicalized)) {
+ if (!e.deterministic) {
+ originals += e
+ } else if (!baseSet.contains(e.canonicalized) ) {
--- End diff --
Yeah I have thought about this.
But the time complexity is O(n) for adding an expression.
It is a trade off, I prefer to my current implementation, the time
complexity is O(1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]