Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142796899
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
---
@@ -519,3 +519,18 @@ case class CoGroup(
outputObjAttr: Attribute,
left: LogicalPlan,
right: LogicalPlan) extends BinaryNode with ObjectProducer
+
+case class FlatMapGroupsInPandas(
+ groupingAttributes: Seq[Attribute],
+ functionExpr: Expression,
+ output: Seq[Attribute],
+ child: LogicalPlan) extends UnaryNode {
+ /**
+ * This is needed because output attributes is considered `reference` when
+ * passed through the constructor.
+ *
+ * Without this, catalyst will complain that output attributes are missing
+ * from the input.
+ */
+ override val producedAttributes = AttributeSet(output)
--- End diff --
This is one of the tricky bits.
It's because of this code:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L135
Because `productIterator` returns all member variables, including
`output`, the `references` of the tree node include every output attribute, and
catalyst complains about missing input:
```
def missingInput: AttributeSet = references -- inputSet -- producedAttributes
```
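To make the formula above concrete, here is a minimal toy sketch. It does not use Spark's classes: `ToyNode`, the `AttributeSet` alias, and the attribute names are all hypothetical stand-ins, and attributes are modelled as plain strings. It only shows why putting `output` into `producedAttributes` empties `missingInput`.

```scala
// Toy model only: these are NOT Spark's classes, just sets of attribute names.
object MissingInputSketch {
  type AttributeSet = Set[String]

  // Hypothetical stand-in for a unary plan node.
  final case class ToyNode(
      references: AttributeSet,         // attributes found in the node's expressions
      inputSet: AttributeSet,           // attributes supplied by the child
      producedAttributes: AttributeSet  // attributes the node itself creates
  ) {
    // Same shape as the formula quoted above.
    def missingInput: AttributeSet = references -- inputSet -- producedAttributes
  }

  def main(args: Array[String]): Unit = {
    val childOutput: AttributeSet = Set("id", "v")         // assumed child columns
    val newOutput: AttributeSet = Set("id_out", "v_out")   // assumed new output columns

    // Assumption for illustration: because `output` is a constructor member,
    // its attributes end up in `references` next to the grouping attribute.
    val refs: AttributeSet = Set("id") ++ newOutput

    // Without the override, the new output attributes are reported as missing.
    val withoutOverride = ToyNode(refs, childOutput, Set.empty[String])
    println(s"without override: ${withoutOverride.missingInput}")

    // Declaring them as produced removes them from missingInput.
    val withOverride = ToyNode(refs, childOutput, newOutput)
    println(s"with override: ${withOverride.missingInput}")
  }
}
```

In this toy model the first node reports both new output attributes as missing, while the second reports none, which mirrors what the `producedAttributes` override in the diff does.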
I don't think my solution here is great, but I don't know the best way to deal
with this. If someone with deeper catalyst knowledge can suggest a better
approach, I am happy to get rid of this bit.