Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22944#discussion_r231038560
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
---
@@ -262,25 +262,39 @@ object AppendColumns {
def apply[T : Encoder, U : Encoder](
func: T => U,
child: LogicalPlan): AppendColumns = {
+ val outputEncoder = encoderFor[U]
+ val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+ assert(outputEncoder.namedExpressions.length == 1)
+ outputEncoder.namedExpressions.map(Alias(_, "key")())
+ } else {
+ outputEncoder.namedExpressions
+ }
new AppendColumns(
func.asInstanceOf[Any => Any],
implicitly[Encoder[T]].clsTag.runtimeClass,
implicitly[Encoder[T]].schema,
UnresolvedDeserializer(encoderFor[T].deserializer),
- encoderFor[U].namedExpressions,
+ namedExpressions,
child)
}
def apply[T : Encoder, U : Encoder](
func: T => U,
inputAttributes: Seq[Attribute],
child: LogicalPlan): AppendColumns = {
+ val outputEncoder = encoderFor[U]
+ val namedExpressions = if (!outputEncoder.isSerializedAsStruct) {
+ assert(outputEncoder.namedExpressions.length == 1)
+ outputEncoder.namedExpressions.map(Alias(_, "key")())
+ } else {
+ outputEncoder.namedExpressions
--- End diff --
I tried adding a check to `CheckAnalysis` to detect `AppendColumns` whose
appended column names conflict with the child's output. I found that we have
many such use cases in `DatasetSuite`, e.g. `map and group by with class data`:
```scala
val ds: Dataset[(ClassData, Long)] =
  Seq(ClassData("one", 1), ClassData("two", 2)).toDS()
    .map(c => ClassData(c.a, c.b + 1))
    .groupByKey(p => p).count()
```
If users don't access the original output the way this patch shows, it won't
cause a problem. So if we disallow it entirely, that is a behavior change.
Should we fail only the `groupByKey` queries that access ambiguous field names?
Or should we disallow it entirely whenever there is any conflicting name?
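
To make the two options concrete, here is a minimal, Spark-free sketch of both policies. It is only an illustration, not the actual `CheckAnalysis` code: `NameConflictSketch`, `conflictingNames`, `strictCheck`, and `lenientCheck` are hypothetical names, and columns are modeled as plain strings rather than `Attribute`s.

```scala
// Hypothetical model of the two policies; this is not Spark's CheckAnalysis code.
object NameConflictSketch {

  /** Names that appear both in the child's output and in the appended key columns. */
  def conflictingNames(childOutput: Seq[String], appended: Seq[String]): Set[String] =
    childOutput.toSet.intersect(appended.toSet)

  /** Strict policy: reject the plan as soon as any name conflicts. */
  def strictCheck(childOutput: Seq[String], appended: Seq[String]): Either[String, Unit] = {
    val conflicts = conflictingNames(childOutput, appended)
    if (conflicts.isEmpty) Right(())
    else Left(s"Conflicting column names: ${conflicts.toSeq.sorted.mkString(", ")}")
  }

  /** Lenient policy: reject only when the query references a conflicting name. */
  def lenientCheck(
      childOutput: Seq[String],
      appended: Seq[String],
      referenced: Seq[String]): Either[String, Unit] = {
    val ambiguous = conflictingNames(childOutput, appended).intersect(referenced.toSet)
    if (ambiguous.isEmpty) Right(())
    else Left(s"Ambiguous references: ${ambiguous.toSeq.sorted.mkString(", ")}")
  }
}
```

Assuming `ClassData` has the fields `a` and `b`, the `DatasetSuite` case above corresponds to `childOutput = Seq("a", "b")` and `appended = Seq("a", "b")` with no names referenced by `count()`. In this model `strictCheck` would reject it (the behavior change), while `lenientCheck` would accept it and reject only a query that references `a` or `b` by name.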
---