Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/10757#issuecomment-172296357
A summary of offline discussion with @rxin and @marmbrus:
The reason @rxin suggested removing all back-ticks from generated column
names is mostly backwards compatibility. Those names are now generated using
`Expression.sql` instead of `Expression.prettyString`. For example, the
following DataFrame
```scala
df.selectExpr("id + 1")
```
used to produce a single column named `id + 1`, but now it becomes `` `id` + 1``.
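To make the difference concrete, here is a quick way to inspect the generated name (the expected strings in the comments just restate the behavior described above, they are not taken from a particular build):
```scala
// `df` is assumed to have an `id` column, as in the example above.
val name = df.selectExpr("id + 1").columns.head
// prettyString-based behavior:   "id + 1"
// Expression.sql-based behavior: "`id` + 1"
println(name)
```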
However, later on I found that removing back-ticks in generated column
names still cannot guarantee this level of backwards-compatibility. This is
because, although `prettyString` and `sql` are often quite similar, they are
still inherently different from each other in many cases, for example:
Expression | `prettyString` | `sql`
---------- | -------------- | -----
`a && b` | `a && b` | `a AND b`
`a.getField("f")` | `a[f]` | `a.f`
`m.getItem("key")` | `m[key]` | `m["key"]`
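A rough DataFrame-level illustration of the same differences (assuming a DataFrame `df` with boolean columns `a` and `b`, a struct column `s` with field `f`, and a map column `m`; the expected names simply mirror the `prettyString` column of the table and may carry extra parentheses in practice):
```scala
import org.apache.spark.sql.functions._

// The generated column names should follow the prettyString forms listed
// above, e.g. "a && b", "s[f]"-style field access, and "m[key]".
df.select(col("a") && col("b"), col("s").getField("f"), col("m").getItem("key"))
  .columns
  .foreach(println)
```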
Basically, we won't be able to replace `prettyString` with `sql` if we do
want this level of backwards compatibility. However, as the table above shows,
the original `prettyString` method doesn't always produce proper column names
either, and I'd prefer to fix that in Spark 2.0. As for back-tick quoting, a
utility method `safeSQLIdent` is added to quote identifiers only when
necessary.
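For reference, a minimal sketch of what such a helper could look like; the name `safeSQLIdent` comes from the comment above, but the body below is an assumption rather than the actual implementation in this PR:
```scala
// Quote an identifier with back-ticks only when necessary. A real
// implementation would likely also need to quote reserved SQL keywords.
def safeSQLIdent(name: String): String =
  if (name.matches("[a-zA-Z_][a-zA-Z0-9_]*")) name
  else "`" + name.replace("`", "``") + "`" // escape embedded back-ticks by doubling

safeSQLIdent("id")     // => "id"
safeSQLIdent("id + 1") // => "`id + 1`"
```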