Github user jsnowacki commented on the issue:
https://github.com/apache/spark/pull/19443
@HyukjinKwon Thanks for pointing that out. I think the consistency argument
is valid, though I agree with @jaceklaskowski that the change should go one
way or the other, i.e. either allow string column names everywhere or remove
the option completely. I don't really think this is ambiguous: functions in
SQL should accept either a `Column` object or a column name as a string, with
the exception of `lit`, and of `col`, which accepts only a string. Potentially
ambiguous functions like `concat` should always check the argument type and,
if it is a string, convert it to a `Column`. This enforces that a string is
used only as a column name, and that `lit` is used for an actual string
literal.
Also, in the case of Python the argument is a little different. We need to
take into account that many objects, like `dict` or Pandas' `DataFrame`,
have made addressing columns by string name the more Pythonic way of dealing
with columns. Thus, Python (and to some extent SQL and R) users expect to be
able to refer to columns by their string names, since using a special object
for a column is more of a Java (and, thus, Scala) way of looking at things.
Bear in mind that a lot of users of these interfaces are not necessarily
technical, and the argument for strict `Column` usage is a bit alien to them.
Thus, I would argue that even if a `Column` argument were enforced in the
Java and Scala APIs, the other APIs should keep the by-column-name call
possible, as is now done in Python, i.e. by mapping the string names to
`Column`.