HyukjinKwon commented on code in PR #38070:
URL: https://github.com/apache/spark/pull/38070#discussion_r985387151
##########
python/pyspark/pandas/namespace.py:
##########
@@ -1049,6 +1049,10 @@ def read_excel(
         Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
         'X'...'X'. Passing in False will cause data to be overwritten if there
         are duplicate names in the columns.
+
+        .. note:: This process is not case sensitive. If two columns are spelled the
+            same with different casing then an ambiguity error will arise. Specifying
+            `spark.conf.set("spark.sql.caseSensitive", "true")` will resolve this issue.
Review Comment:
   The `spark.sql.caseSensitive` configuration is actually discouraged. I think
   this is a general issue in the pandas API on Spark itself rather than in this
   specific API. We should probably write it down somewhere like
   https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/faq.html
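   For context, a minimal sketch of the ambiguity the docstring note describes, using a
   local Spark session and plain PySpark rather than `read_excel` (the column names
   `"A"`/`"a"` and the `local[1]` master are illustrative assumptions, not from the PR):

   ```python
   from pyspark.sql import SparkSession

   # Assumption: a local session purely for demonstration.
   spark = SparkSession.builder.master("local[1]").getOrCreate()

   # Two columns that differ only in casing.
   df = spark.createDataFrame([(1, 2)], ["A", "a"])

   # With the default spark.sql.caseSensitive=false, the reference "A"
   # matches both columns, so resolution fails with an AnalysisException.
   try:
       df.select("A").collect()
   except Exception as e:
       print(type(e).__name__)

   # The workaround quoted in the diff (discouraged, per the review comment):
   spark.conf.set("spark.sql.caseSensitive", "true")
   row = df.select("A").collect()[0]  # now resolves only the exact-case column
   ```

   This also shows why the reviewer flags the config as a blunt instrument: it is a
   session-wide SQL setting, so flipping it to work around one ambiguous Excel sheet
   changes name resolution for every query in the session.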
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]