HyukjinKwon commented on code in PR #38070:
URL: https://github.com/apache/spark/pull/38070#discussion_r985387151
##########
python/pyspark/pandas/namespace.py:
##########
@@ -1049,6 +1049,10 @@ def read_excel(
         Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
         'X'...'X'. Passing in False will cause data to be overwritten if there
         are duplicate names in the columns.
+
+        .. note:: This process is not case sensitive. If two columns are spelled the
+            same with different casing then an ambiguity error will arise. Specifying
+            `spark.conf.set("spark.sql.caseSensitive", "true")` will resolve this issue.
Review Comment:
   The `spark.sql.caseSensitive` configuration is actually discouraged. I think
   this is a general issue in the pandas API on Spark itself rather than in this
   specific API. We should probably write it down somewhere like
   https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/faq.html
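   For context, a minimal sketch of the ambiguity the docstring note describes, using a
   local Spark session and plain PySpark rather than `read_excel` (the column names
   `"A"`/`"a"` and the `local[1]` master are illustrative assumptions, not from the PR):

   ```python
   from pyspark.sql import SparkSession

   # Assumption: a local session purely for demonstration.
   spark = SparkSession.builder.master("local[1]").getOrCreate()

   # Two columns that differ only in casing.
   df = spark.createDataFrame([(1, 2)], ["A", "a"])

   # With the default spark.sql.caseSensitive=false, the reference "A"
   # matches both columns, so resolution fails with an AnalysisException.
   try:
       df.select("A").collect()
   except Exception as e:
       print(type(e).__name__)

   # The workaround quoted in the diff (discouraged, per the review comment):
   spark.conf.set("spark.sql.caseSensitive", "true")
   row = df.select("A").collect()[0]  # now resolves only the exact-case column
   ```

   This also shows why the reviewer flags the config as a blunt instrument: it is a
   session-wide SQL setting, so flipping it to work around one ambiguous Excel sheet
   changes name resolution for every query in the session.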
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]