[GitHub] [spark] zhengruifeng commented on pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client

via GitHub Mon, 17 Jul 2023 03:25:54 -0700


zhengruifeng commented on PR #42040:
URL: https://github.com/apache/spark/pull/42040#issuecomment-1637817969


   In https://github.com/apache/spark/pull/39925, we introduced a new mechanism 
to resolve an expression against a specified plan.
   
   However, the analyzer sometimes eliminates the plan ID, and then some 
expressions cannot be resolved correctly. This issue is the No. 1 blocker 
of PS (pandas API on Spark) on Connect.
   
   So far, I have investigated the two examples [in the 
ticket](https://issues.apache.org/jira/browse/SPARK-43611) and checked each rule 
applied to them.
   
   example 1:
   ```
   >>> import pyspark.pandas as ps
   >>> psdf1 = ps.DataFrame({"A": [1, 2, 3]})
   >>> psdf2 = ps.DataFrame({"B": [1, 2, 3]})
   >>> psdf1.append(psdf2)
   ```
   
   example 2:
   ```
   import pyspark.pandas as ps
   import pandas as pd
   
   pdf = pd.DataFrame(
       {"A": [None, 3, None, None], "B": [2, 4, None, 3], "C": [None, None, None, 1], "D": [0, 1, 5, 4]},
       columns=["A", "B", "C", "D"],
   )
   psdf = ps.from_pandas(pdf)
   psdf.backfill()
   ```
   
   In the draft, I modified two rules to retain the plan ID. (Actually, I 
modified 
[ResolveNaturalAndUsingJoin](https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L3302-L3316)
 in 
https://github.com/apache/spark/commit/167bbca49c1c12ccd349d4330862c136b38d4522.)
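
To illustrate the failure mode, here is a toy Python model of a rule that drops or retains a plan ID. It is only a sketch and does not use Spark's Catalyst classes: `Node`, `PLAN_ID`, `find_by_plan_id`, and `rewrite_join` are hypothetical names made up for this example.

```python
# Toy model only: these classes are hypothetical and are NOT Spark's Catalyst API.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PLAN_ID = "plan_id"  # hypothetical tag key


@dataclass
class Node:
    name: str
    children: Tuple["Node", ...] = ()
    tags: Dict[str, int] = field(default_factory=dict)


def find_by_plan_id(plan: Node, plan_id: int) -> Optional[Node]:
    """Depth-first search for the node tagged with plan_id."""
    if plan.tags.get(PLAN_ID) == plan_id:
        return plan
    for child in plan.children:
        found = find_by_plan_id(child, plan_id)
        if found is not None:
            return found
    return None


def rewrite_join(plan: Node, keep_tags: bool) -> Node:
    """Replace every 'UsingJoin' node with a 'Project' over a 'Join'."""
    children = tuple(rewrite_join(c, keep_tags) for c in plan.children)
    if plan.name == "UsingJoin":
        new = Node("Project", (Node("Join", children),))
        if keep_tags:
            # Carry the plan ID over to the node that replaces the original.
            new.tags.update(plan.tags)
        return new
    return Node(plan.name, children, dict(plan.tags))


left = Node("Relation", tags={PLAN_ID: 1})
right = Node("Relation", tags={PLAN_ID: 2})
plan = Node("UsingJoin", (left, right), tags={PLAN_ID: 3})

lost = rewrite_join(plan, keep_tags=False)
kept = rewrite_join(plan, keep_tags=True)

print(find_by_plan_id(lost, 3))       # None: the rewrite dropped the tag
print(find_by_plan_id(kept, 3).name)  # Project: the tag survived the rewrite
```

In this model, an expression that refers to plan ID 3 can no longer find its target after the tag-dropping rewrite, which mirrors the resolution failure described above.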
   
   Is there a more graceful approach to fix this issue? 
Otherwise, I'm afraid I will have to touch more rules.
   
   cc @cloud-fan @HyukjinKwon @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

