zhengruifeng commented on PR #40063:
URL: https://github.com/apache/spark/pull/40063#issuecomment-1434578211

   It seems more complicated than I thought. I think we can simplify it in this way:
   
   In client:
   ```
       def sql(self, sqlQuery: str, args: Optional[Dict[str, str]] = None) -> "DataFrame":
           df = DataFrame.withPlan(SQL(sqlQuery, args), self)
           print(df.schema)  # <- eagerly analyze the plan
           return df
   ```
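   The point of touching `df.schema` before returning is that analysis (and any side effects of the SQL, such as `SET`) happens right away, while the DataFrame itself stays lazy. A minimal sketch of that pattern in plain Python, independent of Spark (the `LazyFrame`, `fake_analyze`, and module-level `sql` names here are hypothetical stand-ins, not real Spark Connect APIs):

   ```
# Toy sketch of eager analysis: constructing the frame is cheap and lazy,
# but reading .schema forces the (simulated) server round trip immediately.
# All names here are illustrative, not real Spark Connect classes.

class LazyFrame:
    def __init__(self, query, args, analyzer):
        self.query = query
        self.args = args or {}
        self._analyzer = analyzer  # callable simulating the server
        self._schema = None

    @property
    def schema(self):
        if self._schema is None:
            # first access triggers analysis (and any SQL side effects)
            self._schema = self._analyzer(self.query, self.args)
        return self._schema

def fake_analyze(query, args):
    # pretend the server parsed the query and returned column types
    return {"key": "string", "value": "string"}

def sql(query, args=None):
    df = LazyFrame(query, args, fake_analyze)
    _ = df.schema  # eagerly analyze so side-effecting SQL runs right away
    return df

df = sql("set spark.sql.adaptive.enabled=false")
print(df.schema)  # {'key': 'string', 'value': 'string'}
   ```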
   
   In connect planner:
   ```
     private def transformSql(sql: proto.SQL): LogicalPlan = {
       // scalastyle:off println
       println(s"invoke transformSql $sql")
       session
         .sql(sql.getQuery, sql.getArgsMap.asScala.toMap)  // directly invoke the Spark session API
         .logicalPlan
     }
   ```
   
   
   bin/pyspark --remote "local[*]"
   ```
   >>> spark.sql("set spark.sql.adaptive.enabled=false")
   invoke transformSql query: "set spark.sql.adaptive.enabled=false"
   
   StructType([StructField('key', StringType(), False), StructField('value', StringType(), False)])
   DataFrame[key: string, value: string]
   ```
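   For context, passing `args` through to `session.sql` is what enables named parameters like `:id` in the query text. The toy substitution below only illustrates the idea; real Spark binds typed parameters inside the parser rather than by string replacement, and `bind_named_params` is a hypothetical helper, not a Spark API:

   ```
import re

def bind_named_params(query, args):
    # Naive illustration of :name parameter binding via string substitution.
    # This toy version quotes every value as a string literal; real Spark
    # binds typed parameters during parsing, which also avoids injection.
    def repl(match):
        name = match.group(1)
        if name not in args:
            raise KeyError(f"unbound parameter: {name}")
        return "'" + args[name].replace("'", "''") + "'"
    return re.sub(r":(\w+)", repl, query)

print(bind_named_params("SELECT * FROM t WHERE id = :id", {"id": "42"}))
# SELECT * FROM t WHERE id = '42'
   ```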


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
