Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

via GitHub Mon, 15 Apr 2024 04:34:03 -0700


itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2056611480


   The difficulty with the previous method was that it was not easy to 
perfectly sync the data between two separately operating ThreadLocals, 
`CurrentOrigin` and `PySparkCurrentOrigin`.
   
   After taking a deeper look at the structure, I think we may be able to make 
`CurrentOrigin` flexible enough to support the PySpark error context instead of 
adding a separate ThreadLocal like `PySparkCurrentOrigin`.
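
   As a rough illustration of the single-ThreadLocal idea, here is a small 
Python sketch. It is not Spark's actual Scala code: the `Origin` fields 
`pyspark_fragment` and `pyspark_call_site` and the `with_origin` helper are 
hypothetical names chosen only for this example. The point is that one 
thread-local record with optional PySpark fields leaves nothing to keep in 
sync.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for an origin record; not Spark's real class."""
    line: Optional[int] = None
    # Assumed optional fields carrying the PySpark-side error context.
    pyspark_fragment: Optional[str] = None
    pyspark_call_site: Optional[str] = None


class CurrentOrigin:
    """Single thread-local holder, so there is no second store to sync."""
    _local = threading.local()

    @classmethod
    def get(cls) -> Origin:
        # Fall back to an empty origin when nothing has been set on this thread.
        return getattr(cls._local, "value", Origin())

    @classmethod
    @contextmanager
    def with_origin(cls, origin: Origin) -> Iterator[None]:
        # Set the origin for the duration of the block, then restore the old one.
        previous = cls.get()
        cls._local.value = origin
        try:
            yield
        finally:
            cls._local.value = previous


# Usage: the PySpark context travels inside the same record as the rest.
with CurrentOrigin.with_origin(
    Origin(pyspark_fragment="divide", pyspark_call_site="app.py:7")
):
    inside = CurrentOrigin.get()
after = CurrentOrigin.get()
```

   Because the PySpark details are just optional fields on the one record, 
callers that never set them see the same behavior as before.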
   
   If it works, it seems possible to make the structure more flexible 
while maintaining the existing communication rules between Python and the JVM, 
without adding helper functions such as a PySpark-specific `fn`.
   
   Let me give it a try and create a PR to refactor the current structure, 
and I will ping you guys.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


