[GitHub] [spark] HyukjinKwon edited a comment on pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon edited a comment on pull request #28661:
URL: https://github.com/apache/spark/pull/28661#issuecomment-635404514


   I actually didn't care much about this at first, but I realised that people 
really dislike the JVM stacktrace in Python exceptions. Maybe it's because you 
(and I, and most people in Spark dev) are used to the Java side.
   
   It reminds me of Holden's talk, ["Debugging PySpark—Or Why is There a JVM 
Stack Trace in My 
Python?"](https://databricks.com/session/debugging-pyspark-or-why-is-there-a-jvm-stack-trace-in-my-python), 
which could be one of the references showing that users generally don't like it.
   
   I also think I should have added some more context in the PR description. 
This PR:
 - Hides the JVM stacktrace for the whitelisted exceptions such as 
`AnalysisException`, which usually give a reasonable and good enough exception 
message on their own (see the sketch after this list).
 - Handles the exceptions raised from Python UDFs and adds them to the 
whitelisted exceptions. Exceptions from Python UDFs always carry the same JVM 
stacktrace: 
https://github.com/apache/spark/blob/95aec091e4d8a45e648ce84d32d912f585eeb151/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L515
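   
   As a rough illustration (not part of this PR's diff), this is roughly the 
user-facing difference for a whitelisted exception; the session setup and 
column name below are only for the example:
   
```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()

try:
    # Selecting a column that does not exist raises AnalysisException,
    # one of the whitelisted exceptions this PR targets.
    spark.range(1).select("nonexistent_column").show()
except AnalysisException as e:
    # With this change, only the analyzer's message is printed on the console
    # instead of the message followed by the full JVM stacktrace.
    print(str(e))
```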
   
   If a somewhat arbitrary exception occurs, such as a runtime exception from a 
shuffle or a user-defined exception, there is no behaviour change.
   
   Plus, the full stacktrace is still written to the log files, so I think it's 
okay to remove it from the console. If users want to do a postmortem, they can 
check the log files. If they can run the job again, they can turn on this 
runtime configuration and execute it one more time to see the JVM stacktrace.
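   
   For completeness, a minimal sketch of how that rerun could look, assuming the 
runtime configuration ends up named `spark.sql.pyspark.jvmStacktrace.enabled` 
(the final key in the merged change may differ):
   
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed configuration key; check the merged PR for the final name.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")

try:
    spark.range(1).select("nonexistent_column").show()
except Exception as e:
    # With the flag on, the JVM stacktrace is appended to the message again,
    # which is useful when rerunning for a postmortem.
    print(str(e))
```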



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


