HyukjinKwon commented on a change in pull request #29639:
URL: https://github.com/apache/spark/pull/29639#discussion_r484173595
########## File path: python/docs/source/development/debugging.rst ##########
@@ -0,0 +1,280 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+=================
+Debugging PySpark
+=================
+
+PySpark uses Spark as an engine. PySpark uses `Py4J <https://www.py4j.org/>`_ to leverage Spark to submit and computes the jobs.
+
+On the driver side, PySpark communicates with the driver on JVM by using `Py4J <https://www.py4j.org/>`_.
+When :class:`pyspark.sql.SparkSession` or :class:`pyspark.SparkContext` is created and initialized, PySpark launches a JVM
+to communicate.
+
+On the executor side, Python workers execute and handle Python native functions or data. They are not launched if
+a PySpark application does not require interaction between Python workers and JVMs. They are lazily launched only when
+Python native functions or data have to be handled, for example, when you execute pandas UDFs or
+PySpark RDD APIs.
+
+This page focuses on debugging Python side of PySpark on both driver and executor sides instead of focusing on debugging
+with JVM. Profiling and debugging JVM is described at `Useful Developer Tools <https://spark.apache.org/developer-tools.html>`_.
+
+Note that,
+
+- If you are running locally, you can directly debug the driver side via using your IDE without the remote debug feature.

Review comment:
BTW, @itholic is working on documenting the local PyCharm setup on a separate page (SPARK-32189). We could add a link here once that page is finished.
