[GitHub] [spark] gaborgsomogyi opened a new pull request #30592: [SPARK-33629][PYTHON]Make spark.buffer.size configuration visible on driver side

GitBox Thu, 03 Dec 2020 04:45:24 -0800


gaborgsomogyi opened a new pull request #30592:
URL: https://github.com/apache/spark/pull/30592



   ### What changes were proposed in this pull request?
   `spark.buffer.size` is not applied on the driver side when running PySpark. This PR fixes that.
   
   ### Why are the changes needed?
   The `spark.buffer.size` configuration should also take effect on the driver side.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Existing unit tests, plus manual testing.
   
   Added the following code temporarily:
   ```
   def local_connect_and_auth(port, auth_secret):
   ...
               sock.connect(sa)
               print("SPARK_BUFFER_SIZE: %d" % int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))  # <- This is the addition
               sockfile = sock.makefile("rwb", int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))
   ...
   ```
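
The fallback behaviour in the temporary snippet above can be tried in isolation. The following is a minimal sketch, not the code from this PR: `buffer_size_from_env` and `open_buffered` are hypothetical helper names, and a local `socket.socketpair()` stands in for the connection to the JVM. Only the `SPARK_BUFFER_SIZE` variable name and the `65536` default are taken from the snippet.

```python
import os
import socket

# Fallback used when SPARK_BUFFER_SIZE is not set (same value as in the snippet above).
DEFAULT_BUFFER_SIZE = 65536


def buffer_size_from_env(environ=os.environ):
    """Return the buffer size from SPARK_BUFFER_SIZE, or the default if unset."""
    return int(environ.get("SPARK_BUFFER_SIZE", DEFAULT_BUFFER_SIZE))


def open_buffered(sock, environ=os.environ):
    """Wrap a connected socket in a binary read/write file object.

    Hypothetical helper: it only mirrors the sock.makefile(...) call
    from the snippet above, with the buffer size read from the environment.
    """
    return sock.makefile("rwb", buffer_size_from_env(environ))


if __name__ == "__main__":
    # A local socket pair stands in for the real driver-side connection.
    left, right = socket.socketpair()
    sockfile = open_buffered(left)
    print("SPARK_BUFFER_SIZE: %d" % buffer_size_from_env())
    sockfile.write(b"ping")
    sockfile.flush()  # buffered writes reach the peer only after a flush
    print(right.recv(4))
    sockfile.close()
    left.close()
    right.close()
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.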
   
   Test:
   ```
   # Compile Spark
   
   $ echo "spark.buffer.size 10000" >> conf/spark-defaults.conf
   $ ./bin/pyspark 
   Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
   [Clang 11.0.3 (clang-1103.0.32.62)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   20/12/03 13:38:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   20/12/03 13:38:14 WARN SparkEnv: I/O encryption enabled without RPC encryption: keys will be visible on the wire.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
         /_/
   
   Using Python version 3.8.5 (default, Jul 21 2020 10:48:26)
   Spark context Web UI available at http://192.168.0.189:4040
   Spark context available as 'sc' (master = local[*], app id = local-1606999094506).
   SparkSession available as 'spark'.
   >>> sc.setLogLevel("TRACE")
   >>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
   ...
   SPARK_BUFFER_SIZE: 10000
   ...
   [[0], [2], [3], [4], [6]]
   >>> 
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


