GitHub user tiangolo commented on the pull request:

    https://github.com/apache/spark/pull/10948#issuecomment-181507612
  
    So, to be absolutely sure about which parameters work and which don't, I ran a (manual) exhaustive search of the "parameter space": I tried every combination of the related parameters to see what works and what doesn't.
    
    The minimum set of parameters needed for it to work was `--driver-class-path` plus `--jars`.
    
    `--conf spark.executor.extraClassPath=` wasn't needed for the JDBC save to 
work.
    
    I already updated the PR to use just those two parameters. I also updated 
the jar in the example to a more recent Postgres jar (the one I used in my 
tests).
    
    Here's the table of every combination of parameters I tried and whether the JDBC save worked (1 = parameter passed, 0 = not passed):
    
    | Works? | `--conf spark.executor.extraClassPath=` | `--driver-class-path` | `--jars` |
    | --- | --- | --- | --- |
    | no | 1 | 0 | 0 |
    | no | 0 | 1 | 0 |
    | no | 0 | 0 | 1 |
    | no | 1 | 0 | 1 |
    | no | 1 | 1 | 0 |
    | yes | 0 | 1 | 1 |
    | yes | 1 | 1 | 1 |
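
To make the "minimum set" conclusion explicit, here is a small Python sketch that transcribes the table into a dictionary and derives which options appear in every working combination. The names (`OBSERVED`, `required_options`, `minimal_working_sets`) are illustrative only and not part of Spark; the all-off baseline is not in the table, so it is not included here either.

```python
# Results transcribed from the table: 1 = option passed, 0 = not passed.
# Tuple order: (--conf spark.executor.extraClassPath=, --driver-class-path, --jars)
OPTION_NAMES = ("spark.executor.extraClassPath", "--driver-class-path", "--jars")

OBSERVED = {
    (1, 0, 0): False,
    (0, 1, 0): False,
    (0, 0, 1): False,
    (1, 0, 1): False,
    (1, 1, 0): False,
    (0, 1, 1): True,
    (1, 1, 1): True,
}


def enabled(combo):
    """Names of the options switched on in a 0/1 combination."""
    return frozenset(name for name, on in zip(OPTION_NAMES, combo) if on)


def working_sets(results):
    """Option sets for which the JDBC save worked."""
    return [enabled(combo) for combo, ok in results.items() if ok]


def required_options(results):
    """Options present in every working combination."""
    return frozenset.intersection(*working_sets(results))


def minimal_working_sets(results):
    """Working sets with no smaller working set among the tested ones."""
    sets = working_sets(results)
    return [s for s in sets if not any(other < s for other in sets)]


print(sorted(required_options(OBSERVED)))  # → ['--driver-class-path', '--jars']
```

Both working rows contain `--driver-class-path` and `--jars`, while every row missing either one fails, which is why `spark.executor.extraClassPath` drops out of the required set.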
    
    I ran all of this with PySpark in IPython (Anaconda Python) on a YARN cluster (actually just one node). I saved the results in a PostgreSQL DB running as a Docker container on the same node, with a custom port.
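
For reference, the JDBC save itself needs a PostgreSQL JDBC URL and a set of connection properties. The sketch below builds both; the host, port, database, user and password are hypothetical placeholders (the comment only says the port was a custom one), and the helper names are made up for illustration.

```python
# Hypothetical connection details: PostgreSQL ran in a Docker container on the
# same node with a custom port, so every value passed in below is a placeholder.
def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """PostgreSQL JDBC URL in the standard jdbc:postgresql://host:port/database form."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def jdbc_properties(user: str, password: str) -> dict:
    """Connection properties to pass to PySpark's DataFrameWriter.jdbc.

    "driver" names the class shipped in the PostgreSQL JDBC jar; whether Spark
    honors it on the write path may depend on the Spark version.
    """
    return {"user": user, "password": password, "driver": "org.postgresql.Driver"}


url = postgres_jdbc_url("localhost", 5433, "results")  # 5433 is a made-up custom port
props = jdbc_properties("spark", "secret")
print(url)  # jdbc:postgresql://localhost:5433/results
```

In the PySpark shell, given a DataFrame `df`, the save would then look something like `df.write.jdbc(url, "some_table", mode="append", properties=props)`, where `some_table` is a placeholder. Per the table, this is the step that fails unless both `--driver-class-path` and `--jars` point at the PostgreSQL jar.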
    
    Since I only tested in Python, if the Scala shell behaves differently we would need separate documentation for the Scala and Python shells.
    
    Here's the full list of commands I used to start the PySpark shell, and whether each one worked:
    
    Doesn't work:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --conf spark.executor.extraClassPath=/home/senseta/postgresql-9.4.1207.jar
    ```
    
    Doesn't work:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --driver-class-path /home/senseta/postgresql-9.4.1207.jar
    ```
    
    Doesn't work:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --jars /home/senseta/postgresql-9.4.1207.jar
    ```
    
    Doesn't work:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --conf spark.executor.extraClassPath=/home/senseta/postgresql-9.4.1207.jar \
    --jars /home/senseta/postgresql-9.4.1207.jar
    ```
    
    Doesn't work:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --conf spark.executor.extraClassPath=/home/senseta/postgresql-9.4.1207.jar \
    --driver-class-path /home/senseta/postgresql-9.4.1207.jar
    ```
    
    Works:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --driver-class-path /home/senseta/postgresql-9.4.1207.jar \
    --jars /home/senseta/postgresql-9.4.1207.jar
    ```
    
    Works:
    
    ```
    PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python \
    /opt/spark/spark*/bin/pyspark --master yarn-client --num-executors 3 \
    --executor-cores 1 --executor-memory 1G --driver-memory 2G \
    --conf spark.executor.extraClassPath=/home/senseta/postgresql-9.4.1207.jar \
    --driver-class-path /home/senseta/postgresql-9.4.1207.jar \
    --jars /home/senseta/postgresql-9.4.1207.jar
    ```
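
The commands above differ only in which of the three options they pass, so the whole list can be regenerated mechanically. Here is a minimal Python sketch that does so with `itertools.product`; the environment prefix, launcher path, executor settings and jar path are copied from the commands, while the function names are hypothetical. It needs Python 3.8+ for `shlex.join`.

```python
import itertools
import shlex

JAR = "/home/senseta/postgresql-9.4.1207.jar"

# Environment prefix and launcher copied from the commands above. The launcher
# is kept unquoted so the shell still expands the spark* glob.
ENV_PREFIX = "PATH=/opt/miniconda/bin:$PATH IPYTHON=1 PYSPARK_PYTHON=/opt/miniconda/bin/python"
PYSPARK = "/opt/spark/spark*/bin/pyspark"

# Options shared by every tested command.
BASE_ARGS = [
    "--master", "yarn-client",
    "--num-executors", "3",
    "--executor-cores", "1",
    "--executor-memory", "1G",
    "--driver-memory", "2G",
]

# The three options under test, in the order they appear in the commands.
OPTIONS = {
    "conf": ["--conf", f"spark.executor.extraClassPath={JAR}"],
    "driver_class_path": ["--driver-class-path", JAR],
    "jars": ["--jars", JAR],
}


def tested_commands():
    """Yield (enabled option names, argv) for every non-empty subset of OPTIONS."""
    names = list(OPTIONS)
    for mask in itertools.product((0, 1), repeat=len(names)):
        if not any(mask):
            continue  # the baseline with none of the options is not in the list
        enabled = tuple(name for name, on in zip(names, mask) if on)
        argv = BASE_ARGS + [tok for name in enabled for tok in OPTIONS[name]]
        yield enabled, argv


def render(argv):
    """Turn an argv list into a shell command line like the ones above."""
    return f"{ENV_PREFIX} {PYSPARK} {shlex.join(argv)}"


if __name__ == "__main__":
    for enabled, argv in tested_commands():
        print("+".join(enabled))
        print(render(argv))
```

Generating the seven commands this way makes it harder to accidentally skip a combination or mistype a jar path in one of them.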

