[
https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shanyu zhao updated LIVY-750:
-
Description:
On Livy Server, even if we set pyspark archives to use local files:
{code:bash}
export
PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the Yarn distributed cache:
{code}
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
{code}
Note that this happens even after SPARK-30845 fixed Spark not to always upload local archives.
The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then uploads to the Yarn distributed cache. Since spark-submit already takes care of finding and uploading pyspark archives when they are not local, there is no need for Livy to do so redundantly.
was:
On Livy Server, even if we set pyspark archives to use local files:
{code:bash}
export
PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}
Livy still uploads these local pyspark archives to the Yarn distributed cache:
{code}
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
{code}
Note that this happens even after SPARK-30845 fixed Spark not to always upload local archives.
The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then uploads to the Yarn distributed cache. Since spark-submit already takes care of uploading pyspark archives, there is no need for Livy to do so redundantly.
> Livy uploads local pyspark archives to Yarn distributed cache
> -
>
> Key: LIVY-750
> URL: https://issues.apache.org/jira/browse/LIVY-750
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.6.0, 0.7.0
> Reporter: shanyu zhao
> Priority: Major
> Attachments: image-2020-02-16-13-19-40-645.png,
> image-2020-02-16-13-19-59-591.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On Livy Server, even if we set pyspark archives to use local files:
> {code:bash}
> export
> PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
> {code}
> Livy still uploads these local pyspark archives to the Yarn distributed cache:
> {code}
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
> 20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip
> {code}
> Note that this happens even after SPARK-30845 fixed Spark not to always upload local archives.
> The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", which Spark then uploads to the Yarn distributed cache. Since spark-submit already takes care of finding and uploading pyspark archives when they are not local, there is no need for Livy to do so redundantly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)