Yes, both ls -l /tmp/app-submodules.zip and hdfs dfs -ls /tmp/app-submodules.zip show the file.

On 2023/8/9 22:48, Mich Talebzadeh wrote:
If you are running in cluster mode, that zip file should exist on all the nodes! Is that the case?

HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


View my LinkedIn profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/


https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.



On Wed, 9 Aug 2023 at 13:41, lnxpgn <lnx...@gmail.com> wrote:

    Hi,

    I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node
    in pseudo-distributed mode.

    spark-submit --master yarn --deploy-mode cluster --py-files
    /tmp/app-submodules.zip app.py
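
    (As background: --py-files ships the zip to the cluster, and PySpark is
    supposed to put it on sys.path so its modules become importable. A
    minimal sketch of that mechanism, with a hypothetical module name:)

    import sys
    sys.path.insert(0, "/tmp/app-submodules.zip")  # zips are importable via zipimport
    import mymodule  # hypothetical module packaged inside app-submodules.zip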

    The YARN application ran successfully, but there was a warning log message:

    
    /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
    RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip]
    specified in 'spark.submit.pyFiles' to Python path:

    If I use an HDFS file:

    spark-submit --master yarn --deploy-mode cluster --py-files
    hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py

    the warning message looks like this:

    
    /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
    RuntimeWarning: Failed to add file
    [hdfs://hadoop-namenode:9000/app-submodules.zip] specified in
    'spark.submit.pyFiles' to Python path:

    The relevant code in context.py:

    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    if not os.path.exists(filepath):
        shutil.copyfile(path, filepath)

    It looks like the submitted Python file path carries a 'file:' or
    'hdfs:' URI scheme, and shutil.copyfile treats the scheme as part of
    the file name.
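
    A minimal standalone sketch of that failure mode (hypothetical paths,
    run outside Spark, assuming /tmp/app-submodules.zip exists locally):
    shutil.copyfile only understands plain filesystem paths, so a
    URI-prefixed string is treated as a literal path that does not exist.
    For a 'file:' URI the scheme can be stripped with urllib.parse; an
    'hdfs:' URI has no local path at all and would have to be fetched from
    HDFS first.

    import shutil
    from urllib.parse import urlparse

    src = "file:///tmp/app-submodules.zip"  # hypothetical, as in the warning

    try:
        # shutil.copyfile does not understand URI schemes; "file:" becomes
        # part of a literal path that does not exist, so the copy fails.
        shutil.copyfile(src, "/tmp/app-submodules-copy.zip")
    except OSError as e:
        print("copy failed:", e)

    parsed = urlparse(src)
    if parsed.scheme in ("", "file"):
        # Stripping the scheme recovers the usable local path.
        shutil.copyfile(parsed.path, "/tmp/app-submodules-copy.zip")
    else:
        # e.g. hdfs:// has no local path; the file would need to be
        # downloaded from HDFS before copying.
        print("non-local scheme:", parsed.scheme)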

    I searched but didn't find any useful information. Is this a bug, or
    did I do something wrong?




    ---------------------------------------------------------------------
    To unsubscribe e-mail: user-unsubscr...@spark.apache.org
