Yes, both ls -l /tmp/app-submodules.zip and hdfs dfs -ls /tmp/app-submodules.zip show the file.

On 2023/8/9 22:48, Mich Talebzadeh wrote:
If you are running in cluster mode, that zip file should exist on all the nodes! Is that the case?

HTH


Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


View my LinkedIn profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/


https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.



On Wed, 9 Aug 2023 at 13:41, lnxpgn <lnx...@gmail.com> wrote:

    Hi,

    I am using Spark 3.4.1, running on YARN. Hadoop runs on a single node
    in pseudo-distributed mode.

    spark-submit --master yarn --deploy-mode cluster --py-files
    /tmp/app-submodules.zip app.py
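
    (As background: --py-files ships the zip to the cluster, and PySpark is
    supposed to put it on sys.path so its modules become importable. A
    minimal sketch of that mechanism, with a hypothetical module name:)

    import sys
    sys.path.insert(0, "/tmp/app-submodules.zip")  # zips are importable via zipimport
    import mymodule  # hypothetical module packaged inside app-submodules.zip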

    The YARN application ran successfully, but there was a warning log message:

    
    /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
    RuntimeWarning: Failed to add file [file:///tmp/app-submodules.zip]
    specified in 'spark.submit.pyFiles' to Python path:

    If I use an HDFS file:

    spark-submit --master yarn --deploy-mode cluster --py-files
    hdfs://hadoop-namenode:9000/tmp/app-submodules.zip app.py

    the warning message looks like this:

    
    /opt/hadoop-tmp-dir/nm-local-dir/usercache/bigdata/appcache/application_1691548913900_0002/container_1691548913900_0002_01_000001/pyspark.zip/pyspark/context.py:350:
    RuntimeWarning: Failed to add file
    [hdfs://hadoop-namenode:9000/app-submodules.zip] specified in
    'spark.submit.pyFiles' to Python path:

    The relevant code in context.py:

    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
    if not os.path.exists(filepath):
        shutil.copyfile(path, filepath)

    It looks like the submitted Python file path carries a 'file:' or
    'hdfs:' URI scheme, and shutil.copyfile treats the scheme as part of
    the file name.
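
    A minimal standalone sketch of that failure mode (hypothetical paths,
    run outside Spark, assuming /tmp/app-submodules.zip exists locally):
    shutil.copyfile only understands plain filesystem paths, so a
    URI-prefixed string is treated as a literal path that does not exist.
    For a 'file:' URI the scheme can be stripped with urllib.parse; an
    'hdfs:' URI has no local path at all and would have to be fetched from
    HDFS first.

    import shutil
    from urllib.parse import urlparse

    src = "file:///tmp/app-submodules.zip"  # hypothetical, as in the warning

    try:
        # shutil.copyfile does not understand URI schemes; "file:" becomes
        # part of a literal path that does not exist, so the copy fails.
        shutil.copyfile(src, "/tmp/app-submodules-copy.zip")
    except OSError as e:
        print("copy failed:", e)

    parsed = urlparse(src)
    if parsed.scheme in ("", "file"):
        # Stripping the scheme recovers the usable local path.
        shutil.copyfile(parsed.path, "/tmp/app-submodules-copy.zip")
    else:
        # e.g. hdfs:// has no local path; the file would need to be
        # downloaded from HDFS before copying.
        print("non-local scheme:", parsed.scheme)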

    I searched but didn't find any useful information. Is this a bug, or
    did I do something wrong?




    ---------------------------------------------------------------------
    To unsubscribe e-mail: user-unsubscr...@spark.apache.org
