[jira] [Commented] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration
[ https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849399#comment-17849399 ]

Ravi Dalal commented on SPARK-48417:

For anyone facing this issue: when spark.jars.packages is used, the following configuration lets Spark read files from GCS:
{code:python}
from pyspark.sql import SparkSession
builder = (SparkSession.builder
    .config("spark.jars", "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"))
{code}
When spark.jars.packages is not used, the following configuration alone is sufficient:
{code:python}
from pyspark.sql import SparkSession
builder = (SparkSession.builder
    .config("spark.jars", "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"))
{code}
The only difference is the explicit spark.hadoop.fs.gs.impl setting, which is needed once spark.jars.packages is set.

> Filesystems do not load with spark.jars.packages configuration
> --
>
> Key: SPARK-48417
> URL: https://issues.apache.org/jira/browse/SPARK-48417
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.1
> Reporter: Ravi Dalal
> Priority: Major
> Attachments: pyspark_mleap.py, pyspark_spark_jar_package_config_logs.txt, pyspark_without_spark_jar_package_config_logs.txt
>
>
> When we use the spark.jars.packages configuration parameter in the Python SparkSession builder (PySpark), it appears that the filesystems are not loaded when the session starts. Because of this, Spark fails to read files from a Google Cloud Storage (GCS) bucket (with the GCS Connector).
> I tested this with different packages, so it does not appear specific to a particular package. I will attach the sample code and debug logs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
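The two configurations in the comment above differ only in the spark.hadoop.fs.gs.impl key. A minimal sketch, assuming you want both variants in one place, keeps the settings in a plain Python dict and chains them onto a builder through SparkSession.builder.config(key, value). The helper names gcs_connector_conf and apply_conf, and the uses_packages flag, are hypothetical and are not part of PySpark or the GCS connector.

```python
# Hypothetical helpers (not part of PySpark or the GCS connector): they hold the
# settings from the SPARK-48417 workaround comment as a plain dict.

GCS_CONNECTOR_JAR = (
    "https://storage.googleapis.com/hadoop-lib/gcs/"
    "gcs-connector-hadoop3-2.2.22.jar"
)


def gcs_connector_conf(uses_packages: bool) -> dict:
    """Return Spark settings for reading gs:// paths via the GCS connector."""
    conf = {
        "spark.jars": GCS_CONNECTOR_JAR,
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    }
    if uses_packages:
        # Per the comment, the FileSystem class for the gs:// scheme must be
        # set explicitly when spark.jars.packages is also configured.
        conf["spark.hadoop.fs.gs.impl"] = (
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
        )
    return conf


def apply_conf(builder, conf: dict):
    """Chain builder.config(key, value) for every entry and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, a session could then be started with something like `apply_conf(SparkSession.builder.config("spark.jars.packages", <your packages>), gcs_connector_conf(uses_packages=True)).getOrCreate()`. Keeping the settings in a dict makes it easy to log or diff them when a filesystem fails to load.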
[jira] [Closed] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration
[ https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Dalal closed SPARK-48417.
[jira] [Resolved] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration
[ https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Dalal resolved SPARK-48417.
Resolution: Not A Problem

Apologies. We missed a configuration parameter. Found it after creating this bug. Resolving the bug now.
[jira] [Updated] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration
[ https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Dalal updated SPARK-48417:
Attachment: pyspark_mleap.py, pyspark_spark_jar_package_config_logs.txt, pyspark_without_spark_jar_package_config_logs.txt
[jira] [Created] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration
Ravi Dalal created SPARK-48417:
--

Summary: Filesystems do not load with spark.jars.packages configuration
Key: SPARK-48417
URL: https://issues.apache.org/jira/browse/SPARK-48417
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 3.5.1
Reporter: Ravi Dalal

When we use the spark.jars.packages configuration parameter in the Python SparkSession builder (PySpark), it appears that the filesystems are not loaded when the session starts. Because of this, Spark fails to read files from a Google Cloud Storage (GCS) bucket (with the GCS Connector).

I tested this with different packages, so it does not appear specific to a particular package. I will attach the sample code and debug logs.
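Since the issue was resolved as a missing setting rather than a Spark bug, a pre-flight check can flag the gap before a session starts. A minimal sketch, assuming the job's Spark settings are available as a Python dict; the function name missing_gcs_keys is hypothetical, and the rule it encodes comes only from the workaround comment on this issue, not from Spark or GCS connector documentation.

```python
# Hypothetical pre-flight check (not a PySpark API): report GCS filesystem keys
# that the SPARK-48417 workaround says should be set but are missing.

ALWAYS_REQUIRED = ("spark.hadoop.fs.AbstractFileSystem.gs.impl",)
REQUIRED_WITH_PACKAGES = ("spark.hadoop.fs.gs.impl",)


def missing_gcs_keys(conf: dict) -> list:
    """Return the missing keys, in a stable order, for the given settings."""
    missing = [key for key in ALWAYS_REQUIRED if key not in conf]
    # The workaround reports that fs.gs.impl only has to be set explicitly
    # alongside spark.jars.packages, so check it only when packages are set.
    if conf.get("spark.jars.packages"):
        missing += [key for key in REQUIRED_WITH_PACKAGES if key not in conf]
    return missing
```

For example, a dict that sets spark.jars.packages (say, to a hypothetical `com.example:example-lib:1.0`) together with only the AbstractFileSystem key yields `["spark.hadoop.fs.gs.impl"]`, which is the parameter that was missing in this report.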