[jira] [Commented] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration

2024-05-24 Thread Ravi Dalal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849399#comment-17849399
 ] 

Ravi Dalal commented on SPARK-48417:


For anyone facing this issue, use the following configuration to read files 
from GCS when spark.jars.packages is used:
{code:java}
config("spark.jars", 
"https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
config("spark.hadoop.fs.AbstractFileSystem.gs.impl", 
"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")   
config("spark.hadoop.fs.gs.impl", 
"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"){code}
When spark.jars.packages is not used, the following configuration alone works:
{code:java}
config("spark.jars", 
"https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
config("spark.hadoop.fs.AbstractFileSystem.gs.impl", 
"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS") {code}

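The two cases above can be folded into one helper. What follows is only a sketch, not the reporter's attached script: the JAR URL and class names are copied from the comment, while `gcs_spark_conf`, `build_session`, and the application name are hypothetical names introduced for illustration. It assumes PySpark is installed when `build_session` is called.

```python
from typing import Dict, Optional

# Connector JAR URL and class names are taken from the comment above.
GCS_CONNECTOR_JAR = (
    "https://storage.googleapis.com/hadoop-lib/gcs/"
    "gcs-connector-hadoop3-2.2.22.jar"
)


def gcs_spark_conf(uses_packages: bool) -> Dict[str, str]:
    """Return the Spark settings reported to work for reading from GCS."""
    conf = {
        "spark.jars": GCS_CONNECTOR_JAR,
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    }
    if uses_packages:
        # Per the comment, this extra key is needed once
        # spark.jars.packages is also set.
        conf["spark.hadoop.fs.gs.impl"] = (
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
        )
    return conf


def build_session(packages: Optional[str] = None):
    """Hypothetical helper: build a SparkSession with the settings applied."""
    # Imported lazily so gcs_spark_conf() can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("gcs-read-example")
    if packages:
        builder = builder.config("spark.jars.packages", packages)
    for key, value in gcs_spark_conf(bool(packages)).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` with no argument applies only the two settings from the second snippet; passing a Maven coordinate string for `spark.jars.packages` adds `spark.hadoop.fs.gs.impl` as well.
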
> Filesystems do not load with spark.jars.packages configuration
> --
>
> Key: SPARK-48417
> URL: https://issues.apache.org/jira/browse/SPARK-48417
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.1
> Reporter: Ravi Dalal
> Priority: Major
> Attachments: pyspark_mleap.py, 
> pyspark_spark_jar_package_config_logs.txt, 
> pyspark_without_spark_jar_package_config_logs.txt
>
>
> When we use the spark.jars.packages configuration parameter in the Python 
> SparkSession builder (PySpark), it appears that the filesystems are not 
> loaded when the session starts. Because of this, Spark fails to read files 
> from a Google Cloud Storage (GCS) bucket (with the GCS Connector). 
> I tested this with different packages, so it does not appear to be specific 
> to a particular package. I will attach the sample code and debug logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration

2024-05-24 Thread Ravi Dalal (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Dalal closed SPARK-48417.
--




[jira] [Resolved] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration

2024-05-24 Thread Ravi Dalal (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Dalal resolved SPARK-48417.

Resolution: Not A Problem

Apologies. We missed a configuration parameter. Found it after creating this 
bug. Resolving the bug now.




[jira] [Updated] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration

2024-05-24 Thread Ravi Dalal (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Dalal updated SPARK-48417:
---
Attachment: pyspark_mleap.py
pyspark_spark_jar_package_config_logs.txt
pyspark_without_spark_jar_package_config_logs.txt




[jira] [Created] (SPARK-48417) Filesystems do not load with spark.jars.packages configuration

2024-05-24 Thread Ravi Dalal (Jira)
Ravi Dalal created SPARK-48417:
--

 Summary: Filesystems do not load with spark.jars.packages configuration
 Key: SPARK-48417
 URL: https://issues.apache.org/jira/browse/SPARK-48417
 Project: Spark
 Issue Type: Bug
 Components: Input/Output
 Affects Versions: 3.5.1
 Reporter: Ravi Dalal


When we use the spark.jars.packages configuration parameter in the Python 
SparkSession builder (PySpark), it appears that the filesystems are not loaded 
when the session starts. Because of this, Spark fails to read files from a 
Google Cloud Storage (GCS) bucket (with the GCS Connector). 

I tested this with different packages, so it does not appear to be specific to 
a particular package. I will attach the sample code and debug logs.


