This is not necessarily about the readStream / read API. As long as you
have imported the needed dependencies correctly and set up the Spark
config, you should be able to readStream from an S3 path.

See
https://stackoverflow.com/questions/46740670/no-filesystem-for-scheme-s3-with-pyspark
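A minimal sketch of what I mean (the MinIO endpoint, credentials, bucket name, and hadoop-aws version below are placeholders you would replace with your own; the schema variable is the one from your original post):

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# hadoop-aws must match the Hadoop version bundled with your Spark build
# (3.3.4 here is only an example).
spark = (
    SparkSession.builder
    .appName("minio-readstream")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    # Hypothetical MinIO endpoint and credentials -- substitute your own.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio-host:9000")
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
    # MinIO is usually addressed path-style rather than virtual-host style.
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

# Example schema standing in for the poster's tempschema.
tempschema = StructType([StructField("value", StringType())])

# Use the s3a:// scheme: plain "s3://" has no registered FileSystem in
# open-source Hadoop, which is exactly the "No FileSystem for scheme" error.
# Note the trailing slash, so the URI carries an absolute path component.
initDF = (
    spark.readStream
    .schema(tempschema)
    .format("csv")
    .load("s3a://bucketname/")
)
```

This is only a sketch of the wiring, not a tested setup against your cluster; the key points are the s3a scheme, the endpoint/path-style settings for MinIO, and a hadoop-aws jar that matches your Hadoop version.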

Kleckner, Jade <jade.kleck...@ipp.mpg.de> wrote on Tue, Aug 5, 2025, 10:21:

> Hello all,
>
>
>
> I’m developing a pipeline to read a stream from a MinIO bucket. I have
> no issues setting the Hadoop s3a variables and reading files, but when I
> try to use a bucket as a readStream location, it produces the following
> errors:
>
>
>
> Example code: initDF = spark.readStream.schema(tempschema).option("path",
> "s3://bucketname").load()
>
>
>
> I have tried the following variants of the bucket path, with these results:
>
>
>
> s3 -> py4j.protocol.Py4JJavaError: An error occurred while calling
> o436.load.
>
> : org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for
> scheme "s3"
>
>
>
> s3a -> pyspark.errors.exceptions.captured.IllegalArgumentException: path
> must be absolute
>
>
>
> Absolute path ->
> pyspark.errors.exceptions.captured.UnsupportedOperationException: None
>
>
>
> Does readStream have any support for S3 buckets at all? Any
> help/guidance would be appreciated, thank you for your time.
>
>
>
> Sincerely,
>
> Jade Kleckner
>