https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
This says "Binary file data source does not support writing a DataFrame back to the original files." which I take to mean this isn't possible. I haven't done this, but going from the docs, if writing were supported it would be:

spark.read.format("binaryFile").option("pathGlobFilter", "*.png").load("/path/to/data").write.format("binaryFile").save("/new/path/to/data")

Looking at the DataFrameWriter code on the master branch <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala>, let's see if there is a binaryFile format option... At this point I get lost. I can't figure out how this works either, but hopefully I have helped define the problem. The format() method of DataFrameWriter isn't documented <https://spark.apache.org/docs/3.1.3/api/java/org/apache/spark/sql/DataFrameWriter.html#format-java.lang.String->.

Russell Jurney
@rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com
LI <http://linkedin.com/in/russelljurney>
FB <http://facebook.com/jurney>
datasyndrome.com
Book a time on Calendly <https://calendly.com/rjurney_personal/30min>

On Thu, Mar 9, 2023 at 12:52 AM second_co...@yahoo.com.INVALID <second_co...@yahoo.com.invalid> wrote:

> Is there any example of how to read a binary file using PySpark and save it
> in another location, i.e. a copy feature?
>
> Thank you,
> Teoh
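
For the copy use case asked about above, one possible workaround is to skip DataFrameWriter entirely: read the files with the binaryFile source and write each row's bytes out from the executors. The following is a minimal, untested-on-a-cluster sketch, not a documented Spark recipe. The helper names write_row and copy_binary_files are hypothetical, and it assumes the source paths look like "file:/abs/path" and that the destination directory is reachable from every executor (local mode or a shared mount).

```python
import os


def write_row(row, src_root, dst_root):
    """Write one binaryFile row under dst_root, keeping its path relative to src_root.

    `row` needs `.path` and `.content`, the column names the binaryFile source
    documents. Assumption: local paths of the form "file:/abs/path".
    """
    # Drop the "file:" scheme prefix, if present, to get a plain local path.
    local_path = row.path[len("file:"):] if row.path.startswith("file:") else row.path
    relative = os.path.relpath(local_path, src_root)
    target = os.path.join(dst_root, relative)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as out:
        out.write(bytes(row.content))
    return target


def copy_binary_files(spark, src_root, dst_root, glob="*.png"):
    """Read matching files as binary rows, then write each one out on the executors."""
    df = (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", glob)
        .load(src_root)
        .select("path", "content")
    )
    # foreach runs write_row on the executors, one call per file.
    df.foreach(lambda row: write_row(row, src_root, dst_root))


# Hypothetical usage, given an existing SparkSession named `spark`:
# copy_binary_files(spark, "/path/to/data", "/new/path/to/data")
```

Each file is loaded whole into a single row, so this is only reasonable for files that fit comfortably in executor memory. For object stores or HDFS destinations, the write step would need that store's own client instead of open().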