https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html

The docs say "Binary file data source does not support writing a DataFrame
back to the original files." which I take to mean that saving through the
binaryFile format isn't possible...

I haven't tried this, but going from the docs, the call would look like this
(it may well fail, given the note quoted above):

(spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.png")
    .load("/path/to/data")
    .write.format("binaryFile")
    .save("/new/path/to/data"))

Looking at the DataFrameWriter code on the master branch
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala>,
let's see if there is a binaryFile format option...

At this point I get lost. I can't figure out how this works either, but
hopefully I have helped define the problem. The format() method of
DataFrameWriter isn't documented in enough detail to tell which formats
it accepts:
<https://spark.apache.org/docs/3.1.3/api/java/org/apache/spark/sql/DataFrameWriter.html#format-java.lang.String->
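
If binaryFile really can't write, one possible workaround is to read with
binaryFile and write the bytes out yourself in each partition. Below is a
minimal, untested-on-a-cluster sketch of that idea. The helper names
(local_path, write_rows, copy_binary_files) are made up for illustration and
are not Spark APIs. It assumes the source and destination are absolute paths
on a local or shared filesystem that every executor can see (for example
local mode or a common mount), and that each file fits in memory, since
binaryFile loads a whole file into one row.

```python
import os


def local_path(uri):
    """Strip a leading 'file:' scheme from a binaryFile `path` value."""
    if uri.startswith("file:"):
        # 'file:/a/b' and 'file:///a/b' both become '/a/b'
        uri = "/" + uri[len("file:"):].lstrip("/")
    return uri


def write_rows(rows, src_root, dest_root):
    """Write each row's bytes under dest_root, mirroring the layout below src_root."""
    written = 0
    for row in rows:
        rel = os.path.relpath(local_path(row.path), src_root)
        target = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(row.content)
        written += 1
    return written


def copy_binary_files(spark, src_root, dest_root, glob="*.png"):
    """Hypothetical helper: read files with binaryFile, then write them on the executors."""
    df = (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", glob)
        .option("recursiveFileLookup", "true")
        .load(src_root)
        .select("path", "content")
    )

    def _write(rows):
        # Runs once per partition on an executor; the return value is not used.
        write_rows(rows, src_root, dest_root)

    df.foreachPartition(_write)
```

With an existing SparkSession this would be called as
copy_binary_files(spark, "/path/to/data", "/new/path/to/data"). If the goal is
a plain copy with no transformation in between, a filesystem-level copy is
probably simpler than going through Spark at all.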

Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com Book a time on Calendly
<https://calendly.com/rjurney_personal/30min>


On Thu, Mar 9, 2023 at 12:52 AM second_co...@yahoo.com.INVALID
<second_co...@yahoo.com.invalid> wrote:

> any example on how to read a binary file using pySpark and save it in
> another location . copy feature
>
>
> Thank you,
> Teoh
>
