Given that you already know your input files (via input_file_name), why not get their sizes and sum them up?

import java.io.File
import java.net.URI
import org.apache.spark.sql.functions.{input_file_name, sum}
// Assumes a SparkSession named spark; needed for .as[String] and $"value"
import spark.implicits._

ds.select(input_file_name().as("filename"))
  .distinct
  .as[String]
  .map(filename => new File(new URI(filename).getPath).length)
  .select(sum($"value"))
  .show()
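
The per-file step in the snippet above can be tried without Spark. The following is only a minimal sketch, assuming the file names are "file:" URIs that point at the local filesystem; the object name FileSizeSketch and the helpers sizeOf and totalSize are made up for illustration. Note that java.io.File.length returns 0 for a file that does not exist, so paths on HDFS or an object store would silently count as 0 here and would presumably need Hadoop's FileSystem API instead.

```scala
import java.io.File
import java.net.URI
import java.nio.file.Files

// Hypothetical helper object, not part of Spark.
object FileSizeSketch {
  // Size in bytes of the local file that a "file:" URI string points to.
  // Returns 0 if the file does not exist (behaviour of java.io.File.length).
  def sizeOf(uri: String): Long = new File(new URI(uri).getPath).length

  // Sum the sizes of the distinct files, mirroring distinct + map + sum.
  def totalSize(uris: Seq[String]): Long = uris.distinct.map(sizeOf).sum

  def main(args: Array[String]): Unit = {
    // Create a 5-byte temporary file to have something to measure.
    val tmp = Files.createTempFile("size-sketch", ".txt")
    Files.write(tmp, "hello".getBytes("UTF-8"))
    val uri = tmp.toUri.toString

    // The same file listed twice is counted once.
    println(totalSize(Seq(uri, uri))) // prints 5

    Files.delete(tmp)
  }
}
```

The deduplication matters because input_file_name yields one value per row, so every file name appears once for each row read from that file.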


Enrico


On 19.06.22 at 03:16, Yong Walt wrote:
import java.io.File
val someFile = new File("somefile.txt")
val fileSize = someFile.length
This one?

On Sun, Jun 19, 2022 at 4:33 AM mbreuer <msbre...@gmail.com> wrote:

    Hello Community,

    I am working on optimizations for file sizes and the number of files.
    The data frame API has a function input_file_name which returns the
    file name. I am missing a counterpart that returns the size of the
    file. Just the size, like "ls -l" returns. Is there something like that?

    Kind regards,
    Markus


