Here is Spark's API definition. I am unable to understand what it means for
the file name to be "unknown". We are processing a file, so we should always
have a file name. I have 7 files; it prints the name for 3 of them and misses
the other 4.
/**
 * Returns the holding file name or empty string if it is unknown.
 */
def getInputFilePath: UTF8String = inputBlock.get().get().filePath
Can anyone help me understand what it means for the file name to be unknown,
which is why the above API returns the blank file names shown below?
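To make the question concrete, here is a minimal sketch in plain Java of how I read that doc comment. It is not Spark's code: the class FileNameHolderSketch and its set/unset/currentFile/demo methods are hypothetical names I made up. It assumes the holder is a per-thread slot that starts out as the empty string, is filled in while a file is being read, and is reset afterwards, so a read at any other time returns "".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-thread "current input file" holder.
// This models the doc comment only; it is NOT Spark's implementation.
public class FileNameHolderSketch {

    // Assumption: the slot defaults to "" which plays the role of "unknown".
    private static final ThreadLocal<String> CURRENT_FILE =
            ThreadLocal.withInitial(() -> "");

    // Called when a reader starts on a file (hypothetical).
    static void set(String path) {
        CURRENT_FILE.set(path);
    }

    // Called when the reader is done; the slot falls back to its default.
    static void unset() {
        CURRENT_FILE.remove();
    }

    // Analogue of getInputFilePath: whatever the slot holds right now.
    static String currentFile() {
        return CURRENT_FILE.get();
    }

    // Reads the slot before, during, and after a simulated file read.
    static List<String> demo(String path) {
        List<String> seen = new ArrayList<>();
        seen.add(currentFile()); // nothing is being read yet
        set(path);
        seen.add(currentFile()); // a file is being read
        unset();
        seen.add(currentFile()); // the read has finished
        return seen;
    }

    public static void main(String[] args) {
        for (String s : demo("file:///tmp/a.json")) {
            System.out.println("fileName [" + s + "]");
        }
    }
}
```

If this model is right, an empty name would mean the value was read at a point where no file was registered on that thread. Whether that is what happens in my pipeline is exactly what I am trying to find out.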
On Monday, October 11, 2021, 08:43:42 PM PDT, Alchemist
<alchemistsrivast...@gmail.com> wrote:
Hello all,
I am trying to extract the file name as follows, but intermittently we are
getting an empty file name.
Step 1: Get Schema
    StructType jsonSchema = sparkSession.read()
        .option("multiLine", true)
        .json("src/main/resources/sample.json")
        .schema();

Step 2: Get Input DataSet
    Dataset<Row> inputDS = sparkSession
        .readStream()
        .format("text")
        .option("multiLine", true)
        .schema(jsonSchema)
        .json(inputPath + "/*");

Step 3: Add fileName column
    Dataset<Row> inputDf = inputDS.select(functions.col("Report")).toJSON()
        .withColumn("FileName", org.apache.spark.sql.functions.input_file_name());

Step 4: Print fileName
    Dataset<InputReportAndFileName> inputDF = inputDf
        .as(ExpressionEncoder.javaBean(InputReportAndFileName.class))
        .map((MapFunction<InputReportAndFileName, InputReportAndFileName>) inputReportAndFileName -> {
            System.out.println("&&&&&&&&&&&&& InputReportAndFileName fileName "
                + inputReportAndFileName.getFileName());
            return inputReportAndFileName;
        }, ExpressionEncoder.javaBean(InputReportAndFileName.class));
Output: Here we see missing fileName
    &&&&&&&&&&&&& InputReportAndFileName fileName
    &&&&&&&&&&&&& InputReportAndFileName fileName file:///Users/abc/Desktop/test/Streaming/2021-Aug-14-042000_001E46_1420254%202
    &&&&&&&&&&&&& InputReportAndFileName fileName file:///Users/abc/Desktop/test/Streaming/2021-Aug-14-042000_001E46_14202040
    &&&&&&&&&&&&& InputReportAndFileName fileName file:///Users/abc/Desktop/test/Streaming/2021-Aug-14-042000_001E46_142720%202
    &&&&&&&&&&&&& InputReportAndFileName fileName
    &&&&&&&&&&&&& InputReportAndFileName fileName
    &&&&&&&&&&&&& InputReportAndFileName fileName