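You can't set a default value in the StructType itself, but you don't need a UDF either: a field that is missing from the JSON comes back as NULL, and you can replace NULLs with your default using a simple SQL expression (for example COALESCE) applied to the stream.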
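Here is a minimal sketch of that approach, reusing the schema and Kafka options from your message. It assumes the JSON payload is in the Kafka `value` column and that the spark-sql-kafka connector is on the classpath; the object name, app name, and console sink are placeholders of mine, not anything from your job.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical object name, for illustration only.
object DefaultForMissingField {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("default-for-missing-field") // placeholder app name
      .getOrCreate()

    // Same schema as in the question.
    val schema = StructType(Array(
      StructField("a", StringType, nullable = true),
      StructField("b", StringType, nullable = true),
      StructField("c", StringType, nullable = true)))

    // Parse the Kafka value as JSON; a field absent from the JSON becomes NULL.
    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .option("failOnDataLoss", "false")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")

    // Replace NULL in b with the default 'xxx' using a SQL expression, no UDF.
    val withDefault = parsed.selectExpr("a", "coalesce(b, 'xxx') AS b", "c")

    // Console sink is an assumption for the sketch; swap in your own sink,
    // path, and checkpoint location.
    withDefault.writeStream
      .format("console")
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()
      .awaitTermination()
  }
}
```

If you prefer the DataFrame API, `parsed.na.fill("xxx", Seq("b"))` should give the same result for string columns. Note that either way, a record where b is explicitly null in the JSON also gets the default, since it is indistinguishable from a missing field after parsing.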
On Fri, Nov 19, 2021 at 8:28 AM Xiao, Alton <alton.x...@sap.com.invalid> wrote:
> Hello,
>
> I am struggling with a task that should be super simple:
>
> I define a structType to load json data from kafka with spark
> structed streaming, and some fields may have no value, how can I set a
> default value for this record?
>
> For example:
>
> StructType(
>   Array(StructField("a", StringType, nullable = true),
>     StructField("b", StringType, nullable = true),
>     StructField("c", StringType, nullable = true))
> )
>
> spark
>   .readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "localhost:9092")
>   .option("subscribe", "input-topic")
>   .option("failOnDataLoss", "false")
>   .load()
>
> df.writeStream
>   .format(format)
>   .option("checkpointLocation", checkpoint)
>   .option("path", path)
>   .outputMode(OutputMode.Append)
>   .trigger(ProcessingTime("10 seconds"))
>   .start()
>
> If input data has no b, how can I set a default value(xxx) , only use udf?