You can't set a default in the schema itself, but you can replace NULL
values with a default using a simple SQL expression (for example COALESCE)
applied to the stream, so no UDF is needed.
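
Something like the following should work. It is only a rough sketch built on
the schema and Kafka options from your mail. I am assuming the Kafka value is
a UTF-8 JSON string and that the spark-sql-kafka connector is on the
classpath. The app name and the names parsed, withDefault and withDefault2
are made up for illustration. from_json returns NULL for a field that is
absent from the JSON, and coalesce then swaps that NULL for "xxx".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, from_json, lit}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical app name; in spark-shell the `spark` session already exists.
val spark = SparkSession.builder().appName("default-for-missing-field").getOrCreate()

// Schema from the original question.
val schema = StructType(Array(
  StructField("a", StringType, nullable = true),
  StructField("b", StringType, nullable = true),
  StructField("c", StringType, nullable = true)))

// Assumption: each Kafka record value is a JSON document encoded as UTF-8.
val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input-topic")
  .option("failOnDataLoss", "false")
  .load()
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")

// Option 1: SQL expression applied to the stream.
val withDefault = parsed.selectExpr("a", "coalesce(b, 'xxx') AS b", "c")

// Option 2: the same replacement through the Column API.
val withDefault2 = parsed.withColumn("b", coalesce(col("b"), lit("xxx")))
```

Either result can then be passed to writeStream exactly as in your snippet.
Note that this also replaces an explicit JSON null in b, not only a missing
key, since both arrive as NULL after parsing.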

On Fri, Nov 19, 2021 at 8:28 AM Xiao, Alton <alton.x...@sap.com.invalid>
wrote:

> Hello,
>
> I am struggling with a task that should be super simple:
>
>       I define a StructType to load JSON data from Kafka with Spark
> Structured Streaming, and some fields may have no value. How can I set a
> default value for such a field?
>
> For example:
>
> StructType(
>   Array(StructField("a", StringType, nullable = true),
>     StructField("b", StringType, nullable = true),
>     StructField("c", StringType, nullable = true))
> )
>
>
>
> val df = spark
>   .readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "localhost:9092")
>   .option("subscribe", "input-topic")
>   .option("failOnDataLoss", "false")
>   .load()
>
>
>
> df.writeStream
>   .format(format)
>   .option("checkpointLocation", checkpoint)
>   .option("path", path)
>   .outputMode(OutputMode.Append)
>   .trigger(ProcessingTime("10 seconds"))
>   .start()
>
>
>
> If the input data has no b, how can I set a default value (xxx)? Is a UDF
> the only way?
