Hi Spark Experts:
I am trying to use a stateful UDF with Spark Structured Streaming whose state
needs to be updated periodically.
Here is the scenario:
1. I have a UDF with a variable that has a default value (e.g. 1). This value is
applied to a column (e.g. the variable is subtracted from the column value).
2. The variable must be updated periodically and asynchronously (e.g. by reading
a file every 5 minutes), and new rows should have the new value applied to the
column value.
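For concreteness, a minimal sketch of the fixed-value version (the names are just illustrative; in the real job the function would be registered with pyspark.sql.functions.udf):

```python
OFFSET = 1  # the UDF's internal variable, with its default value


def subtract_offset(value):
    """UDF body: subtract the variable from the column value.

    Would be registered with something like
    udf(subtract_offset, LongType()) and applied per row.
    """
    return value - OFFSET
```

The question is how OFFSET can be refreshed while the streaming query keeps running.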
Spark natively supports broadcast variables, but I could not find a way to
update a broadcast variable dynamically, or to rebroadcast it, so that the UDF's
internal state can be updated while the Structured Streaming application is
running.
I could read the variable from the file on each invocation of the UDF, but that
will not scale, since every invocation would open, read, and close the file.
Please let me know if there is any documentation or example that covers this
scenario.
Thanks