Hi
I am trying to explore how I can use UDAF for my use case.
I have something like this in my Structured Streaming Job.
val counts: Dataset[(String, Double)] = events
.withWatermark("timestamp", "30 minutes")
.groupByKey(e => e._2.siteIdentifier + "|" + e._2.sessionId)
.flatMapGroupsWithState(OutputMode.Append(),
GroupStateTimeout.EventTimeTimeout())(updateSessionState)
.map(m=>(s"${m.name}.${m.timestamp}",m.count))
.groupByKey(_._1)
.agg(typed.sum(_._2))
Where I am always using *sum* as my *agg* function.
I would want to to choose the *agg* function depending on the name of the
*metric* which is a *string*
Something like if *name* startsWith *count* use *sum* else if it starts with
*time* use *avg*
Can I do that ?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]