We have a Kafka -> Flume -> Kite Dataset sink pipeline configured to write to a Hive-backed dataset. One of our main requirements is to sessionize the data and run funnel analysis on it.
We are currently handling this in Impala/Hive, but it's quite slow, and since we want the reports refreshed frequently, it doesn't seem to scale well. Is there any way to intercept the Flume events, sessionize them, and concatenate related events before writing to the dataset? If so, how would one go about holding onto state for each user session? Thanks!
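
For concreteness, here is roughly what I had in mind: a minimal sketch of a custom Flume interceptor that keys sessions on a per-user identifier and stamps each event with a session ID header. Everything specific here is an assumption on my part, not something from our current setup: the `userId` and `sessionId` header names, the `sessionGapMs` config property, and the idea that a session ends after an idle gap. The state lives in a plain in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/**
 * Sketch of a stateful interceptor that tags each event with a session ID.
 * Sessions are keyed on a "userId" header (an assumption about our event
 * format) and expire after a configurable idle gap.
 */
public class SessionizingInterceptor implements Interceptor {

  private final long sessionGapMs;
  // In-memory state: userId -> {sessionStartMs, lastSeenMs}.
  // Lost on agent restart, which is part of what I'm unsure about.
  private final Map<String, long[]> sessions = new HashMap<>();

  private SessionizingInterceptor(long sessionGapMs) {
    this.sessionGapMs = sessionGapMs;
  }

  @Override
  public void initialize() {
    // Nothing to set up for this in-memory sketch.
  }

  @Override
  public Event intercept(Event event) {
    Map<String, String> headers = event.getHeaders();
    String userId = headers.get("userId"); // assumed header name
    if (userId == null) {
      return event; // pass through events we cannot key
    }
    long now = System.currentTimeMillis();
    long[] state = sessions.get(userId);
    if (state == null || now - state[1] > sessionGapMs) {
      // Start a new session if the user is new or has been idle too long.
      state = new long[] { now, now };
      sessions.put(userId, state);
    }
    state[1] = now; // refresh last-seen time
    headers.put("sessionId", userId + "-" + state[0]);
    return event;
  }

  @Override
  public List<Event> intercept(List<Event> events) {
    List<Event> out = new ArrayList<>(events.size());
    for (Event e : events) {
      out.add(intercept(e));
    }
    return out;
  }

  @Override
  public void close() {
    sessions.clear();
  }

  /** Builder wired up via the agent config, per the Interceptor contract. */
  public static class Builder implements Interceptor.Builder {
    private long sessionGapMs;

    @Override
    public void configure(Context context) {
      // "sessionGapMs" is a made-up property name; default 30 minutes.
      sessionGapMs = context.getLong("sessionGapMs", 30L * 60 * 1000);
    }

    @Override
    public Interceptor build() {
      return new SessionizingInterceptor(sessionGapMs);
    }
  }
}
```

Even if something like this works for tagging events, I'm not sure how to make the per-user state survive agent restarts or span multiple agents, or how one would buffer and concatenate a session's events before they hit the sink. That's the crux of my question.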
