spektom opened a new pull request #27022: [SPARK-28415][DSTREAMS] Add 
messageHandler to Kafka 10 direct stream API #25205
URL: https://github.com/apache/spark/pull/27022
 
 
   ## What changes were proposed in this pull request?
   
   This patch introduces messageHandler parameter that can be provided to Kafka 
DStream, which allows processing events received from Kafka at an early stage.
   
   Lack of messageHandler parameter to KafkaUtils.createDirectStrem(...) in the 
new Kafka 10 API is what prevents us from upgrading our processes to use it, 
and here's why:
   
   1. messageHandler() allowed parsing / filtering / projecting huge JSON files 
at an early stage (only a small subset of JSON fields is required for a 
process), without this current cluster configuration doesn't keep up with the 
traffic.
   2. Transforming Kafka events right after a stream is created prevents from 
using HasOffsetRanges interface later. This means that whole message must be 
propagated to the end of a pipeline, which is very ineffective.
   
   ## How was this patch tested?
   
   Unit tests are provided.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to