HeartSaVioR commented on issue #27022: [SPARK-28415][DSTREAMS] Add 
messageHandler to Kafka 10 direct stream API #25205
URL: https://github.com/apache/spark/pull/27022#issuecomment-570369499
 
 
   That matches my understanding, actually. There may be some difference 
between the old and new Kafka APIs, but I'm not familiar with the old API, so 
I'm not sure. At least from skimming the KafkaRDD code in the kafka-08 module, 
it also seems to store the original data (the batch of records) Kafka provides 
in memory, and to apply messageHandler just before handing out each record; so 
messageHandler may not help save memory there either.
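
   To illustrate the point about memory, here is a toy Python model (not Spark 
code; `ToyBatchIterator`, `fetch_batch` and `message_handler` are hypothetical 
names made up for this sketch). It assumes the behavior described above: the 
raw batch is buffered first, and the handler runs only when a record is handed 
out.

```python
from typing import Callable, Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value); a stand-in for a Kafka record


class ToyBatchIterator:
    """Toy model of an iterator that buffers raw records, then maps lazily."""

    def __init__(self,
                 fetch_batch: Callable[[], List[Record]],
                 message_handler: Callable[[Record], object]) -> None:
        # The whole raw batch is held in memory, whatever the handler does.
        self.buffer: List[Record] = list(fetch_batch())
        self.message_handler = message_handler

    def __iter__(self) -> Iterator[object]:
        for record in self.buffer:
            # The handler is applied only at the moment a record is served.
            yield self.message_handler(record)


big_value = "x" * 1000
it = ToyBatchIterator(lambda: [("k", big_value)], lambda r: len(r[1]))
sizes = list(it)  # handler output is small: [1000]
```

   Even though the handler shrinks each record to an integer, `it.buffer` 
still holds the full 1000-character value, which is why a handler applied at 
this stage would not be expected to reduce memory usage.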
   
   @spektom 
   Could you craft example projects (the smaller the better) for both 
kafka 08 and 010, and compare their memory usage?
   
   And if I understand correctly, if you really want to achieve the goal, the 
patch doesn't seem to be sufficient; you may want to go through 
KafkaDataConsumer and apply the transformation in `buffer`, so that the 
transformed data is what gets stored in memory and served. But I don't think 
that is easy to do safely, as KafkaDataConsumer is cached and reused for the 
same topicpartition. (If you have to apply different messageHandlers to the 
same topicpartition, it'll get messed up.)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
