HeartSaVioR commented on issue #27022: [SPARK-28415][DSTREAMS] Add messageHandler to Kafka 10 direct stream API #25205
URL: https://github.com/apache/spark/pull/27022#issuecomment-570369499

That is the same as my understanding, actually. Maybe there is some difference between the old Kafka API and the new one, but I'm not familiar with the old API, so I'm not sure. At least from skimming the code of KafkaRDD in the kafka-08 module, it also seems to keep the original data (the batch of records) that Kafka provides in memory, and to apply messageHandler just before handing out each record, so messageHandler may not help save memory there either.

@spektom Could you craft example projects (the smaller the better) for both Kafka 08 and 010, and compare memory usage?

And if I understand correctly, if you really want to achieve the goal, the patch doesn't seem to be sufficient. You may want to go through KafkaDataConsumer and apply the transformation in `buffer`, so that it stores the transformed data in memory and serves it. But I don't think that is easy to do safely, since a KafkaDataConsumer is cached and served for the same topicpartition. (If you have to apply different messageHandlers to the same topicpartition, it will get messed up.)
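
To make the last concern concrete, here is a minimal sketch of what I mean. It does not use the real Spark or Kafka classes: `TopicPartition`, `Record`, `BufferingConsumer`, `ConsumerCache` and `SharedConsumerDemo` are all hypothetical stand-ins, and the real KafkaDataConsumer caching logic is more involved. It only assumes a consumer that applies the handler while filling its buffer, plus a cache keyed by topic-partition.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration; not the real Kafka or Spark classes.
final case class TopicPartition(topic: String, partition: Int)
final case class Record(offset: Long, value: String)

// Applies the handler while filling the buffer, so only transformed
// values are held in memory.
final class BufferingConsumer(handler: Record => String) {
  private val buffer = mutable.Queue.empty[String]

  def fill(batch: Seq[Record]): Unit =
    batch.foreach(r => buffer.enqueue(handler(r)))

  def next(): String = buffer.dequeue()
}

// Consumers are cached per topic-partition, so the handler supplied by the
// first caller stays attached to the cached instance.
object ConsumerCache {
  private val cache = mutable.Map.empty[TopicPartition, BufferingConsumer]

  def acquire(tp: TopicPartition, handler: Record => String): BufferingConsumer =
    cache.getOrElseUpdate(tp, new BufferingConsumer(handler))
}

object SharedConsumerDemo {
  def run(): (String, String) = {
    val tp = TopicPartition("events", 0)

    // First caller wants upper-cased values.
    val first = ConsumerCache.acquire(tp, r => r.value.toUpperCase)
    first.fill(Seq(Record(0L, "ab"), Record(1L, "cd")))
    val a = first.next()

    // Second caller asks for reversed values on the same topic-partition,
    // but receives the cached consumer and its already transformed buffer.
    val second = ConsumerCache.acquire(tp, r => r.value.reverse)
    val b = second.next()

    (a, b)
  }
}
```

In this sketch the second caller gets a value produced by the first caller's handler rather than its own, which is the kind of mix-up I'd expect unless the handler becomes part of the cache key or buffered data is invalidated when the handler changes.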
