koeninger commented on issue #27022: [SPARK-28415][DSTREAMS] Add messageHandler to Kafka 10 direct stream API #25205
URL: https://github.com/apache/spark/pull/27022#issuecomment-570357155

Do you have a minimal reproducible case showing the difference in memory usage?

My expectation would be that if the very first thing you were doing with the dstream was calling foreachRDD and then rdd.foreachPartition, the memory usage would be comparable to what you are doing here. It's an iterator backed by a Kafka consumer, and it has to hold the whole ConsumerRecord in memory either way. It's just a question of whether your message conversion happens before or after next() returns from the iterator, right? Or am I missing something?
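
To make the before/after next() question concrete, here is a minimal sketch in plain Python. It does not use Spark or Kafka: `Record`, `consumer_iterator`, `with_handler`, and `convert_after` are hypothetical names invented for illustration, and the event log only records the order of fetches and conversions, not actual JVM memory usage. Under those assumptions, it shows that both styles fetch one record, convert it, and only then fetch the next.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in for a Kafka ConsumerRecord: (key, value).
Record = Tuple[str, bytes]


def consumer_iterator(records: List[Record], log: List[str]) -> Iterator[Record]:
    """Yield records one at a time, like an iterator backed by a consumer."""
    for i, rec in enumerate(records):
        log.append(f"fetch {i}")
        yield rec


def with_handler(records: List[Record],
                 handler: Callable[[Record], int],
                 log: List[str]) -> Iterator[int]:
    """Conversion runs before next() returns (messageHandler style)."""
    for i, rec in enumerate(consumer_iterator(records, log)):
        log.append(f"convert {i}")
        yield handler(rec)


def convert_after(records: List[Record],
                  handler: Callable[[Record], int],
                  log: List[str]) -> List[int]:
    """Conversion runs after next() returns (foreachPartition style)."""
    out = []
    for i, rec in enumerate(consumer_iterator(records, log)):
        log.append(f"convert {i}")
        out.append(handler(rec))
    return out


# Example conversion: keep only the payload size.
def payload_size(rec: Record) -> int:
    return len(rec[1])


records = [("a", b"xx"), ("b", b"yyy")]
log_handler: List[str] = []
log_after: List[str] = []
result_handler = list(with_handler(records, payload_size, log_handler))
result_after = convert_after(records, payload_size, log_after)
```

In this sketch the two event logs come out identical, which is the point of the question: the full record exists in memory at the moment of conversion in either style. A real reproduction would still need to measure executor memory on an actual job.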
