HeartSaVioR edited a comment on issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#issuecomment-472479384

UPDATE: I've just added a log message that is emitted when a Kafka consumer is created. The branches used:

* master: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-master-ref-debugging
* patch: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-debugging

I've also corrected my spark-shell invocation to use `local[*]` instead of `local[1]`. `local[1]` prevented concurrent access in the previous experiment. FYI, my laptop has 4 cores (8 logical cores).

```
./bin/spark-shell --master "local[*]" \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version> \
  --driver-memory 6G > >(tee -a stdout-master-experiment.log) 2> >(tee -a stderr-master-experiment.log >&2)
```

I collected the number of Kafka consumers created with the following command:

```
grep "creating new Kafka consumer" logfile | wc -l
```

and the number of fetch requests to Kafka with:

```
grep "fetching data from Kafka consumer" logfile | wc -l
```

Both branches ran the same query on the same data; 497 batches were run.

branch | Kafka consumers created | fetch requests
------- | ----------------------- | --------------
master | 1986 | 2837
patch | 8 | 1706

The results appear to confirm that the patch properly caches and reuses consumers under concurrent usage (4 partitions × 2 concurrent streams = 8 consumers), and that it also caches fetched data properly (fetch requests drop from 2837 to 1706).
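The two `grep ... | wc -l` counts can be collected together with a small helper like the one below. This is only a sketch, not part of the PR: the function names `count_kafka_events` and `expected_pooled_consumers` are hypothetical, and it assumes the log contains exactly the two messages grepped above. The second helper just encodes the comment's own arithmetic, one cached consumer per (stream, partition) pair.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count consumer creations and fetch requests in one log.
# Assumes the log lines contain the exact messages from the debugging branches.
count_kafka_events() {
  local logfile="$1"
  local created fetched
  # `grep -c` counts matching lines, equivalent to `grep ... | wc -l`.
  # `|| true` keeps a zero-match grep (exit status 1) from aborting under `set -e`;
  # grep -c still prints "0" in that case.
  created=$(grep -c "creating new Kafka consumer" "$logfile" || true)
  fetched=$(grep -c "fetching data from Kafka consumer" "$logfile" || true)
  echo "created=$created fetched=$fetched"
}

# Hypothetical helper: expected number of pooled consumers, assuming one cached
# consumer per (stream, partition) pair, e.g. 4 partitions * 2 streams = 8.
expected_pooled_consumers() {
  local partitions="$1" streams="$2"
  echo $(( partitions * streams ))
}
```

For example, `count_kafka_events logfile` prints both counts on one line, and `expected_pooled_consumers 4 2` prints the 8 consumers the patch branch is expected to create.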
