[GitHub] [spark] HeartSaVioR edited a comment on issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

GitBox Wed, 13 Mar 2019 08:49:06 -0700

URL: https://github.com/apache/spark/pull/22138#issuecomment-472479384
 
 
   UPDATE: I just added a log message that is emitted whenever a Kafka consumer is created.
   
   * master: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-master-ref-debugging
   * patch: https://github.com/HeartSaVioR/spark/tree/SPARK-25151-debugging
   
   I've also corrected my spark-shell invocation to use `local[*]` instead of `local[1]`, which prevented concurrent access in the previous experiment. FYI, my laptop has 4 cores (8 logical cores).
   
   ```
   ./bin/spark-shell --master "local[*]" \
   --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version> \
   --driver-memory 6G \
   > >(tee -a stdout-master-experiment.log) \
   2> >(tee -a stderr-master-experiment.log >&2)
   ```
   
   I collected the number of Kafka consumers created with the command below:
   
   ```
   grep "creating new Kafka consumer" logfile | wc -l
   ```
   
   and the number of fetch requests to Kafka with the command below:
   
   ```
   grep "fetching data from Kafka consumer" logfile | wc -l
   ```
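
Both counts can be gathered per log file with a small helper. This is only a sketch: the function names `count_matches` and `summarize_log` are made up for illustration, the message strings are the ones grepped for above, and the log file names in the usage comment are assumptions.

```shell
# Count the lines in a file that contain a given message.
# grep -c prints the number of matching lines; it exits non-zero when
# there are no matches, so `|| true` keeps callers running under `set -e`.
count_matches() {
  grep -c -- "$1" "$2" || true
}

# Print "<file> | <consumers created> | <fetch requests>" for one log file.
summarize_log() {
  printf '%s | %s | %s\n' "$1" \
    "$(count_matches "creating new Kafka consumer" "$1")" \
    "$(count_matches "fetching data from Kafka consumer" "$1")"
}

# Example usage (hypothetical file names):
#   summarize_log master-experiment.log
#   summarize_log patch-experiment.log
```

`grep -c` counts matching lines, which is the same number `grep ... | wc -l` reports.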
   
   The same query was run against the same data on both branches: 497 batches were run.
   
   branch | Kafka consumers created | fetch requests
   ------- | ----------------------- | --------------
   master | 1986 | 2837
   patch | 8 | 1706
   
   The results suggest that the patch properly caches and reuses consumers under concurrent usage (4 partitions * 2 concurrent streams = 8 consumers), and that it also caches fetched data properly (1706 fetch requests versus 2837 on master).
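
As a rough sanity check on the table, the per-batch rates and the relative reduction can be computed from the reported counts. A sketch using awk; the function name `ratio_report` is hypothetical, and the inputs are the numbers from the table above.

```shell
# Usage: ratio_report <batches> <master_count> <patch_count>
# Prints per-batch rates for both branches and the relative reduction.
ratio_report() {
  awk -v b="$1" -v m="$2" -v p="$3" 'BEGIN {
    printf "master per batch: %.2f\n", m / b
    printf "patch per batch: %.2f\n", p / b
    printf "reduction: %.1f%%\n", (m - p) / m * 100
  }'
}

ratio_report 497 1986 8      # Kafka consumers created
ratio_report 497 2837 1706   # fetch requests
```

With these inputs, consumer creations drop by about 99.6% and fetch requests by about 39.9%.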

