Re: BK metrics
> The disks were under 60% utilization (not saturated).

>> 60% of bandwidth or iops? Only one of the two needs to be saturated.
>> And which disk, journal or ledgers?

It is the disk busy percentage. Both journal and ledger disks were around 60%
(the journal was more consistent).

> Are there any benchmark results of BookKeeper that can be shared?

>> I don't have any to hand, but maybe someone else on the list does.

Which key BookKeeper metrics would you suggest monitoring? It would be nice to
have documentation around the metrics that explains, at a high level, what each
metric means (how to understand and interpret it) and what to expect from it
(numbers for best- and worst-case scenarios).

Regards,
Vijay

On Tuesday, March 20, 2018, 11:34:39 PM PDT, Ivan Kelly wrote:

> @Ivan, for some reason I did not receive your reply but found it in the
> email archives.

Are you subscribed to the list? I did see one mail from you show up in
moderation.

> At 80K request/sec throttling for record size of 1K, I am getting below
> throughput. The 99th percentile of `bookkeeper_server_ADD_ENTRY_REQUEST` and
> `bookkeeper_server_ADD_ENTRY` are around 350 ms. I am starting to see the lag
> when I increase the ingestion rate limit beyond 90 K/sec.

So this suggests to me that the metrics are reporting correctly.

> The disks were under 60% utilization (not saturated).

60% of bandwidth or iops? Only one of the two needs to be saturated. And which
disk, journal or ledgers?

> Are there any benchmark results of BookKeeper that can be shared?

I don't have any to hand, but maybe someone else on the list does.

Regards,
Ivan
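Ivan's point that only one of bandwidth or IOPS needs to be saturated can be
illustrated with a quick back-of-the-envelope check. This is a hypothetical
sketch: the device limits below are placeholder assumptions, not numbers from
this thread, so substitute your SSD's rated figures.

```python
# Rough check of whether a disk is closer to its IOPS or bandwidth ceiling.
# max_iops and max_mb_per_s are hypothetical device ratings (assumptions).
def saturation(iops, mb_per_s, max_iops=90_000, max_mb_per_s=500.0):
    """Return (iops_utilization, bandwidth_utilization) as fractions of the limits."""
    return iops / max_iops, mb_per_s / max_mb_per_s

# Example: 100K adds/sec of 1KB entries hitting the journal is ~100 MB/s.
iops_util, bw_util = saturation(iops=100_000, mb_per_s=100.0)

# The disk is saturated as soon as EITHER fraction reaches 1.0 --
# here the IOPS side crosses the line while bandwidth is only at 20%.
saturated = max(iops_util, bw_util) >= 1.0
```

This is why a busy percentage alone can be misleading: a device can sit at 60%
busy while one of the two limits is already the effective bottleneck.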
Re: BK metrics
> @Ivan, for some reason I did not receive your reply but found it in the
> email archives.

Are you subscribed to the list? I did see one mail from you show up in
moderation.

> At 80K request/sec throttling for record size of 1K, I am getting below
> throughput. The 99th percentile of `bookkeeper_server_ADD_ENTRY_REQUEST` and
> `bookkeeper_server_ADD_ENTRY` are around 350 ms. I am starting to see the lag
> when I increase the ingestion rate limit beyond 90 K/sec.

So this suggests to me that the metrics are reporting correctly.

> The disks were under 60% utilization (not saturated).

60% of bandwidth or iops? Only one of the two needs to be saturated. And which
disk, journal or ledgers?

> Are there any benchmark results of BookKeeper that can be shared?

I don't have any to hand, but maybe someone else on the list does.

Regards,
Ivan
BK metrics
```
> 2) If it's in milliseconds, are these numbers in expected range (see
> attached image). To me 2.5 seconds (2.5K ms) latency for add entry request
> is very high.

2.5 seconds is very high, but your write rate is also high. 100,000 * 1KB is
100MB/s. SSD should be able to take it from the journal side, but it depends
on the hardware. Have you tried reducing the write rate to see how the latency
changes? What is the client seeing for latency? I assume the client and all
servers have 10GigE nics?

Your images didn't attach correctly. Maybe they're too big to post directly to
the list. There is a size limit, but I don't know what it is.

-Ivan
```

@Ivan, for some reason I did not receive your reply but found it in the email
archives. I have copied your response into this email for context.

At 80K request/sec throttling with a record size of 1K, I am getting the
throughput below. The 99th percentiles of `bookkeeper_server_ADD_ENTRY_REQUEST`
and `bookkeeper_server_ADD_ENTRY` are around 350 ms. I start to see lag when I
increase the ingestion rate limit beyond 90K/sec.

The disks were under 60% utilization (not saturated). All client and server
machines have 10G NICs.

Throughput (records/sec): 79505, Throughput (bytes): 75.8 MB/s,
Latency (ms): average-118, 50th-106, 75th-139, 90th-193, 99th-395, 999th-658

Are there any benchmark results of BookKeeper that can be shared?

Regards,
Vijay
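The reported numbers above are self-consistent: 79,505 records/sec at 1,000
bytes per record works out to roughly 75.8 MB/s, matching the reported byte
throughput. A quick arithmetic check:

```python
# Sanity-check the reported throughput figures from the load test.
records_per_sec = 79_505
record_bytes = 1_000

# 79,505,000 bytes/sec expressed in MB/s (1 MB = 1024 * 1024 bytes).
mb_per_s = records_per_sec * record_bytes / (1024 * 1024)
# mb_per_s is approximately 75.8, matching "Throughput (bytes): 75.8 MB/s".
```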
BK metrics
Hello,

I am running a load test scenario where we have 3 Bookies, dedicated SSDs for
the journal and ledgers, and a 5G JVM heap with G1GC enabled.

`jvm_opts: -Xmx5g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -XX:+PrintFlagsFinal -XX:+PrintGC -XX:+PrintGCCause -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps`

I am testing with a 1000-byte record size, ingesting around 50M records at an
ingestion rate limit of 100K/sec. I wanted to understand how the metric type
`stats` is reported for the following metrics:

bookkeeper_server_ADD_ENTRY_REQUEST
bookkeeper_server_ADD_ENTRY
JOURNAL_ADD_ENTRY
JOURNAL_SYNC
JOURNAL_QUEUE_LATENCY
JOURNAL_FLUSH_LATENCY
JOURNAL_PROCESS_TIME_LATENCY

My understanding is that the above metrics are recorded in microseconds (from
the BK code) and that the reporters (we use statsD to collect the `codahale`
BK metrics and sink them to `InfluxDB`) convert `rates` to per-second and
`durations` to milliseconds.

1) I wanted to confirm whether the final graph values that I am seeing in the
UI (attached) are in milliseconds or some other unit?

2) If they are in milliseconds, are these numbers in the expected range (see
attached image)? To me, 2.5 seconds (2.5K ms) latency for an add entry request
is very high.

Any help understanding the metrics is much appreciated.

Regards,
Vijay
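If the reporter is indeed configured with a duration unit of milliseconds, a
raw BookKeeper latency sample recorded in microseconds is divided by 1,000
before it reaches InfluxDB. A minimal sketch of that arithmetic (this
illustrates the conversion described above, not the actual reporter code):

```python
def to_duration_unit(sample_micros: float) -> float:
    """Convert a raw microsecond latency sample to the reporter's
    duration unit (milliseconds)."""
    return sample_micros / 1_000.0

# Under this assumption, a 2.5K ms reading in the UI corresponds to a raw
# sample of 2,500,000 microseconds, i.e. a genuine 2.5-second latency.
ui_value_ms = to_duration_unit(2_500_000)
```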
Re: BK metrics
> 2) If it's in milliseconds, are these numbers in expected range (see
> attached image). To me 2.5 seconds (2.5K ms) latency for add entry request
> is very high.

2.5 seconds is very high, but your write rate is also high. 100,000 * 1KB is
100MB/s. SSD should be able to take it from the journal side, but it depends
on the hardware. Have you tried reducing the write rate to see how the latency
changes? What is the client seeing for latency? I assume the client and all
servers have 10GigE nics?

Your images didn't attach correctly. Maybe they're too big to post directly to
the list. There is a size limit, but I don't know what it is.

-Ivan