Re: Understanding metrics
Hi Xinyu, Super weird, but I can’t seem to find the commit-ns nor the window-ns metric for our jobs. There are window-calls and commit-calls but the timings are missing. task.commit.ms=9 task.window.ms=9 Ankit On 3/14/17, 8:00 PM, "xinyu liu" wrote: Hi, Ankit, When running your job in multithreading, block-ns here actually includes the process_ns. This is because after your task.process() is submitted to the thread pool, the run loop thread will be blocked until the process() is complete for one of the task. It's interesting that block-ns (0.3 ms) is much longer than process-ns (0.12 ms). I am wondering whether you also have window and checkpoint configured for your job. Since window and checkpoint will also be running inside this thread pool to improve the parallelism, block-ns will be affected since the run loop will block for window/checkpoint to complete. If you are using window/commit, please send us the configs (task.window.ms and task.commit.ms) and the timer metrics (window-ns and commit-ns). Then we can correlate better with block-ns. Thanks, Xinyu On Tue, Mar 14, 2017 at 3:33 PM, Ankit Malhotra wrote: > Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in > choose-ns? > > Thanks > Ankit > > On 3/14/17, 6:26 PM, "Jagadish Venkatraman" > wrote: > > I would expect (process_ns + block_ns) to be almost the same as 0.15 > which > makes sense. > > process_ns = 0.12 ms > block_ns = 0.03 ms > process_ns + block_ns ~ 0.15ms > > This corresponds to the number of process calls roughly 1/7000 ~ > 0.15ms per > process call. > > >> Each process call is a separate thread. > Given that you have one partition in each container, and you want > in-order > processing, there will be only one thread that's processing messages. > The > two other threads are not doing work, and entail a constant (albeit > insignificant) synchronization overhead. > > > > > > On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra < > amalho...@appnexus.com> > wrote: > > > Hi, > > > > We are trying to understand metrics that are being populated by our > samza > > job and are a little confused what each of these metrics mean > especially > > since we’re running the job with a thread pool. > > > > > > · We have 3 input streams > > > > · job.container.thread.pool.size=3 > > > > · 1 container per partition > > > > · Using a RocksDB backed store with changelogging > > > > · process-ns = 120,000 > > > > · get-ns ~ 30,000 > > > > · put-ns ~ 90,000 > > > > · block-ns ~ 300,000 > > > > · choose-ns ~ 500,000 > > > > Metrics are avg(metric) for all containers/partitions. > > > > Process-envelopes ~ 7000/sec. > > > > If I back the math out, this correlates quite closely to process-ns. > > (1/7000 ~ 0.15ms). > > > > What I don’t understand is that the event loop is single threaded. > Even > > though, each process call is a separate thread, the main even loop is > > blocking (block-ns) and choosing (choose-ns) every time and these > times are > > quite high. Given these metrics, how is it that we are consistently > seeing > > the above metrics? > > > > Also, lag (messages behind high watermark) is ~ 0. > > > > Thanks > > Ankit > > > > > > > > > > > > > > > -- > Jagadish V, > Graduate Student, > Department of Computer Science, > Stanford University > > >
Re: Understanding metrics
Hi, Ankit, When running your job in multithreading, block-ns here actually includes the process_ns. This is because after your task.process() is submitted to the thread pool, the run loop thread will be blocked until the process() is complete for one of the task. It's interesting that block-ns (0.3 ms) is much longer than process-ns (0.12 ms). I am wondering whether you also have window and checkpoint configured for your job. Since window and checkpoint will also be running inside this thread pool to improve the parallelism, block-ns will be affected since the run loop will block for window/checkpoint to complete. If you are using window/commit, please send us the configs (task.window.ms and task.commit.ms) and the timer metrics (window-ns and commit-ns). Then we can correlate better with block-ns. Thanks, Xinyu On Tue, Mar 14, 2017 at 3:33 PM, Ankit Malhotra wrote: > Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in > choose-ns? > > Thanks > Ankit > > On 3/14/17, 6:26 PM, "Jagadish Venkatraman" > wrote: > > I would expect (process_ns + block_ns) to be almost the same as 0.15 > which > makes sense. > > process_ns = 0.12 ms > block_ns = 0.03 ms > process_ns + block_ns ~ 0.15ms > > This corresponds to the number of process calls roughly 1/7000 ~ > 0.15ms per > process call. > > >> Each process call is a separate thread. > Given that you have one partition in each container, and you want > in-order > processing, there will be only one thread that's processing messages. > The > two other threads are not doing work, and entail a constant (albeit > insignificant) synchronization overhead. > > > > > > On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra < > amalho...@appnexus.com> > wrote: > > > Hi, > > > > We are trying to understand metrics that are being populated by our > samza > > job and are a little confused what each of these metrics mean > especially > > since we’re running the job with a thread pool. > > > > > > · We have 3 input streams > > > > · job.container.thread.pool.size=3 > > > > · 1 container per partition > > > > · Using a RocksDB backed store with changelogging > > > > · process-ns = 120,000 > > > > · get-ns ~ 30,000 > > > > · put-ns ~ 90,000 > > > > · block-ns ~ 300,000 > > > > · choose-ns ~ 500,000 > > > > Metrics are avg(metric) for all containers/partitions. > > > > Process-envelopes ~ 7000/sec. > > > > If I back the math out, this correlates quite closely to process-ns. > > (1/7000 ~ 0.15ms). > > > > What I don’t understand is that the event loop is single threaded. > Even > > though, each process call is a separate thread, the main even loop is > > blocking (block-ns) and choosing (choose-ns) every time and these > times are > > quite high. Given these metrics, how is it that we are consistently > seeing > > the above metrics? > > > > Also, lag (messages behind high watermark) is ~ 0. > > > > Thanks > > Ankit > > > > > > > > > > > > > > > -- > Jagadish V, > Graduate Student, > Department of Computer Science, > Stanford University > > >
Re: Understanding metrics
Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in choose-ns? Thanks Ankit On 3/14/17, 6:26 PM, "Jagadish Venkatraman" wrote: I would expect (process_ns + block_ns) to be almost the same as 0.15 which makes sense. process_ns = 0.12 ms block_ns = 0.03 ms process_ns + block_ns ~ 0.15ms This corresponds to the number of process calls roughly 1/7000 ~ 0.15ms per process call. >> Each process call is a separate thread. Given that you have one partition in each container, and you want in-order processing, there will be only one thread that's processing messages. The two other threads are not doing work, and entail a constant (albeit insignificant) synchronization overhead. On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra wrote: > Hi, > > We are trying to understand metrics that are being populated by our samza > job and are a little confused what each of these metrics mean especially > since we’re running the job with a thread pool. > > > · We have 3 input streams > > · job.container.thread.pool.size=3 > > · 1 container per partition > > · Using a RocksDB backed store with changelogging > > · process-ns = 120,000 > > · get-ns ~ 30,000 > > · put-ns ~ 90,000 > > · block-ns ~ 300,000 > > · choose-ns ~ 500,000 > > Metrics are avg(metric) for all containers/partitions. > > Process-envelopes ~ 7000/sec. > > If I back the math out, this correlates quite closely to process-ns. > (1/7000 ~ 0.15ms). > > What I don’t understand is that the event loop is single threaded. Even > though, each process call is a separate thread, the main even loop is > blocking (block-ns) and choosing (choose-ns) every time and these times are > quite high. Given these metrics, how is it that we are consistently seeing > the above metrics? > > Also, lag (messages behind high watermark) is ~ 0. > > Thanks > Ankit > > > > > > -- Jagadish V, Graduate Student, Department of Computer Science, Stanford University
Re: Understanding metrics
I would expect (process_ns + block_ns) to be almost the same as 0.15 which makes sense. process_ns = 0.12 ms block_ns = 0.03 ms process_ns + block_ns ~ 0.15ms This corresponds to the number of process calls roughly 1/7000 ~ 0.15ms per process call. >> Each process call is a separate thread. Given that you have one partition in each container, and you want in-order processing, there will be only one thread that's processing messages. The two other threads are not doing work, and entail a constant (albeit insignificant) synchronization overhead. On Tue, Mar 14, 2017 at 3:03 PM, Ankit Malhotra wrote: > Hi, > > We are trying to understand metrics that are being populated by our samza > job and are a little confused what each of these metrics mean especially > since we’re running the job with a thread pool. > > > · We have 3 input streams > > · job.container.thread.pool.size=3 > > · 1 container per partition > > · Using a RocksDB backed store with changelogging > > · process-ns = 120,000 > > · get-ns ~ 30,000 > > · put-ns ~ 90,000 > > · block-ns ~ 300,000 > > · choose-ns ~ 500,000 > > Metrics are avg(metric) for all containers/partitions. > > Process-envelopes ~ 7000/sec. > > If I back the math out, this correlates quite closely to process-ns. > (1/7000 ~ 0.15ms). > > What I don’t understand is that the event loop is single threaded. Even > though, each process call is a separate thread, the main even loop is > blocking (block-ns) and choosing (choose-ns) every time and these times are > quite high. Given these metrics, how is it that we are consistently seeing > the above metrics? > > Also, lag (messages behind high watermark) is ~ 0. > > Thanks > Ankit > > > > > > -- Jagadish V, Graduate Student, Department of Computer Science, Stanford University
Understanding metrics
Hi, We are trying to understand metrics that are being populated by our samza job and are a little confused what each of these metrics mean especially since we’re running the job with a thread pool. · We have 3 input streams · job.container.thread.pool.size=3 · 1 container per partition · Using a RocksDB backed store with changelogging · process-ns = 120,000 · get-ns ~ 30,000 · put-ns ~ 90,000 · block-ns ~ 300,000 · choose-ns ~ 500,000 Metrics are avg(metric) for all containers/partitions. Process-envelopes ~ 7000/sec. If I back the math out, this correlates quite closely to process-ns. (1/7000 ~ 0.15ms). What I don’t understand is that the event loop is single threaded. Even though, each process call is a separate thread, the main even loop is blocking (block-ns) and choosing (choose-ns) every time and these times are quite high. Given these metrics, how is it that we are consistently seeing the above metrics? Also, lag (messages behind high watermark) is ~ 0. Thanks Ankit