Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-24 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50174/#review146739 --- Not that I intend to give you more work. However, adding an

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-24 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50174/#review146554 --- docs/learn/documentation/versioned/api/overview.md (line 22)

Job coordinator stream and job redeployment

2016-08-24 Thread David Yu
Hi, I'm trying to understand the role of the coordinator stream during a job redeployment. From the Samza documentation, I'm seeing the following about the coordinator stream: The Job Coordinator bootstraps configuration from the coordinator stream each time upon job start-up. It periodically

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
Makes sense. Thanks for the help, Jake! On Wed, Aug 24, 2016 at 5:11 PM Jacob Maes wrote: > We don't have any hard guidelines around that metric just because there are > no hard rules that work for every job. For example, some jobs are very > bursty and need to keep up with

Re: Debug Samza consumer lag issue

2016-08-24 Thread Jacob Maes
We don't have any hard guidelines around that metric just because there are no hard rules that work for every job. For example, some jobs are very bursty and need to keep up with huge traffic ramp-ups even though they're underutilized the rest of the time. That said, yes, I have used that metric

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
Interesting. To me, "event-loop-utilization" looks like a good indicator that shows us how busy the containers are. Is it safe to use this metric as a reference when we need to scale out/in our job? For example, if I'm seeing around 0.3 utilization most of the time, maybe I can decrease the # of

Re: Review Request 51346: SAMZA-974 - Support finite datasources in Samza that have a notion of End-Of-Stream

2016-08-24 Thread Xinyu Liu
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51346/#review146712 --- samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java

Re: Review Request 51346: SAMZA-974 - Support finite datasources in Samza that have a notion of End-Of-Stream

2016-08-24 Thread Jagadish Venkatraman
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51346/ --- (Updated Aug. 24, 2016, 9:03 p.m.) Review request for samza, Boris Shkolnik,

Re: Review Request 49212: RFC: SAMZA-855: Update kafka client to 0.10.0.0

2016-08-24 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49212/#review146701 ---

Re: [DISCUSS] Samza 0.11.0 release

2016-08-24 Thread Nicolas Maquet
Hi, We are looking at upgrading to Kafka 0.10.0 in part for the new message format which includes a timestamp field. Kafka 0.10.0 is backwards compatible with 0.8.x clients but we are concerned about the performance impact, see

Re: Debug Samza consumer lag issue

2016-08-24 Thread Jacob Maes
> > Based on what you have described, the following should be true in 0.10.1: > event-loop-ns = choose-ns + process-ns + window-ns (if necessary) + > commit-ns (if necessary) Yes, plus any time (e.g. due to an unlucky GC at just the right moment) that happens outside those timers. And no "if
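The timer relationship discussed in this message can be checked with simple arithmetic. The following is only a minimal sketch: the function name and the sample numbers are hypothetical, not taken from Samza or from the thread. It assumes all five metrics are reported in nanoseconds and computes the time left over after the four timers are subtracted from event-loop-ns, which is the "time outside those timers" (for example a GC pause) mentioned above.

```python
def unaccounted_ns(event_loop_ns, choose_ns, process_ns, window_ns=0, commit_ns=0):
    """Return event-loop time not covered by the four timers (hypothetical helper)."""
    timed = choose_ns + process_ns + window_ns + commit_ns
    return event_loop_ns - timed


# Hypothetical sample values in nanoseconds, not real measurements.
sample = dict(
    event_loop_ns=12_500_000,
    choose_ns=10_000_000,
    process_ns=2_000_000,
    window_ns=0,
    commit_ns=300_000,
)

gap = unaccounted_ns(**sample)
print(f"unaccounted: {gap} ns ({gap / 1e6:.1f} ms)")
```

A gap that stays near zero suggests the timers explain the loop time; a persistently large gap points at work happening outside them.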

Re: [DISCUSS] Samza 0.11.0 release

2016-08-24 Thread Yi Pan
Hi, Nicolas, Could you explain to me why Samza is blocking you from upgrading your Kafka brokers to 0.10? At LinkedIn, we are running Samza 0.10 w/ Kafka 0.10 brokers. This is a valid combination since Kafka 0.10 brokers should be backward compatible w/ 0.8.2 clients (which is the version Samza

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
Great. It all makes sense now. With the SSD fix, we also upgraded to 0.10.1. So we should see pretty consistent process-ns (which we do). Based on what you have described, the following should be true in 0.10.1: event-loop-ns = choose-ns + process-ns + window-ns (if necessary) + commit-ns (if

Re: Debug Samza consumer lag issue

2016-08-24 Thread Jacob Maes
A couple other notes. Prior to Samza 0.10.1, the choose-ns was part of process-ns. So when choose-ns and process-ns are both high (around 10,000,000 ns == 10 ms, which is the default poll timeout), that usually means the task is caught up. In Samza 0.10.1 the same is true if ONLY choose-ns is high.

Re: Debug Samza consumer lag issue

2016-08-24 Thread Jacob Maes
Hey David, Answering the most recent question first, since it's also the easiest. :-) > Is choose-ns the total number of ms used to choose a message from the input > stream? What are some gating factors (e.g. serialization?) for this metric? It's the amount of time the event loop spent getting

Re: Debug Samza consumer lag issue

2016-08-24 Thread David Yu
More updates: 1. process-envelopes rate finally stabilized and converged. Consumer lag is down to zero. 2. avg choose-ns across containers dropped over time, which I assume is a good thing. My question: Is