Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Xinyu,
I considered doing that as an example. But I want to keep SEPs only for
technical discussions, not process-related proposals.

Navina

On Mar 14, 2017 17:23, "xinyu liu"  wrote:

> +1 on this proposal too. Could you actually put this proposal as the first
> SEP (like SEP-0), so it serves as an example of how it will look in
> practice?
>
> Xinyu


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread xinyu liu
+1 on this proposal too. Could you actually put this proposal as the first
SEP (like SEP-0), so it serves as an example of how it will look in
practice?

Xinyu

On Tue, Mar 14, 2017 at 3:34 PM, Navina Ramesh  wrote:

> Just to clarify: The proposal for code and design process change is
> attached as a PDF/markdown to the JIRA - SAMZA-1141.
>
> Also, please show your support specifically for code and design process. My
> bad for not calling it out earlier :)
>
> Thanks!
> Navina


Re: Understanding metrics

2017-03-14 Thread xinyu liu
Hi, Ankit,

When your job runs with a thread pool, block-ns actually includes
process-ns. This is because after your task.process() is submitted to the
thread pool, the run loop thread is blocked until process() completes for
one of the tasks. It's interesting that block-ns (0.3 ms) is much longer
than process-ns (0.12 ms). I am wondering whether you also have window and
checkpoint configured for your job. Window and checkpoint also run inside
this thread pool to improve parallelism, so block-ns is affected as well:
the run loop blocks until window/checkpoint completes. If you are using
window/commit, please send us the configs (task.window.ms and
task.commit.ms) and the timer metrics (window-ns and commit-ns). Then we can
correlate them better with block-ns.
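
To illustrate the point about block-ns, here is a small Python sketch (not
Samza code) of a loop thread that hands process() to a thread pool and then
waits for it. The names run_loop and process, and the 2 ms sleep, are made up
for illustration. The only thing it is meant to show is that the time the
loop thread spends blocked necessarily includes the time spent inside
process().

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process(envelope, timings):
    """Hypothetical stand-in for task.process(); records its own duration."""
    start = time.perf_counter_ns()
    time.sleep(0.002)  # pretend to do about 2 ms of work
    timings.append(time.perf_counter_ns() - start)


def run_loop(envelopes, pool_size=3):
    """Single loop thread that dispatches to a pool and waits for each call."""
    process_ns, block_ns = [], []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for envelope in envelopes:
            start = time.perf_counter_ns()
            future = pool.submit(process, envelope, process_ns)
            future.result()  # the loop thread is blocked here
            block_ns.append(time.perf_counter_ns() - start)
    return process_ns, block_ns


process_ns, block_ns = run_loop(range(5))
avg_process = sum(process_ns) / len(process_ns)
avg_block = sum(block_ns) / len(block_ns)
print(f"avg process-ns: {avg_process:.0f}, avg block-ns: {avg_block:.0f}")
```

In this sketch the blocked time is always at least the process time, with
the difference being dispatch and wake-up overhead. Any extra work that runs
in the same pool while the loop waits would widen that gap further.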

Thanks,
Xinyu

On Tue, Mar 14, 2017 at 3:33 PM, Ankit Malhotra 
wrote:

> Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in
> choose-ns?
>
> Thanks
> Ankit


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Just to clarify: The proposal for code and design process change is
attached as a PDF/markdown to the JIRA - SAMZA-1141.

Also, please show your support specifically for code and design process. My
bad for not calling it out earlier :)

Thanks!
Navina

On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this up.
>
> I'm +1 on this proposal.



-- 
Navina R.


Re: Understanding metrics

2017-03-14 Thread Ankit Malhotra
Wait, block-ns = 0.3ms (300,000ns). Also, why are we not adding in choose-ns?

Thanks
Ankit

On 3/14/17, 6:26 PM, "Jagadish Venkatraman"  wrote:

I would expect (process_ns + block_ns) to be almost the same as 0.15 which
makes sense.

process_ns = 0.12 ms
block_ns = 0.03 ms
process_ns + block_ns ~ 0.15ms

This corresponds to the number of process calls roughly 1/7000 ~ 0.15ms per
process call.

>> Each process call is a separate thread.
Given that you have one partition in each container, and you want in-order
processing, there will be only one thread that's processing messages. The
two other threads are not doing work, and entail a constant (albeit
insignificant) synchronization overhead.






-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University




Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Jagadish Venkatraman
Thanks for writing this up.

I'm +1 on this proposal.



On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache) 
wrote:

> Hi everyone,
>
> We switched to using Pull Requests for code reviews a few months back.
> Clearly, there are some drawbacks to that model and we are trying to
> address the shortcomings. I have gathered input from some of the committers
> regarding what is missing in the code review process and what can be
> improved. Please take a look and provide feedback.
>
> Additionally, we are considering moving to a KIP/FLIP-like model for
> submitting design proposals (major changes to Samza). Lately, there have
> been some major feature discussions that are not documented consistently in
> a centralized location. The proposal in SAMZA-1141 addresses the design
> review process as well. Please review it too. I have already created a wiki
> page (Samza+Enhancement+Proposal) describing the Samza Enhancement Proposal
> (SEP) process and an SEP template. Going forward, let's start adding all
> major change proposals to the wiki and discuss the design on the mailing
> list.
>
> Your cooperation is highly appreciated during this period of transition in
> the process :)
>
> Feedback welcome!
>
> Thanks!
> --
> Navina R
>
> PS: Alternative name suggestions for "SEP" are welcome!
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University


Understanding metrics

2017-03-14 Thread Ankit Malhotra
Hi,

We are trying to understand the metrics that are being populated by our Samza
job and are a little confused about what each of these metrics means,
especially since we’re running the job with a thread pool.


· We have 3 input streams
· job.container.thread.pool.size=3
· 1 container per partition
· Using a RocksDB backed store with changelogging
· process-ns = 120,000
· get-ns ~ 30,000
· put-ns ~ 90,000
· block-ns ~ 300,000
· choose-ns ~ 500,000

Metrics are avg(metric) for all containers/partitions.

Process-envelopes ~ 7000/sec.

If I back the math out, this correlates quite closely with process-ns
(1/7000 s ~ 0.15 ms).
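
The unit conversion behind that comparison can be checked with a few lines
of Python. This is only a sketch using the averages listed above; the
variable names are made up for illustration.

```python
NS_PER_MS = 1_000_000

# Averages reported in this message, in nanoseconds.
metrics_ns = {
    "process-ns": 120_000,
    "get-ns": 30_000,
    "put-ns": 90_000,
    "block-ns": 300_000,
    "choose-ns": 500_000,
}
envelopes_per_sec = 7000


def to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / NS_PER_MS


# Average wall-clock time available per envelope at the observed rate.
budget_ms = 1000 / envelopes_per_sec

metrics_ms = {name: to_ms(value) for name, value in metrics_ns.items()}
for name, ms in metrics_ms.items():
    print(f"{name}: {ms:.2f} ms")
print(f"per-envelope budget: {budget_ms:.3f} ms")
```

At 7000 envelopes/sec the per-envelope budget comes out at roughly 0.143 ms,
which is close to process-ns (0.12 ms) but well below block-ns (0.3 ms) and
choose-ns (0.5 ms).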

What I don’t understand is this: the event loop is single threaded. Even
though each process call runs in a separate thread, the main event loop is
blocking (block-ns) and choosing (choose-ns) every time, and these times are
quite high. Given these numbers, how is it that we consistently see the
throughput above?

Also, lag (messages behind high watermark) is ~ 0.

Thanks
Ankit







[GitHub] samza pull request #60: SAMZA-1091: Implement key-based inner join operator ...

2017-03-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/60


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] samza pull request #85: SAMZA-1140 : Non blocking commit in Async Runloop

2017-03-14 Thread shanthoosh
GitHub user shanthoosh opened a pull request:

https://github.com/apache/samza/pull/85

SAMZA-1140 : Non blocking commit in Async Runloop



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shanthoosh/samza asyncCommitSupport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/85.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #85


commit 7f304204d8b51f31d29a5518e5c67b4fb7397a7b
Author: Shanthoosh Venkataraman 
Date:   2017-03-08T01:48:58Z

Adding non blocking commit in AsyncRunLoop.

commit 8ce16d0971151ce9d44433d3572e48546c8791da
Author: Shanthoosh Venkataraman 
Date:   2017-03-14T20:00:42Z

Temp commit.



