Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-22 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 22, 2016, 7:11 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Resolve conflicts


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 
67e259c9e5bbe100a29d4c3f12d2db7398344f34 
  docs/learn/documentation/versioned/container/event-loop.md 
9dcf92ca4f694714fb76157e7a1048bee348f289 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
14945e2d65e753c89007703a38af1b4692f8e923 
  docs/learn/tutorials/versioned/index.md 
ca2b08fb02a83f72f804c4059f258253c046a1b6 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-19 Thread Xinyu Liu


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 71
> > 
> >
> > The explanation of "processAsync() will always be invoked in a single 
> > thread" is off-the-target. I think that you might simply want to focus on 
> > "by default, AsyncStreamTask in Samza guarantees in-order process of 
> > messages in a task, by making sure the processAsync() can only be called 
> > after the callback of previous processAsync() is triggered". Whether 
> > processAsync() is invoked in a single thread or not does not directly 
> > explain the default in-order guarantee.
> > 
> > P.S. with this in-order guarantee, doesn't it defeat the whole purpose 
> > of async processing? Naturally, there is a question on what performance 
> > benefit we gain w/ the default processAsync()?

the perf benefit is that we have paralism among tasks. The processAsync() can 
be invoked independently among tasks.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 73
> > 
> >
> > I felt that the flow works better for me if we organize the explanation 
> > in the following order:
> > - Sync process w/ multi-thread: more parallelism for remote I/O among 
> > tasks, in-order execution within a task
> > - AsyncStreamTask w/ in-order execution in a task (alternative to sync 
> > task): default AsyncStreamTask, more parallelism for remote I/O among 
> > tasks, in-order execution within a task
> > - AsyncStreamTask: more parallelism among and within the tasks, 
> > out-of-order execution

Thanks for the suggestion! I like this better too and I reordered the doc. The 
third bullet (paralism within a task) can be applied to both Sync and Async 
cases. Now it becomes:

- Sync with multi-threading, in-order
- Async, in-order
- Out-of-order for both.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 119
> > 
> >
> > "in the order of the message arrivals".
> > 
> > Also, what do you refer to as "strict ordering of the output" here? In 
> > StreamTask w/ multi-threading, don't we guarantee the in-order processing 
> > within the task? We can't guarantee the ordering between the tasks anyways.

task.max.concurrency applies to both async and sync case, so we can have 
multiple process() run in parrallel for a StreamTask.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 125
> > 
> >
> > My original question on the doc w/ process() call is: I think that we 
> > guarantee that process()/processAsync() and window(), commit() within a 
> > single task instance are mutally exclusive to each other. window() and 
> > commit() are also exclusive to themselves. But multiple processAsync() 
> > calls can be invoked in different threads simultaneously, right? Hence, in 
> > a single task instance, the variables used in processAsync() calls need to 
> > be thread-safe. Am I interpretting the current implementation right?

Actually it's different: processAsync() will always be invoked from a single 
thread (for all tasks). The reason being asynchronous calls will not require 
threads blocking on the calls. The callback can be invoked from multiple 
threads. Samza controls when to invoke processAsync()/window()/commit() for 
each task.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review149262
---


On Sept. 19, 2016, 5:23 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 19, 2016, 5:23 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-19 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 19, 2016, 5:23 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Updates on Yi's feedback.


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md 
b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-16 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review149262
---



Thanks for the detailed write-up! Really helpful document. Please check the 
comments below and we can discuss in person if you have further questions.


docs/learn/tutorials/versioned/samza-async-user-guide.md (line 71)


The explanation of "processAsync() will always be invoked in a single 
thread" is off-the-target. I think that you might simply want to focus on "by 
default, AsyncStreamTask in Samza guarantees in-order process of messages in a 
task, by making sure the processAsync() can only be called after the callback 
of previous processAsync() is triggered". Whether processAsync() is invoked in 
a single thread or not does not directly explain the default in-order guarantee.

P.S. with this in-order guarantee, doesn't it defeat the whole purpose of 
async processing? Naturally, there is a question on what performance benefit we 
gain w/ the default processAsync()?



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 73)


I felt that the flow works better for me if we organize the explanation in 
the following order:
- Sync process w/ multi-thread: more parallelism for remote I/O among 
tasks, in-order execution within a task
- AsyncStreamTask w/ in-order execution in a task (alternative to sync 
task): default AsyncStreamTask, more parallelism for remote I/O among tasks, 
in-order execution within a task
- AsyncStreamTask: more parallelism among and within the tasks, 
out-of-order execution



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 119)


"in the order of the message arrivals".

Also, what do you refer to as "strict ordering of the output" here? In 
StreamTask w/ multi-threading, don't we guarantee the in-order processing 
within the task? We can't guarantee the ordering between the tasks anyways.



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 125)


My original question on the doc w/ process() call is: I think that we 
guarantee that process()/processAsync() and window(), commit() within a single 
task instance are mutally exclusive to each other. window() and commit() are 
also exclusive to themselves. But multiple processAsync() calls can be invoked 
in different threads simultaneously, right? Hence, in a single task instance, 
the variables used in processAsync() calls need to be thread-safe. Am I 
interpretting the current implementation right?



docs/learn/tutorials/versioned/samza-async-user-guide.md (line 127)


Isn't this just talking about commit() is mutally exclusive to 
process/processAsync and window? We can simply state:

- Checkpointing is guaranteed to only cover events that are fully 
processed. It is persisted in commit() method.


- Yi Pan (Data Infrastructure)


On Sept. 16, 2016, 5:37 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 16, 2016, 5:37 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md 
> b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-16 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 16, 2016, 5:37 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md 
b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-16 Thread Xinyu Liu


> On Sept. 14, 2016, 12:37 a.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > 
> >
> > What about thread-safety among multiple process() operations? I thought 
> > that multiple process() can be in multiple threads as well?

Added the tutorial which answers this question there.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148824
---


On Sept. 16, 2016, 5:33 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 16, 2016, 5:33 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md 
> b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-16 Thread Xinyu Liu


> On Sept. 14, 2016, 12:43 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 44
> > 
> >
> > We need a more concrete example of how to use callback functionality 
> > here. It is not very clear. Adding a tutorials or code-example to 
> > hello-samza will make it easier for you to complete this documentation :)

Thanks for the feedback. i added the tutorial :).


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148826
---


On Sept. 16, 2016, 5:33 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 16, 2016, 5:33 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md 
> b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-16 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 16, 2016, 5:33 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Added tutorial according to Navina's feedback.


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 
  docs/learn/tutorials/versioned/index.md 
b4d687a63638aca4f876af88556de9973acfd718 
  docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148824
---


Fix it, then Ship it!




lgtm overall. Thanks!


docs/learn/documentation/versioned/container/event-loop.md (line 28)


What about thread-safety among multiple process() operations? I thought 
that multiple process() can be in multiple threads as well?


- Yi Pan (Data Infrastructure)


On Sept. 13, 2016, 9 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 13, 2016, 9 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-13 Thread Xinyu Liu


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 49
> > 
> >
> > This is a bit confusing. If I read these two comment lines as pseudo 
> > code, I would think that the same thread is going to call asynchronous 
> > calls and fire callback. Is it the case? I thought that the callback is 
> > triggered in a callback thread, not in the main thread invoking the 
> > asynchronous calls?

I agree. I modified the sentence to // fire callback upon completion, e.g. 
invoking callback from asynchronous call completion thread. Does it looks clear 
to you?


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > 
> >
> > This section is a bit confusing to me. The description above in 
> > "Without..." sounds like that we are implementing read-after-write 
> > consistency on global variables. Here we are saying that global variables 
> > have to implement the sync method by the user.

Good point. I read it again and it's pretty confusing. I think the latter part 
is for states within a task, like member variables. I updated the doc based on 
that.


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 36
> > 
> >
> > So, this describes the AsyncRunLoop logic. Don't we still have the 
> > synchronous RunLoop in SamzaContainer?

Since in default the RunLoop is not used anymore, so I remove its description. 
Since we are planning to deprecate the runLoop soon, I think we only need to 
describe the new run loop, right?


> On Sept. 7, 2016, 6:01 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 50
> > 
> >
> > from *a* different user thread.

Good catch! Thanks!


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148041
---


On Sept. 13, 2016, 9 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 13, 2016, 9 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-13 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 13, 2016, 9 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Update based on Yi's feedback.


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-08 Thread Navina Ramesh


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 357
> > 
> >
> > Is it too late to comment on the config key pattern? Traditionally, we 
> > don't have any strict conventions for config names listed anywhere. Since 
> > we use period as a delimiter to distinguish configs belonging to a 
> > namespace, I think we should try to avoid using the same in the actual 
> > config part. 
> > For example, job.container.single.thread.mode can be 
> > job.container.single-thread-mode.
> > Same for job.container.thread.pool.size, task.callback.timeout.ms and 
> > task.max.concurrency
> 
> Xinyu Liu wrote:
> This config is pretty awkward (so is the name). It's just for fallback to 
> old runloop if there is any issue with the new AsyncRunLoop. So I don't 
> expect users to set it normally. Once the AsyncRunLoop stablized, I will 
> remove this config as well as the old runLoop.

As I mentioned above, I think it is important to clarify in the documentation 
that this is a fallback option to provide compatibility with versions < 0.11.0. 
Is it too late to change the config name ? :)


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
---


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-08 Thread Navina Ramesh


> On Aug. 30, 2016, 1 a.m., Xinyu Liu wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 368
> > 
> >
> > oh, actually the job.container.single.thread.mode  means using the old 
> > runloop which supports single thread execution only. It is default as false 
> > so we will use AsyncRunLoop. I plan to get rid of RunLoop in 11.1 once 
> > AsyncRunLoop is fully stablized.

If you are planning to remove support for some of the existing features or 
config (like runloop and single.thread.mode), it is common practice to mark it 
as deprecated and remove it in the next major release. It will be better to 
state this fact in the documentation now so that users know that this config is 
going away.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147252
---


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-07 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review148041
---



Thanks for the detailed doc. I mainly have question regarding to the 
event-loop.md. We can talk offline if you don't agree/understand my comments.


docs/learn/documentation/versioned/api/overview.md (line 37)


java NIO ==> Java NIO



docs/learn/documentation/versioned/api/overview.md (line 49)


This is a bit confusing. If I read these two comment lines as pseudo code, 
I would think that the same thread is going to call asynchronous calls and fire 
callback. Is it the case? I thought that the callback is triggered in a 
callback thread, not in the main thread invoking the asynchronous calls?



docs/learn/documentation/versioned/container/event-loop.md (line 26)


What do you try to say in the "Without explicit user synchronization..."? 
Is it by design, we are trying to make sure the global variables implements a 
read-after-write consistency among tasks, w/ the atomic step being the task 
operation? P.S. what's exactly a task operation? It is better to define what 
this is.



docs/learn/documentation/versioned/container/event-loop.md (line 28)


This section is a bit confusing to me. The description above in 
"Without..." sounds like that we are implementing read-after-write consistency 
on global variables. Here we are saying that global variables have to implement 
the sync method by the user.



docs/learn/documentation/versioned/container/event-loop.md (line 36)


So, this describes the AsyncRunLoop logic. Don't we still have the 
synchronous RunLoop in SamzaContainer?



docs/learn/documentation/versioned/container/event-loop.md (line 50)


from *a* different user thread.


- Yi Pan (Data Infrastructure)


On Sept. 7, 2016, 5:16 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Sept. 7, 2016, 5:16 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-07 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Sept. 7, 2016, 5:16 p.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Update based on Navina's feedback.


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-09-07 Thread Xinyu Liu


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 49
> > 
> >
> > // fire callback after process complete 
> > >> Is the user expected to invoke the callback here or is it samza that 
> > invokes the callback ? Not clear from the example. If the user has to 
> > invoke the callback, you can write the code snippet. 
> > 
> > In your previous diff, you had this line - "To complete the process, 
> > you need to call callback.complete() or callback.failure(). Samza will 
> > continue to process next message or shut down the job based on the callback 
> > status." . Is that still valid?

oh, it's still there. I moved it to the sentence before the async task example 
:).


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > 
> >
> > "any state change from one operation will be fully visible to others"?  
> > -> does this mean tasks share state (which in my opinion can be anything 
> > from shared variables to KV stores). What state change is made fully 
> > visible here? 
> > I think it is sufficient to say that these operatios within a task 
> > partition are mutually exclusive. You can probably remove the state change 
> > part.

Chris recommended this: it's actually pretty important for the user to know the 
visibility rules when using AsyncStreamTask.


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 56
> > 
> >
> > It feels like we are introducing new terms here "serialized mode". I 
> > think it is sufficient to simply state, if task.max.concurrency = 1, then 
> > each message process completion ...
> > 
> > Also "it is required to coordinate any shared state" can be re-phrased 
> > to "user should synchronize access to any shared/global variables in the 
> > Task."

Good suggestions. I remove the serialized mode notion.


> On Aug. 31, 2016, 12:54 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 357
> > 
> >
> > Is it too late to comment on the config key pattern? Traditionally, we 
> > don't have any strict conventions for config names listed anywhere. Since 
> > we use period as a delimiter to distinguish configs belonging to a 
> > namespace, I think we should try to avoid using the same in the actual 
> > config part. 
> > For example, job.container.single.thread.mode can be 
> > job.container.single-thread-mode.
> > Same for job.container.thread.pool.size, task.callback.timeout.ms and 
> > task.max.concurrency

This config is pretty awkward (so is the name). It's just for fallback to old 
runloop if there is any issue with the new AsyncRunLoop. So I don't expect 
users to set it normally. Once the AsyncRunLoop stablized, I will remove this 
config as well as the old runLoop.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-30 Thread Navina Ramesh


> On Aug. 25, 2016, 12:31 a.m., Navina Ramesh wrote:
> > Not that I intend to give you more work. However, adding an example of 
> > AsyncStreamTask to samza's hello-world will give a quick idea to the user 
> > on when to use it and how to use it. This can be augmented in a tutorial 
> > page, similar to the one Jake provided for samza-rest api. Thanks!

Any updates here, xinyu ?


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146739
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-30 Thread Navina Ramesh


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 360
> > 
> >
> > Isn't there a performance impact if we use AsyncStreamTask by default? 
> > Esp. on jobs that don't need any async processing. 
> > Why is the default value false? 
> > 
> > If users upgrade to the new version and automatically start using 
> > multithreaded execution, will they see any performance impact?
> 
> Xinyu Liu wrote:
> The default of new AsyncRunLoop is still running in a single thread, and 
> the performance is on par with the previous RUnLoop. So the users shouldn't 
> see any perf difference.

Ok. I think the performance results are pretty important. Can you add a link to 
the performance test results pdf - 
https://issues.apache.org/jira/secure/attachment/12812335/perf-test-results.pdf,
 probably appropriate in the event-loop.md where you mention the guarantees 
that Samza provides.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-30 Thread Navina Ramesh


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 49
> > 
> >
> > Does anything change with respect to adding "task.subtask.class" in the 
> > process() method of AsyncStreamTask? Please check if this does not change 
> > the expected behavior in SAMZA-437. We use that at LinkedIn for 
> > restliWrapper (I think).
> 
> Xinyu Liu wrote:
> This is actually not implemented in open source, only in LinkedIn.

Ah.. looks like we added and then, removed it. The JIRA still explains the 
pattern that the developer can use. Cool!


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-30 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147388
---




docs/learn/documentation/versioned/api/overview.md (line 49)


// fire callback after process complete 
>> Is the user expected to invoke the callback here or is it samza that 
invokes the callback ? Not clear from the example. If the user has to invoke 
the callback, you can write the code snippet. 

In your previous diff, you had this line - "To complete the process, you 
need to call callback.complete() or callback.failure(). Samza will continue to 
process next message or shut down the job based on the callback status." . Is 
that still valid?



docs/learn/documentation/versioned/container/event-loop.md (line 28)


"any state change from one operation will be fully visible to others"?  -> 
does this mean tasks share state (which in my opinion can be anything from 
shared variables to KV stores). What state change is made fully visible here? 
I think it is sufficient to say that these operatios within a task 
partition are mutually exclusive. You can probably remove the state change part.



docs/learn/documentation/versioned/container/event-loop.md (line 45)


Thanks for adding the semantics section. Really clarifies things!



docs/learn/documentation/versioned/container/event-loop.md (line 52)


"This allows multiple outstanding messages to be processed per task at a 
time" -> I think it should be "This allows multiple outstanding messages to be 
processed parallely by the task".



docs/learn/documentation/versioned/container/event-loop.md (line 56)


It feels like we are introducing new terms here "serialized mode". I think 
it is sufficient to simply state, if task.max.concurrency = 1, then each 
message process completion ...

Also "it is required to coordinate any shared state" can be re-phrased to 
"user should synchronize access to any shared/global variables in the Task."



docs/learn/documentation/versioned/jobs/configuration-table.html (line 357)


Is it too late to comment on the config key pattern? Traditionally, we 
don't have any strict conventions for config names listed anywhere. Since we 
use period as a delimiter to distinguish configs belonging to a namespace, I 
think we should try to avoid using the same in the actual config part. 
For example, job.container.single.thread.mode can be 
job.container.single-thread-mode.
Same for job.container.thread.pool.size, task.callback.timeout.ms and 
task.max.concurrency


- Navina Ramesh


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-30 Thread Navina Ramesh


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 24
> > 
> >
> > There is no sharing of task state in Samza. Each task guarantee 
> > isolation from the others. What exactly are you referring to here? Unless I 
> > have misunderstood the semantics provided by processing in multiple threads
> 
> Xinyu Liu wrote:
> The shared state refers to global in-memory states, like singletons and 
> static vars.

Yeah. I kind of figured that's what you meant. But "state" is an very 
overloaded term. It will be great if you can modify the "sharing task state" to 
include examples. For example, sharing task state (via global in-memory states, 
singletons, static vars etc)


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-29 Thread Xinyu Liu


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 22
> > 
> >
> > "You should implement StreamTask for synchronous process, e.g. a 
> > computation that does not involve remote calls" 
> > 
> > This is not very clear. To me, it sounds like we are recommending a 
> > task interface based on a use-case. Instead, I think it will be useful to 
> > explain what you mean by "synchronous processing of a stream message" and 
> > compare it with asynchronous processing. This can be followed by the remote 
> > call example.

Added the definition there.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 54
> > 
> >
> > You have mentioned "processAsync". Your code sample above show 
> > "process". Please fix it.

Thanks for catching it.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 24
> > 
> >
> > There is no sharing of task state in Samza. Each task guarantee 
> > isolation from the others. What exactly are you referring to here? Unless I 
> > have misunderstood the semantics provided by processing in multiple threads

The shared state refers to global in-memory states, like singletons and static 
vars.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 43
> > 
> >
> > I think you are trying to explain 2 things:
> > 1. semantics of event loop when using AsyncStreamTask
> > 2. Semantics of event loop when using StreamTask with threadPool size > 
> > 1
> > 
> > Can you try to elaborate more on this? Perhaps just separating 
> > semantics of event loop for StreamTask and AsyncStreamTask will add more 
> > clarity.
> > 
> > Also, the processing order guarantees need to be called out. It is 
> > possible to process out-of-order within a partition if the task threadpool 
> > size is >1. This is a very important behavior change that needs to be 
> > documented here.

I ended up by adding section named "Semantics for Synchronous Tasks v.s. 
Asynchronous Tasks". Please take a look again.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 47
> > 
> >
> > nit: "through the standard InitableTask, ClosableTask, StreamTask / 
> > AsyncStreamTask, and WindowTask."

fixed.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 49
> > 
> >
> > Does anything change with respect to adding "task.subtask.class" in the 
> > process() method of AsyncStreamTask? Please check if this does not change 
> > the expected behavior in SAMZA-437. We use that at LinkedIn for 
> > restliWrapper (I think).

This is actually not implemented in open source, only in LinkedIn.


> On Aug. 25, 2016, 12:30 a.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 360
> > 
> >
> > Isn't there a performance impact if we use AsyncStreamTask by default? 
> > Esp. on jobs that don't need any async processing. 
> > Why is the default value false? 
> > 
> > If users upgrade to the new version and automatically start using 
> > multithreaded execution, will they see any performance impact?

The default of new AsyncRunLoop is still running in a single thread, and the 
performance is on par with the previous RUnLoop. So the users shouldn't see any 
perf difference.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
---


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-29 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review147252
---




docs/learn/documentation/versioned/jobs/configuration-table.html (line 368)


oh, actually the job.container.single.thread.mode  means using the old 
runloop which supports single thread execution only. It is default as false so 
we will use AsyncRunLoop. I plan to get rid of RunLoop in 11.1 once 
AsyncRunLoop is fully stablized.


- Xinyu Liu


On Aug. 30, 2016, 12:49 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated Aug. 30, 2016, 12:49 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-29 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/
---

(Updated Aug. 30, 2016, 12:49 a.m.)


Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
Infrastructure).


Changes
---

Updates based on Navina's feedback.


Repository: samza


Description
---

Update samza web docs with new multithreading api, core and configs.


Diffs (updated)
-

  docs/learn/documentation/versioned/api/overview.md 
6712344e84e19883b857e00549db2acb101c7e0e 
  docs/learn/documentation/versioned/container/event-loop.md 
116238312df7071747cbbc14bc9c46f558755195 
  docs/learn/documentation/versioned/jobs/configuration-table.html 
54c52981c3055b398ee60af50eeaf2592ed0e64f 

Diff: https://reviews.apache.org/r/50174/diff/


Testing
---

Test the web pages locally.


Thanks,

Xinyu Liu



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-24 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146739
---



Not that I intend to give you more work. However, adding an example of 
AsyncStreamTask to samza's hello-world will give a quick idea to the user on 
when to use it and how to use it. This can be augmented in a tutorial page, 
similar to the one Jake provided for samza-rest api. Thanks!

- Navina Ramesh


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-08-24 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review146554
---




docs/learn/documentation/versioned/api/overview.md (line 22)


"You should implement StreamTask for synchronous process, e.g. a 
computation that does not involve remote calls" 

This is not very clear. To me, it sounds like we are recommending a task 
interface based on a use-case. Instead, I think it will be useful to explain 
what you mean by "synchronous processing of a stream message" and compare it 
with asynchronous processing. This can be followed by the remote call example.



docs/learn/documentation/versioned/api/overview.md (line 54)


You have mentioned "processAsync". Your code sample above show "process". 
Please fix it.



docs/learn/documentation/versioned/container/event-loop.md (line 24)


There is no sharing of task state in Samza. Each task guarantee isolation 
from the others. What exactly are you referring to here? Unless I have 
misunderstood the semantics provided by processing in multiple threads



docs/learn/documentation/versioned/container/event-loop.md (line 41)


I don't understand when we are blocking. 

Are you saying "Block if (all task instances have outstanding messages to 
process or (window and commit is in progress)" ? 
Or "Block if (all task instances have outstanding messages to process or 
window) and (commit is in progress)?



docs/learn/documentation/versioned/container/event-loop.md (line 43)


I think you are trying to explain 2 things:
1. semantics of event loop when using AsyncStreamTask
2. Semantics of event loop when using StreamTask with threadPool size > 1

Can you try to elaborate more on this? Perhaps just separating semantics of 
event loop for StreamTask and AsyncStreamTask will add more clarity.

Also, the processing order guarantees need to be called out. It is possible 
to process out-of-order within a partition if the task threadpool size is >1. 
This is a very important behavior change that needs to be documented here.



docs/learn/documentation/versioned/container/event-loop.md (line 47)


nit: "through the standard InitableTask, ClosableTask, StreamTask / 
AsyncStreamTask, and WindowTask."



docs/learn/documentation/versioned/container/event-loop.md (line 49)


Does anything change with respect to adding "task.subtask.class" in the 
process() method of AsyncStreamTask? Please check if this does not change the 
expected behavior in SAMZA-437. We use that at LinkedIn for restliWrapper (I 
think).



docs/learn/documentation/versioned/jobs/configuration-table.html (line 360)


Isn't there a performance impact if we use AsyncStreamTask by default? Esp. 
on jobs that don't need any async processing. 
Why is the default value false? 

If users upgrade to the new version and automatically start using 
multithreaded execution, will they see any performance impact?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 368)


is this property used only if job.container.single.thread.mode == true ?


- Navina Ramesh


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-28 Thread Chris Pettitt

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review143972
---


Ship it!




Looks like this covers my comments. However, the documentation reads like it is 
OK to do synchronous IO by just throwing more threads at the problem. This is 
not a good idea. The ability to do synchronous IO is a transition step towards 
async, not an end state.

- Chris Pettitt


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-27 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review143847
---




docs/learn/documentation/versioned/api/overview.md (line 22)


added example for sync process



docs/learn/documentation/versioned/api/overview.md (line 37)


added example for async process



docs/learn/documentation/versioned/container/event-loop.md (line 26)


Add the condition and memeory visibility notes here for running StreamTask 
in parrallel.



docs/learn/documentation/versioned/container/event-loop.md (line 43)


Please take a look at this again.


- Xinyu Liu


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-27 Thread Xinyu Liu


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 43
> > 
> >
> > s/in the a single thread/in a single thread/.
> > 
> > A few other minor grammatical errors, but this is not the easy way to 
> > share them. If they're not obvious I can send you back a slightly edited 
> > version of this paragraph.
> > 
> > ---
> > 
> > Larger comment: as we'd ultimately want to move to a single 
> > implementation should we not allow process and window to run in parallel 
> > for the same task even with multiple threads?

Right, actually for any case process and window will not be run in parallel for 
the same task. I guess I didn't make it clear in the description. Could you 
please take a look again at the new diff and send me an updated version on the 
paragraph with the grammatical fixes? Thanks a lot!


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/api/overview.md, line 22
> > 
> >
> > Maybe provide an example of what "synchronous process" means. For 
> > example, a computation that does not involve remote calls.

Added the examples for both sync process and async process.


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 26
> > 
> >
> > When would you want to run a synchronous task in parallel? What are the 
> > rules (e.g. memory visibility) with such a configuration?

Added the conditions and rules. Please take a look again.


> On July 20, 2016, 7:06 p.m., Chris Pettitt wrote:
> > docs/learn/documentation/versioned/container/event-loop.md, line 28
> > 
> >
> > This doesn't quite sound right. Global state is a problem if there is > 
> > 1 concurrency (whether running with multiple samza threads or not). For 
> > example, async tasks may or may not be safe depending on concurrency. We 
> > also can make stronger guarantees than what is implied by the paragraph 
> > (e.g. state from process is fully visible to window and commit).

True, my doc was overly simplified. I added more details. Please take a look.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review142985
---


On July 27, 2016, 11:05 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 27, 2016, 11:05 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-20 Thread Chris Pettitt

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review142985
---




docs/learn/documentation/versioned/api/overview.md (line 22)


Maybe provide an example of what "synchronous process" means. For example, 
a computation that does not involve remote calls.



docs/learn/documentation/versioned/container/event-loop.md (line 26)


When would you want to run a synchronous task in parallel? What are the 
rules (e.g. memory visibility) with such a configuration?



docs/learn/documentation/versioned/container/event-loop.md (line 28)


This doesn't quite sound right. Global state is a problem if there is > 1 
concurrency (whether running with multiple samza threads or not). For example, 
async tasks may or may not be safe depending on concurrency. We also can make 
stronger guarantees than what is implied by the paragraph (e.g. state from 
process is fully visible to window and commit).



docs/learn/documentation/versioned/container/event-loop.md (line 43)


s/in the a single thread/in a single thread/.

A few other minor grammatical errors, but this is not the easy way to share 
them. If they're not obvious I can send you back a slightly edited version of 
this paragraph.

---

Larger comment: as we'd ultimately want to move to a single implementation 
should we not allow process and window to run in parallel for the same task 
even with multiple threads?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 636)


s/complete/completes/



docs/learn/documentation/versioned/jobs/configuration-table.html (line 638)


But also may result in out-of-order processing. Probably good to be 
explicit about this even if it should be obvious to most people.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 647)


What happens if a timeout occurs?


- Chris Pettitt


On July 19, 2016, 1:05 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> ---
> 
> (Updated July 19, 2016, 1:05 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> ---
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>