Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-21 Thread Chris Pettitt
I'm playing with ApplicationRunner, so I'll probably have more feedback.
For now, in addition to async run we also need async notification of
completion or failure. Also, ApplicationStatus should be able to give me
the cause of failure (e.g. via an Exception), not just a failure state.

On Thu, Apr 20, 2017 at 3:52 PM, Chris Pettitt 
wrote:

> It might be worth taking a look at how Beam does test streams. The API is
> more powerful than just passing in a queue, e.g.:
>
> TestStream source = TestStream.create(StringUtf8Coder.of())
> .addElements(TimestampedValue.of("this", start))
> .addElements(TimestampedValue.of("that", start))
> .addElements(TimestampedValue.of("future", 
> start.plus(Duration.standardMinutes(1
> .advanceProcessingTime(Duration.standardMinutes(3))
> .advanceWatermarkTo(start.plus(Duration.standardSeconds(30)))
> .advanceWatermarkTo(start.plus(Duration.standardMinutes(1)))
> .advanceWatermarkToInfinity();
>
> ---
>
> BTW, have we given up on the idea of a simpler input system, e.g. one that
> assumes all input messages are keyed? It seems it would be possible to
> support legacy "system streams" via an adapter that mapped K, V -> V' and
> could open the possibility of inputs in whatever for users want, e.g.
> (again from Beam):
>
> final Create.Values values = Create.of("test", "one", "two", "three");
>
> final TextIO.Read.Bound from = 
> TextIO.Read.from("src/main/resources/words.txt");
>
> final KafkaIO.Read reader = KafkaIO.read()
>
> .withBootstrapServers("myServer1:9092,myServer2:9092")
>
> .withTopics(topics)
>
> .withConsumerFactoryFn(new ConsumerFactoryFn(
>
> topics, 10, numElements, OffsetResetStrategy.EARLIEST))
>
> .withKeyCoder(BigEndianIntegerCoder.of())
>
> .withValueCoder(BigEndianLongCoder.of())
>
> .withMaxNumRecords(numElements);
> Ideally, such a simple input system specification would be useable in 
> production as well as test. At that point I don't know if we need a separate 
> TestApplicationRunner except perhaps as a hint to what we've been calling an 
> Environment?
>
> ---
>
> Aren't we supposed to be able to run applications without blocking (e.g.
> for embedded cases)? The API suggests that run is going to be a blocking
> call?
>
> - Chris
>
>
> On Thu, Apr 20, 2017 at 1:06 PM, Jacob Maes  wrote:
>
>> Thanks for the SEP!
>>
>> +1 on introducing these new components
>> -1 on the current definition of their roles (see Design feedback below)
>>
>> *Design*
>>
>>- If LocalJobRunner and RemoteJobRunner handle the different methods of
>>launching a Job, what additional value do the different types of
>>ApplicationRunner and RuntimeEnvironment provide? It seems like a red
>> flag
>>that all 3 would need to change from environment to environment. It
>>indicates that they don't have proper modularity. The
>> call-sequence-figures
>>support this; LocalApplicationRunner and RemoteApplicationRunner make
>> the
>>same calls and the diagram only varies after jobRunner.start()
>>- As far as I can tell, the only difference between Local and Remote
>>ApplicationRunner is that one is blocking and the other is
>> non-blocking. If
>>that's all they're for then either the names should be changed to
>> reflect
>>this, or they should be combined into one ApplicationRunner and just
>> expose
>>separate methods for run() and runBlocking()
>>- There isn't much detail on why the main() methods for Local/Remote
>>have such different implementations, how they receive the Application
>>(direct vs config), and concretely how the deployment scripts, if any,
>>should interact with them.
>>
>>
>> *Style*
>>
>>- nit: None of the 11 uses of the word "actual" in the doc are
>> *actually*
>>needed. :-)
>>- nit: Colors of the runtime blocks in the diagrams are unconventional
>>and a little distracting. Reminds me of nai won bao. Now I'm hungry.
>> :-)
>>- Prefer the name "ExecutionEnvironment" over "RuntimeEnvironment". The
>>term "execution environment" is used
>>- The code comparisons for the ApplicationRunners are not
>> apples-apples.
>>The local runner example is an application that USES the local runner.
>> The
>>remote runner example is the just the runner code itself. So, it's not
>>readily apparent that we're comparing the main() methods and not the
>>application itself.
>>
>>
>> On Mon, Apr 17, 2017 at 5:02 PM, Yi Pan  wrote:
>>
>> > Made some updates to clarify the role and functions of
>> RuntimeEnvironment
>> > in SEP-2.
>> >
>> > On Fri, Apr 14, 2017 at 9:30 AM, Yi Pan  wrote:
>> >
>> > > Hi, everyone,
>> > >
>> > > In light of new features such as fluent API and standalone that
>> introduce
>> > > new deployment / application launch models in Samza, I created a new
>> > 

[GitHub] samza pull request #133: SAMZA-1226: relax type parameters in MessageStream ...

2017-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/133


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-21 Thread Navina Ramesh
Hey Yi,
Thanks for lot for your work on this document. I know it must have been
crazy trying to put-together everything in a single doc :)

Here are my comments. Sorry about the delay :(

1. It will be useful to set some background for the benefit of the
community members who haven't been following design docs in the JIRAs. Can
you briefly explain the definition of StreamApplication and how it
translates to jobs through the stack.

2. "Problem" section doesn't seem to describe any problem that
ApplicationRunner is solving :) Imo, ApplicationRunner basically provides a
unified programming pattern for the user to execute StreamApplications
defined using fluent-api or task-level API. I think the problem and
motivation section can use a little bit of re-wording.

3. In the "Overview of ApplicationRunner" section:
* How the components within ApplicationRunner interact isn't very obvious
from the overview image. For example, ExecutionPlanner translates a
"StreamApplication" into an "ExecutionPlan" which is essentially a
specification of the DAG. (Please correct me, if I am wrong here!). The
ExecutionPlan is used by the JobRunner to launch Samza jobs.
* The roles of ExecutionPlanner and JobRunner are fairly well-defined.
StreamManager seems like a util class that helps class-load systems and
create streams. The ExecutionPlan will be consumed by JobRunner and
JobRunner will use StreamManager to create intermediate streams, prior to
launching jobs. It doesn't sound like a StreamManager is a "component" of
the ApplicationRunner.
* What is the role of the RuntimeEnvironment? That has not been explained.
Maybe explaining that will fill the gap in understanding for the readers. I
see that you have tried to explain the flow of control in the code using
the sequence diagram. Perhaps, if we can articulate the
roles/responsibilities of the RuntimeEnvironment, there will not be a need
for the control flow diagram.

4. How is runtime environment defined by the user? Is it configurable ?
Answering these questions in the doc will be useful

5. In the "Interaction between RuntimeEnvironment and ApplicationRunners"
section:
* Samza container is interacting with the RuntimeEnvironment. Does that
make the RuntimeEnvironment as a shared component between the
LocalApplicationRunner and the SamzaContainer? It doesn't seem to be the
case for RemoteApplicationRunner. So, I am confused as to why it is
different.

6. In general, what does "app.class" config represent?  It seems
straightforward when a "StreamApplication" is defined. Is it applicable
when using low-level task api?

7. Interface defintions:
* Perhaps when you implement this, can you specifically callout if each
method is blocking or not in the javadoc ?

8. Minor nit-picking:
* "ApplicationRunners in Different Execution Environments" -> should it be
RuntimeEnvironments as that is the terminology used in the rest of the
document.
* In the "How this works in standalone deployment" section:
* "Deploy the application to standalone hosts" and *Run run-local-app.sh on
each node to start/stop the local application* are probably just a single
step - Deploy the application to standalone hosts using run-local-app.sh??


General question:
It seems like, even with extensive changes to the interfaces/programming
model, we are still class loading the components for most parts. In such a
world, we are not close to integrating with frameworks that already have a
lifecycle model and can provide instantiated objects directly. For example,
in the Samza as a library use-case, it makes sense for the user to provide
a JmxServer or a taskFactory or a custom metricReporter for the
StreamProcessor. One of the motivations for this case was that most
applications are already running within a servlet/jetty container model
with its own lifecycle. If ApplicationRunner(s) is the unified interface,
doesn't that prohibit Samza from being integrated with such frameworks?

Thanks!
Navina

On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes  wrote:

> Thanks for the SEP!
>
> +1 on introducing these new components
> -1 on the current definition of their roles (see Design feedback below)
>
> *Design*
>
>- If LocalJobRunner and RemoteJobRunner handle the different methods of
>launching a Job, what additional value do the different types of
>ApplicationRunner and RuntimeEnvironment provide? It seems like a red
> flag
>that all 3 would need to change from environment to environment. It
>indicates that they don't have proper modularity. The
> call-sequence-figures
>support this; LocalApplicationRunner and RemoteApplicationRunner make
> the
>same calls and the diagram only varies after jobRunner.start()
>- As far as I can tell, the only difference between Local and Remote
>ApplicationRunner is that one is blocking and the other is
> non-blocking. If
>that's all they're for then either the names should be changed to
> reflect
>this, or they should be