[DISCUSS] Samza 1.5 release

2020-05-26 Thread Bharath Kumara Subramanian
Hi all,

We have accumulated a few features/improvements since the last release and
would like to make a Samza 1.5 release.

I wanted to kick off the discussion on the open source forum as some of
these changes have already been tested internally at LinkedIn. The
features/improvements include, but are not limited to, simplifying the Job
Runner, Container Placements, and auto-enabling transactional state.

A comprehensive list of changes can be found here:
https://issues.apache.org/jira/browse/SAMZA-2527?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.5)

The new release branch has already been cut and the name is "1.5.0". I
would like to target the week of June 1st for voting.

Thank you,
Bharath


Re: Samza jdk11 support

2020-05-26 Thread Yi Pan
Hi, Debraj,

Thanks for the reminder. We did discuss JDK11 before.
Unfortunately, I don't know whether JDK11 is on the roadmap as of now.
Let me sync up with the team and get back to you.

-Yi

On Mon, May 25, 2020 at 9:53 AM Debraj Manna wrote:

> Hi
>
> I have seen a few earlier discussions on the email group which
> suggested that samza officially does not support jdk11. But those
> discussions seem to be old.
>
> Does the latest samza support jdk11?
>
> Thanks,
>


Re: [DISCUSS] SEP-22: Container Placements in Samza

2020-05-26 Thread Sanil Jain
Hi Xinyu,

Apologies, it looks like I sent this from my Li email and the thread was not
updated, so let me send it again from my personal email.

Thanks for the feedback, here are my comments:

> The title of the SEP is Container Placements in Samza, which sounds like
> it's going to reimplement the container placements logic currently in
> ClusterBasedJobCoordinator. After reading the proposal, I think it should
> be renamed to be a much smaller scope, e.g. Container Relocation Tool.


The Container Placements title reflects building the ability to move/restart
containers of a job on the same or a different host. The external tool that
moves containers is one user of it; other users can be a load balancer
controller (a Cruise Control-like system) or an ASC controller resizing
containers to different sizes.

Some other abilities are:

1. It will also give us the ability to override an already existing locality
when one or a few containers are down and the job is running in a degraded
state

2. Spin up & place a standby for one or a few containers


Hence I feel the naming is justified.

> - Goal: the goal is to build a tool, right? I want to be crystal clear on
> that. After the solution, I think there will be a tool to do container
> relocation for Samza open source. Building the ability doesn't mean much to
> Samza open source users, I believe.


Sure, the Goals section mentions "Expose these APIs via a tool or dashboard
for operability"; I can add more emphasis on it. At the same time,
open-source users can use these APIs in their custom tools to address
operability.

> - Proposed Solution: after reading this I think the rejected option is
> to write to the metadata store, and the accepted one is to develop a
> container placement service. But then I read the architecture part, where
> the proposal says the preferred approach is to use the Samza metastore API
> to read and write the container placement metadata. It feels contradictory
> to me. It seems the metadata store is a must-have piece in this solution,
> so it might be cleaner to remove the first solution.


Nice catch. The first approach directly modifies the locality
mapping in the metadata store (coordinator stream), whereas the other
approach uses the metadata store to read/write container placement
request/response messages under a *separate metastore namespace*. Let me add
these details so others don't get confused.
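
As a rough illustration of the difference between the two approaches, the
sketch below keeps placement requests under their own namespace, so the
locality mapping is never modified directly. It is written in Python for
brevity (Samza itself is Java), and every name in it (NamespacedStore, the
namespace strings, the "request." key prefix, the message fields) is
hypothetical and not the actual Samza metadata store API.

```python
import json
import uuid

LOCALITY_NS = "locality"              # assumed namespace name
PLACEMENT_NS = "container-placement"  # assumed namespace name


class NamespacedStore:
    """Hypothetical in-memory stand-in for a namespaced key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, namespace, key, value):
        # Values are serialized, as they would be on a real stream or log.
        self._data[(namespace, key)] = json.dumps(value)

    def get(self, namespace, key):
        raw = self._data.get((namespace, key))
        return None if raw is None else json.loads(raw)

    def keys(self, namespace):
        return sorted(k for ns, k in self._data if ns == namespace)


def write_placement_request(store, container_id, destination_host):
    """Record a move request without touching the locality namespace."""
    request_id = str(uuid.uuid4())
    store.put(PLACEMENT_NS, "request." + request_id, {
        "container_id": container_id,
        "destination_host": destination_host,
    })
    return request_id
```

With this layout a request for container "0" leaves the entry under
LOCALITY_NS untouched; only the coordinator acting on the request would
eventually lead to a locality change.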

> - I don't understand the "service" part of the container placement. Is it a
> separate service that can be hosted somewhere? Is it based on jetty or
> netty? What's the communication protocol? From reading the proposal, it
> looks like a set of APIs instead of a service. If that's the case, please
> remove all the usage of "container placement service" in this proposal.


It's referred to as a service since there is a listener of events running in
the JobCoordinator (ContainerPlacementHandler) which relays the container
placement request messages to the JobCoordinator. The metastore is just used
as a communication channel, which we can logically compare with a REST
service where an event handler listens on a port for requests. The
communication protocol for request/response messages is defined by us, using
the metastore (Kafka) as the communication channel. It's not an independent
service that can be hosted somewhere, but rather an embedded one.
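
To make the "embedded listener" idea concrete, here is a minimal sketch in
Python (Samza itself is Java). It assumes a plain dict stands in for the
metastore channel and a callback stands in for the JobCoordinator; the class
name PlacementHandler, the "request."/"response." key prefixes, and the status
string are all hypothetical, not the real ContainerPlacementHandler.

```python
class PlacementHandler:
    """Hypothetical embedded listener that polls a shared store for requests."""

    def __init__(self, store, on_request):
        self._store = store            # dict used as the communication channel
        self._on_request = on_request  # callback standing in for the coordinator
        self._seen = set()             # request keys that were already relayed

    def poll_once(self):
        """Relay each unseen request and record a response next to it."""
        handled = []
        # sorted() copies the keys, so writing responses below is safe.
        for key in sorted(self._store):
            if not key.startswith("request.") or key in self._seen:
                continue
            self._seen.add(key)
            request_id = key[len("request."):]
            status = self._on_request(self._store[key])
            self._store["response." + request_id] = {"status": status}
            handled.append(request_id)
        return handled
```

A caller would write a "request.<id>" entry into the shared store and later
read "response.<id>" from it; no port or separate process is involved, which
is the sense in which the service is embedded.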

I can think of a better alternative for this and follow up.

> - key-value format: the description of the uuid of the
> ContainerPlacementRequestMessage has a typo, I think (response -> request).


Corrected, thanks!

> - The diagram of the components interaction looks nice, and definitely helps
> me understand what the solution will look like. I didn't find a description
> about writing the new locality information. I am interested in the ordering
> of that relative to the ordering of reserving resources and running the
> container. I feel that part will be a bit complicated.


Nice catch, I added this section to the doc:

*Who writes the new locality mapping after a successful move?*

A Samza container, on a successful start, writes its new locality message to
the metadata store (code), hence after a successful move the container writes
its new locality.
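
The ordering described here can be sketched as follows, in Python for brevity
(Samza itself is Java). The function start_container, the launch callback, and
the dict used as the locality mapping are hypothetical; the sketch also
assumes that a failed start leaves the previous locality entry in place, which
is an inference from the answer above rather than something it states.

```python
def start_container(container_id, host, launch, locality):
    """Run `launch`; record the new host only if the start succeeds."""
    try:
        launch(container_id, host)
    except RuntimeError:
        # Assumption: a failed start leaves the previous locality as-is.
        return False
    # The write happens after the container is up, not when the move is requested.
    locality[container_id] = host
    return True
```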

> - public interfaces: are these interfaces intended to be used by samza
> users? How do they use them? Is there an example? Are these interfaces
> going into samza-api? Are those interfaces stable? Why can't the uuid in
> the interface be generated by Samza itself? Why


Let me add an example of how to use them, and add a two-part Public Interfaces
section containing ContainerPlacementMessage, ContainerPlacementRequestMessage,
and ContainerPlacementResponseMessage, with the interface stability marked as
evolving. The UUID is generated by Samza itself; please check
ContainerPlacementMetadataStore#writeContainerPlacementRequestMessage. In
terms of usage, please see the answer to the next question; I added a sample