Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-16 Thread Yi Pan
Hey, Cameron,

Thanks for the detailed answers. It would be good to add this explanation
to the SEP page as well.

Otherwise, +1 from my side. Thanks!

-Yi

On Mon, Mar 16, 2020 at 10:06 AM Cameron Lee wrote:

> You have the correct understanding about the "yarn.resources.*"
> configuration, and your question is a good one. Currently, the
> implementation is that Samza will look in a specific place on the file
> system (i.e. <working directory>/__samzaFrameworkApi and
> <working directory>/__samzaFrameworkInfrastructure) to get the
> API/infrastructure classpaths. I have a TODO in the code to make the file
> system location configurable (or specified through an environment
> variable). The configuration or environment variable for the file system
> location would not be YARN-specific, and it would be applicable to any
> execution environment.

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-16 Thread Cameron Lee
You have the correct understanding about the "yarn.resources.*"
configuration, and your question is a good one. Currently, the
implementation is that Samza will look in a specific place on the file
system (i.e. <working directory>/__samzaFrameworkApi and
<working directory>/__samzaFrameworkInfrastructure) to get the
API/infrastructure classpaths. I have a TODO in the code to make the file
system location configurable (or specified through an environment
variable). The configuration or environment variable for the file system
location would not be YARN-specific, and it would be applicable to any
execution environment.
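
As a rough illustration of the lookup described above, here is a minimal Java sketch. It assumes the two directory names from this message and resolves them against the process working directory, with an optional override read from an environment variable. The class name, the method names, and the FRAMEWORK_RESOURCES_BASE_DIR variable are hypothetical and are not taken from the Samza code base; the TODO mentioned above does not say how the override would be named.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/**
 * Sketch only: resolves the framework API and infrastructure directories.
 * Not the actual Samza implementation.
 */
final class FrameworkResourceLocator {
  // Directory names mentioned in the thread.
  static final String API_DIR = "__samzaFrameworkApi";
  static final String INFRASTRUCTURE_DIR = "__samzaFrameworkInfrastructure";
  // Hypothetical override; not an actual Samza environment variable.
  static final String BASE_DIR_ENV = "FRAMEWORK_RESOURCES_BASE_DIR";

  /** Returns the override directory if set and non-empty, else the working directory. */
  static Path baseDir(Map<String, String> env, Path workingDir) {
    String override = env.get(BASE_DIR_ENV);
    return (override == null || override.isEmpty()) ? workingDir : Paths.get(override);
  }

  /** Directory expected to hold the framework API classpath. */
  static Path apiDir(Map<String, String> env, Path workingDir) {
    return baseDir(env, workingDir).resolve(API_DIR);
  }

  /** Directory expected to hold the framework infrastructure classpath. */
  static Path infrastructureDir(Map<String, String> env, Path workingDir) {
    return baseDir(env, workingDir).resolve(INFRASTRUCTURE_DIR);
  }

  public static void main(String[] args) {
    Path workingDir = Paths.get(System.getProperty("user.dir"));
    System.out.println(apiDir(System.getenv(), workingDir));
    System.out.println(infrastructureDir(System.getenv(), workingDir));
  }
}
```

Passing the environment in as a Map keeps the lookup independent of YARN and makes it straightforward to exercise with different settings.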

On Wed, Mar 11, 2020 at 10:54 PM Yi Pan wrote:

> OK. If I understand correctly, your answer is the following: the
> yarn.resources.* configuration variables are used by the YARN localizer to
> make the API and infrastructure classpaths available, together with the
> application's own classpath, which is also determined by the YARN localizer.
> The question here is: how do we let the container JVM know the
> API/infrastructure classpaths when launching the container processes? If
> the API and infrastructure classpaths (i.e. the installation path determined
> by the localizer) are customizable, then we would need to tell the container
> JVM those API/infra classpaths via some configuration variables as well,
> right? Hence, those configuration variable names need to be understood by
> the Samza application's code (which runs within the container) as well.
> If not, what mechanism will we use to let the container JVM process know
> where the YARN localizer has put the API/infra classpaths?
>
> Thanks!
>
> -Yi

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-11 Thread Yi Pan
OK. If I understand correctly, your answer is the following: the
yarn.resources.* configuration variables are used by the YARN localizer to
make the API and infrastructure classpaths available, together with the
application's own classpath, which is also determined by the YARN localizer.
The question here is: how do we let the container JVM know the
API/infrastructure classpaths when launching the container processes? If
the API and infrastructure classpaths (i.e. the installation path determined
by the localizer) are customizable, then we would need to tell the container
JVM those API/infra classpaths via some configuration variables as well,
right? Hence, those configuration variable names need to be understood by
the Samza application's code (which runs within the container) as well.
If not, what mechanism will we use to let the container JVM process know
where the YARN localizer has put the API/infra classpaths?

Thanks!

-Yi



On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee wrote:

> The configuration variables are only used by the YARN localizer. The Samza
> application will look for the framework resources in certain places in the
> application's working directory when it needs to access them. My aim is to
> do something similar to how "yarn.package.path" works. In other execution
> environments, it is my understanding that "yarn.package.path" would get
> replaced by a different environment-specific configuration key/value.
> I agree that we should not use "yarn.resources.*" if the configurations are
> not YARN-specific. Do you think that these resource localization configs
> are generalizable to arbitrary environments? If so, does that mean
> "yarn.package.path" is also generalizable? For example, what if some
> execution environment does not use URLs to specify resource locations
> (although maybe this isn't a reasonable concern to worry about?)?
>
> Thanks,
> Cameron

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-11 Thread Cameron Lee
The configuration variables are only used by the YARN localizer. The Samza
application will look for the framework resources in certain places in the
application's working directory when it needs to access them. My aim is to
do something similar to how "yarn.package.path" works. In other execution
environments, it is my understanding that "yarn.package.path" would get
replaced by a different environment-specific configuration key/value.
I agree that we should not use "yarn.resources.*" if the configurations are
not YARN-specific. Do you think that these resource localization configs
are generalizable to arbitrary environments? If so, does that mean
"yarn.package.path" is also generalizable? For example, what if some
execution environment does not use URLs to specify resource locations
(although maybe this isn't a reasonable concern to worry about?)?

Thanks,
Cameron

On Wed, Mar 11, 2020 at 4:43 PM Yi Pan wrote:

> Hi, Cameron,
>
> Thanks for the quick responses! Appreciate it.
>
> I still have a concern about a): are those configuration variables used
> by the YARN localizer or by Samza applications? If they are used only by the
> YARN localizer, I agree that we should keep them YARN-specific.
> Otherwise, I think it would still be better to name them
> cluster.based.resources.*. The reason is that Samza applications are
> supposed to be able to run on different execution environments. Ideally,
> when we deploy the same Samza application on YARN vs. Mesos or managed
> K8s clusters, we should only need to change the configuration values,
> not both the configuration variable names and the values. Does that make sense?
> Otherwise, we can schedule a conf call to clarify that.
>
> Thanks!
>
> -Yi

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-11 Thread Yi Pan
Hi, Cameron,

Thanks for the quick responses! Appreciate it.

I still have a concern about a): are those configuration variables used
by the YARN localizer or by Samza applications? If they are used only by the
YARN localizer, I agree that we should keep them YARN-specific.
Otherwise, I think it would still be better to name them
cluster.based.resources.*. The reason is that Samza applications are
supposed to be able to run on different execution environments. Ideally,
when we deploy the same Samza application on YARN vs. Mesos or managed
K8s clusters, we should only need to change the configuration values,
not both the configuration variable names and the values. Does that make sense?
Otherwise, we can schedule a conf call to clarify that.

Thanks!

-Yi

On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee wrote:

> a) The "yarn.resources.*" configs are for localizing the necessary
> resources into the working directory for the process. I felt that the
> specific configuration format to specify these resources might be
> YARN-specific (e.g. YARN has type and visibility configs for each of its
> resources), so a generic format might not apply. In a non-YARN case, the
> localization configs would need to be specified according to the technology
> being used.
> b) It is correct that the Avro version will need to be compatible with the
> version that is used by the infrastructure, if the infrastructure needs to use
> Avro and pass the Avro object to the application. This is the case with any
> serde technology that needs to be used. For the job coordinator, it is not
> much of a concern anyway, since it is not doing serde of Avro messages.
> This may be more of a concern for general split deployment, which will
> impact the processing containers and will be covered in a separate SEP.
> c) It should work to leave infrastructure serdes in the infrastructure
> classpath. The infrastructure serdes just see generic types (which are
> java.lang.Object at runtime) for the messages, and they don't do anything
> with the concrete types, so in the infrastructure classes, the messages get
> passed around as Object, but their concrete classes can still be loaded
> from the application. As with (b), this is more of a concern for general
> split deployment, since the job coordinator doesn't do message serde. I
> have run some tests regarding this classloading pattern, but we will do
> further verification for general split deployment.
> d) Yes, you are correct. Good catch. It should be "described above at
> Application classloader".
>
> Thanks for all of your questions. I will clarify some details in the doc
> regarding your questions.
>
> Cameron

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-10 Thread Cameron Lee
a) The "yarn.resources.*" configs are for localizing the necessary
resources into the working directory for the process. I felt that the
specific configuration format to specify these resources might be
YARN-specific (e.g. YARN has type and visibility configs for each of its
resources), so a generic format might not apply. In a non-YARN case, the
localization configs would need to be specified according to the technology
being used.
b) It is correct that the Avro version will need to be compatible with the
version that is used by the infrastructure, if the infrastructure needs to use
Avro and pass the Avro object to the application. This is the case with any
serde technology that needs to be used. For the job coordinator, it is not
much of a concern anyway, since it is not doing serde of Avro messages.
This may be more of a concern for general split deployment, which will
impact the processing containers and will be covered in a separate SEP.
c) It should work to leave infrastructure serdes in the infrastructure
classpath. The infrastructure serdes just see generic types (which are
java.lang.Object at runtime) for the messages, and they don't do anything
with the concrete types, so in the infrastructure classes, the messages get
passed around as Object, but their concrete classes can still be loaded
from the application. As with (b), this is more of a concern for general
split deployment, since the job coordinator doesn't do message serde. I
have run some tests regarding this classloading pattern, but we will do
further verification for general split deployment.
d) Yes, you are correct. Good catch. It should be "described above at
Application classloader".

Thanks for all of your questions. I will clarify some details in the doc
regarding your questions.

Cameron
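
One way to picture point (c) is the Java sketch below. It uses plain Java serialization as a stand-in for a serde: the helper methods only ever handle Object, and the concrete message class is resolved through whichever classloader the caller passes in, which here plays the role of the application classloader. This is an illustration of the general pattern, not the SEP's classloader design or Samza's SerializableSerde; the names ErasedSerdeSketch, AppMessage, and LoaderAwareInputStream are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ErasedSerdeSketch {
  /** Stand-in for a message type that would live on the application classpath. */
  static final class AppMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String payload;

    AppMessage(String payload) {
      this.payload = payload;
    }
  }

  /** ObjectInputStream that resolves classes through a caller-supplied classloader. */
  static final class LoaderAwareInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
      super(in);
      this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
        throws IOException, ClassNotFoundException {
      try {
        // Try the supplied (application) classloader first.
        return Class.forName(desc.getName(), false, loader);
      } catch (ClassNotFoundException e) {
        // Fall back to the default resolution (also covers primitive types).
        return super.resolveClass(desc);
      }
    }
  }

  /** "Infrastructure" side: serializes a message it only knows as Object. */
  static byte[] toBytes(Object message) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(message);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** "Infrastructure" side: returns Object; the concrete class comes from the given loader. */
  static Object fromBytes(byte[] data, ClassLoader applicationLoader) {
    try (ObjectInputStream in =
        new LoaderAwareInputStream(new ByteArrayInputStream(data), applicationLoader)) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ClassLoader applicationLoader = AppMessage.class.getClassLoader();
    Object roundTripped = fromBytes(toBytes(new AppMessage("hello")), applicationLoader);
    System.out.println(roundTripped.getClass().getName());
  }
}
```

Neither helper method names AppMessage, so the code on the "infrastructure" side compiles without the application's classes; only the classloader passed in at runtime needs to be able to find them.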

On Mon, Mar 9, 2020 at 12:07 PM Yi Pan wrote:

> Hi, Cameron,
>
> Sorry to chime in late. Overall, looks great! I do have a few
> suggestions/questions before I can cast my vote here:
> a) For the configuration variable names, why are we limiting ourselves to
> yarn.resources.*? We have changed some of the configuration variables from
> YARN-specific to non-YARN-specific. I would love to keep that consistent
> (i.e. gradually moving all our YARN-specific configuration variables to
> non-YARN-specific names).
> b) For the Avro case referred to in the delegation case in the
> Infrastructure classloader, if we delegate the object deserialization class
> to the application classloader, would it be possible that the application
> provides a version of the Avro classes that is incompatible with the ones
> used within the "infrastructure plugins", hence causing a runtime exception
> in the infrastructure plugin? Or is the solution: do not directly use serde
> classes in the infrastructure code?
> c) Following the description of the infrastructure classloader flow, where
> should we expect the serde classes? In the application classpath, I guess?
> So, does that mean that we should exclude serde classes (including
> SerializableSerde and JsonSerdeV2) from the Samza infrastructure package, and
> tell the users to package them in the application package?
> d) I am a bit confused about the description of "multiple" application
> classloaders on the job coordinator: one is for the describe flow and the
> other is the "Application" classloader, instead of the "API" classloader,
> right?
>
> Best,
>
> -Yi

Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-09 Thread Yi Pan
Hi, Cameron,

Sorry to chime in late. Overall, looks great! I do have a few
suggestions/questions before I can cast my vote here:
a) for the configuration variable names, why are we limiting ourselves to
yarn.resource.*? We have changed some of the configuration variables from
yarn specific to non-yarn specific. I would love to keep that consistent
(i.e. gradually moving all our yarn-specific configuration variables to
non-yarn-specifc names)
b) for the avro case as referred to in the delegation case in the
Infrastructure classloader, if we delegate the object deserialization class
to the application classloader, would it be possible that the application
provides an non-compatible version of avro class than the ones used within
the "infrastructure plugins" and hence causing runtime exception in the
infrastructure plugin? Or is the solution being: do not directly use serde
classes in the infrastructure code?
c) following the description of infrastructure classloader flow, where
should we expect the serde classes? In the application classpath, I guess?
So, does that mean that we should exclude serde classes (including
SerializableSerde and JsonSerdeV2) in the Samza infrastructure package, and
tell the users to package them in application package?
d) I am a bit confused by the description of "multiple" application
classloaders on the job coordinator: one is for the describe flow and the
other is in the "Application" classloader, instead of the "API" classloader,
right?
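
To make the concern in (b) concrete, here is a minimal sketch of
prefix-based delegation from one classloader to another. This is only an
illustration under assumptions, not the SEP-24 implementation: the class
name PrefixDelegatingClassLoader, the isDelegated helper, and the idea of
configuring delegation by package prefix are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Samza code): a classloader that hands off class
 * names matching certain prefixes to a separate "application" loader and
 * resolves everything else through the normal parent-first path.
 */
final class PrefixDelegatingClassLoader extends ClassLoader {
  private final ClassLoader applicationLoader;
  private final List<String> delegatedPrefixes;

  PrefixDelegatingClassLoader(ClassLoader parent, ClassLoader applicationLoader,
      List<String> delegatedPrefixes) {
    super(parent);
    this.applicationLoader = applicationLoader;
    // Defensive copy so later changes to the caller's list have no effect.
    this.delegatedPrefixes =
        Collections.unmodifiableList(new ArrayList<>(delegatedPrefixes));
  }

  /** Returns true if the class name falls under one of the delegated prefixes. */
  public boolean isDelegated(String className) {
    for (String prefix : delegatedPrefixes) {
      if (className.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    if (isDelegated(name)) {
      // Whatever version the application loader provides is the version
      // that code loaded through this classloader will see.
      return applicationLoader.loadClass(name);
    }
    // Everything else follows the standard parent-first lookup.
    return super.loadClass(name, resolve);
  }
}
```

With a prefix such as "org.apache.avro." in the delegated list, any
infrastructure code loaded through this classloader would be linked against
the application's copy of those classes, which is exactly where an
incompatible version could surface at runtime.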

Best,

-Yi


On Wed, Mar 4, 2020 at 11:32 AM Ke Wu  wrote:

> +1.
>
> Thanks for driving this effort.
>
> Best,
> Ke
>
> > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman wrote:
> >
> > +1 binding.
> >
> > Thanks Cameron. I look forward to this feature taking our "Stream
> > Processing as a service" offering to the next level.
> >
> > Cheers
> >
> > On Tuesday, March 3, 2020, Prateek Maheshwari wrote:
> >
> >> +1 (binding) from me. Thanks for contributing this feature. Looking forward
> >> to having dependency isolation and to the ability to upgrade the framework
> >> independently from an application.
> >>
> >> Thanks,
> >> Prateek
> >>
> >> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> This is a call for a vote on SEP-24: Cluster-based Job Coordinator
> >>> Dependency Isolation. Thanks to everyone who reviewed the proposal and
> >>> provided feedback.
> >>>
> >>> I have addressed comments on the SEP, and I am not aware of any further
> >>> major questions or objections, so I am starting this vote.
> >>>
> >>> SEP link:
> >>>
> >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> >>>
> >>> Discuss thread:
> >>>
> >>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> >>> There was also some discussion through comments on the SEP page (see
> >>> Resolved Comments).
> >>>
> >>> Please vote:
> >>> [ ] +1 approve
> >>> [ ] +0 no opinion
> >>> [ ] -1 disapprove (and reason why)
> >>>
> >>> Thank you,
> >>> Cameron
> >>>
> >>
> >
> >
> > --
> > Jagadish
>
>


Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-04 Thread Ke Wu
+1.

Thanks for driving this effort.

Best,
Ke

> On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman  
> wrote:
> 
> +1 binding.
> 
> Thanks Cameron. I look forward to this feature taking our "Stream
> Processing as a service" offering to the next level.
> 
> Cheers
> 
> On Tuesday, March 3, 2020, Prateek Maheshwari  wrote:
> 
>> +1 (binding) from me. Thanks for contributing this feature. Looking forward
>> to having dependency isolation and to the ability to upgrade the framework
>> independently from an application.
>> 
>> Thanks,
>> Prateek
>> 
>> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> This is a call for a vote on SEP-24: Cluster-based Job Coordinator
>>> Dependency Isolation. Thanks to everyone who reviewed the proposal and
>>> provided feedback.
>>> 
>>> I have addressed comments on the SEP, and I am not aware of any further
>>> major questions or objections, so I am starting this vote.
>>> 
>>> SEP link:
>>> 
>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
>>> 
>>> Discuss thread:
>>> 
>>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
>>> There was also some discussion through comments on the SEP page (see
>>> Resolved Comments).
>>> 
>>> Please vote:
>>> [ ] +1 approve
>>> [ ] +0 no opinion
>>> [ ] -1 disapprove (and reason why)
>>> 
>>> Thank you,
>>> Cameron
>>> 
>> 
> 
> 
> -- 
> Jagadish



Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-03 Thread Jagadish Venkatraman
+1 binding.

Thanks Cameron. I look forward to this feature taking our "Stream
Processing as a service" offering to the next level.

Cheers

On Tuesday, March 3, 2020, Prateek Maheshwari  wrote:

> +1 (binding) from me. Thanks for contributing this feature. Looking forward
> to having dependency isolation and to the ability to upgrade the framework
> independently from an application.
>
> Thanks,
> Prateek
>
> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
> wrote:
>
> > Hi all,
> >
> > This is a call for a vote on SEP-24: Cluster-based Job Coordinator
> > Dependency Isolation. Thanks to everyone who reviewed the proposal and
> > provided feedback.
> >
> > I have addressed comments on the SEP, and I am not aware of any further
> > major questions or objections, so I am starting this vote.
> >
> > SEP link:
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> >
> > Discuss thread:
> >
> > https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> > There was also some discussion through comments on the SEP page (see
> > Resolved Comments).
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Thank you,
> > Cameron
> >
>


-- 
Jagadish


Re: [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

2020-03-03 Thread Prateek Maheshwari
+1 (binding) from me. Thanks for contributing this feature. Looking forward
to having dependency isolation and to the ability to upgrade the framework
independently from an application.

Thanks,
Prateek

On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee 
wrote:

> Hi all,
>
> This is a call for a vote on SEP-24: Cluster-based Job Coordinator
> Dependency Isolation. Thanks to everyone who reviewed the proposal and
> provided feedback.
>
> I have addressed comments on the SEP, and I am not aware of any further
> major questions or objections, so I am starting this vote.
>
> SEP link:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
>
> Discuss thread:
>
> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxjkjuwwxmgmadpfbyk...@mail.gmail.com%3e
> There was also some discussion through comments on the SEP page (see
> Resolved Comments).
>
> Please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thank you,
> Cameron
>