Re: Tracing the Samza+YARN startup process

2019-06-20 Thread Malcolm McFarland
No problem -- I'm happy that we finally figured this out and could share
our results. ECS could actually be a good choice for Node Managers; it's
easy in ECS to scale node counts up and down and to cycle out unhealthy
servers.

Malcolm McFarland
Cavulus


This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
unauthorized or improper disclosure, copying, distribution, or use of the
contents of this message is prohibited. The information contained in this
message is intended only for the personal and confidential use of the
recipient(s) named above. If you have received this message in error,
please notify the sender immediately and delete the original message.



Re: Tracing the Samza+YARN startup process

2019-06-19 Thread Yi Pan
Great and detailed report! Really appreciate it!

-Yi


Re: Tracing the Samza+YARN startup process

2019-06-18 Thread Malcolm McFarland
Just want to follow up on this, for anybody that might be trying to do
something similar.

There are two things that were getting in the way of us using YARN+Samza on
ECS: 1) YARN needs to be able to resolve its hostname to something that's
publicly available; and 2) Samza needs to be able to open connections on
arbitrary ports in the 3+ range.

Docker confounds each of these in a different way. For the first, Docker's
hostname inside of the container is an arbitrary hash, and this is what
java.net.InetAddress will resolve to. I took Rayman's suggestion and used
dnsmasq to create a local CNAME mapping inside the container, mapping the
local "hostname" to one that is publicly available. This should work well
for any Docker-hosted JVM app relying on java.net.InetAddress.
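
For anyone who wants to check that a mapping like this took effect, here is a
minimal sketch in Python. It is not what we ran and not part of Samza or YARN;
the helper name describe_local_hostname is hypothetical. It does roughly what
the Javadoc for java.net.InetAddress.getLocalHost() describes (take the host
name from the system, then resolve it), so its output should be a reasonable
hint about what the JVM will see, assuming both go through the same system
resolver.

```python
import socket


def describe_local_hostname(hostname=None):
    """Report what a hostname resolves to through the system resolver.

    Hypothetical diagnostic helper. With no argument it inspects the
    machine's own hostname, which in a default Docker container is
    typically the short container ID.
    """
    name = hostname if hostname is not None else socket.gethostname()
    result = {"hostname": name, "canonical": None, "addresses": [], "error": None}
    try:
        # Returns (canonical name, alias list, IPv4 address list).
        canonical, _aliases, addresses = socket.gethostbyname_ex(name)
        result["canonical"] = canonical
        result["addresses"] = addresses
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        result["error"] = str(exc)
    return result
```

Calling describe_local_hostname() with no arguments inside the container and
printing the result shows whether the container ID resolves at all, and
whether the canonical name is the one other hosts can reach.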

Docker also only allows 100 ports to be publicly exposed, and there is no
configuration option in Samza to specify what the range of ports will be.
The way we worked around this on ECS was to create an elastic network
interface (ENI) for each of the node manager containers. Although I can't
find any documentation on this, I suspect that Fargate does this by
default, as the whole point of that service is to bypass the restrictions
placed on containers running on EC2 instances. With the ENI, we no longer
had to explicitly expose any ports; all ports will be available if the
security group allows.
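
A quick way to confirm that a given port is actually reachable from another
node is a plain TCP connect with a timeout. The sketch below is only an
illustration under that assumption; can_connect is a hypothetical helper, and
the host and port you pass in are whatever the AM or node manager reports.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

A False result does not say whether the cause is a security group, a firewall
rule, or a hostname that does not resolve, so it is best read alongside a
hostname check.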

As an aside, you might wonder: why not just run these on Fargate? Well,
Fargate only allows 10GB of storage (this can be extended a small amount
via an ephemeral mounted volume but seemingly not enough to satisfy YARN's
VM requirements).

Hth, and thanks for everybody's patience,

Malcolm McFarland
Cavulus



Re: Tracing the Samza+YARN startup process

2019-05-31 Thread rayman preet
Apart from /etc/hosts and /bin/hostname, the only other relevant place might
be /etc/resolv.conf: you could modify its values to point to, e.g., a dnsmasq
instance.


Re: Tracing the Samza+YARN startup process

2019-05-31 Thread Malcolm McFarland
Hey Rayman,

The ops group and I went through the configuration today and observed the
YARN containers as they were coming up. We seem to have found the root of
the problem, and I'm putting this out there for anybody else that's trying
to do something similar on AWS ECS:

The ECS container instances set their hostname to the container ID on
startup (e.g. 717b6f75aaf8), and this looks like it's interfering with the
YARN container startup process. This *seems* to be corroborated in that
containers that start on the same host as their AM look to be starting fine
(i.e. they can locally resolve their IP address correctly), but containers
starting on other hosts don't seem to be. We were *not* having this problem
on Fargate, and my only guess is that, given Fargate's intended use case as
a replicated-services-in-the-cloud environment, AWS sets the hostname for
Fargate-bound Docker containers on launch (e.g.
ip-10-#-#-#.us-west-#.internal.local or whatever). (As a side note, we
probably would have stuck with Fargate and not run into this problem, but
Fargate instances are only allowed 10GB of disk space, and this wasn't
enough for YARN's VM requirements.)

I've been fishing around for a way to get Samza to resolve the hostname to
something more publicly-available. I've thus far tried a) changing the
/etc/hosts file, and b) replacing the /bin/hostname binary in the container
with a static script, but neither of these options seems to have an effect
on Java's DNS resolution. Two further options I can think of are:

- find some place in the Samza configuration where the hostname can be set
explicitly; or
- change just the right piece of information in the system so that
java.net.InetAddress will resolve the localhost to something other than
what's returned from /bin/hostname (I'm guessing it uses gethostname() on
Ubuntu, could be wrong).

Any ideas?

Cheers,
Malcolm McFarland
Cavulus



Re: Tracing the Samza+YARN startup process

2019-05-31 Thread rayman preet
Yes, I think your hunch is right. Each container queries the AM over HTTP to
obtain the jobModel that it is supposed to run. The AM runs an HTTP server,
usually on a dynamically allocated free port on the machine it's running on.
So it's possible that a firewall rule is blocking the container when it tries
to reach this port on the AM's machine.
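
To illustrate what "dynamically allocated free port" means in practice, here
is a minimal Python sketch. It is not Samza's code, and bind_ephemeral is a
hypothetical name; it only shows the general OS mechanism of binding to port
0 and letting the kernel pick a free port, which is why a firewall rule that
allows only a fixed list of ports can end up blocking the connection.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen port.

    Returns (socket, port). The caller is responsible for closing the socket.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the operating system to pick any currently free port.
    server.bind((host, 0))
    server.listen()
    return server, server.getsockname()[1]
```

Calling this twice while both sockets are open yields two different port
numbers, and neither is known in advance, so the allowed range has to cover
whatever the OS may hand out.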

--
thanks
rayman


Re: Tracing the Samza+YARN startup process

2019-05-30 Thread Malcolm McFarland
Thanks for the image, appreciate you taking the effort to do that! I'm
still hitting this wall. The AM will launch the container, the container
will go from "accepted" to "running", but there will be no output from the
container (I'm piping all of the Samza, org.apache, org.kafka, and our own
application's logging output to a Kafka topic). During these periods, the
container will hang out at ~100MB/8GB memory usage and stall. There's no
error output when this happens; it just kind of stops. My suspicion is that
our Ops group has a firewall rule up that's interfering with this, or maybe
just isn't white-listing a port correctly, and if I could identify where
the application is stalling, it'd probably help to narrow down the
possibilities.

Cheers,
Malcolm McFarland
Cavulus




On Thu, May 30, 2019 at 1:39 PM rayman preet  wrote:

> I uploaded the image here:
> https://www.dropbox.com/s/rv57v165ysp12c5/samza%20flow.png?dl=0
>
> Are you still running into this issue?
> Is there anything in the container's log that shows any exceptions/errors?
>
> On Wed, May 22, 2019 at 10:15 PM Malcolm McFarland wrote:
>
> > Hey rayman,
> >
> > What it looks like is that the AM has started, the container has started,
> > but, ie, here will be the last messages I see in the Samza logs:
> >
> > 2019-05-23T05:10:45.048ZINFOMaking a request for ANY_HOST
> > 2019-05-23T05:10:45.057ZINFOStarting the container allocator
> > thread
> > 2019-05-23T05:10:47.098ZINFOReceived new token for :
> > :8032
> > 2019-05-23T05:10:47.102ZINFOContainer allocated from RM on
> > 
> > 2019-05-23T05:10:47.105ZINFOContainer allocated from RM on
> > 
> >
> > At this point, it seems to stall, and no more output is produced.
> >
> > Also, I couldn't see you diagram (it's possible my company's email
> filters
> > attachments); can I see that on the web anywhere?
> >
> > Cheers,
> > Malcolm
> >
> > On Wed, May 22, 2019 at 4:30 PM rayman preet 
> wrote:
> >
> > > Hi Malcolm,
> > >
> > > This figure (attached) gives an overview of the flow. Is
> > > this something you were looking for?
> > >
> > > Also, by "don't fully start up" do you mean that
> > > applications are missing some containers (but the ApplicationMaster is
> > > running)?
> > > Or the application is missing entirely.
> > >
> > > --
> > > thanks
> > > rayman
> > > [image: Samza Job Launch Sequence.png]
> > >
> > > On Tue, May 21, 2019 at 3:58 PM Malcolm McFarland <
> > mmcfarl...@cavulus.com>
> > > wrote:
> > >
> > >> Hey Folks,
> > >>
> > >> I'm still trying to pin down why these applications are sometimes not
> > >> starting. Everything looks fine in the YARN web UI and in the
> > >> immediately available logs, but the applications don't always fully
> > >> start up. Does anybody have a rundown about how to trace the Samza
> > >> startup process on a YARN cluster, from Accepted status, to
> > >> localization, to the application master startup, to the actual
> > >> application's startup?
> > >>
> > >> Cheers,
> > >> Malcolm
> > >>
> > >> --
> > >> Malcolm McFarland
> > >> Cavulus
> > >>
> > >>
> > >> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > >> unauthorized or improper disclosure, copying, distribution, or use of
> > >> the contents of this message is prohibited. The information contained
> > >> in this message is intended only for the personal and confidential use
> > >> of the recipient(s) named above. If you have received this message in
> > >> error, please notify the sender immediately and delete the original
> > >> message.
> > >>
> > >
> > >
> > > --
> > > thanks
> > > rayman
> > >
> >
> >
> > --
> > Malcolm McFarland
> > Cavulus
> > 1-800-760-6915
> > mmcfarl...@cavulus.com
> >
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> > unauthorized or improper disclosure, copying, distribution, or use of the
> > contents of this message is prohibited. The information contained in this
> > message is intended only for the personal and confidential use of the
> > recipient(s) named above. If you have received this message in error,
> > please notify the sender immediately and delete the original message.
> >
>
>
> --
> thanks
> rayman
>


Re: Tracing the Samza+YARN startup process

2019-05-30 Thread rayman preet
I uploaded the image here:
https://www.dropbox.com/s/rv57v165ysp12c5/samza%20flow.png?dl=0

Are you still running into this issue?
Is there anything in the container's log that shows any exceptions/errors?


-- 
thanks
rayman


Re: Tracing the Samza+YARN startup process

2019-05-22 Thread Malcolm McFarland
Hey rayman,

It looks like the AM has started and the container has started, but
these are the last messages I see in the Samza logs:

2019-05-23T05:10:45.048Z INFO Making a request for ANY_HOST
2019-05-23T05:10:45.057Z INFO Starting the container allocator thread
2019-05-23T05:10:47.098Z INFO Received new token for : :8032
2019-05-23T05:10:47.102Z INFO Container allocated from RM on
2019-05-23T05:10:47.105Z INFO Container allocated from RM on

At this point, it seems to stall, and no more output is produced.
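
To pin down exactly where the output stops, the log lines can be parsed and
the last event extracted. This is a rough sketch that assumes lines shaped
like the ones above (an ISO timestamp ending in Z, a level, then the
message, with or without separators between them). The function names are
illustrative, and the list of levels is an assumption.

```python
import re
from datetime import datetime, timezone

# Timestamp, level, and message may be fused together or space-separated.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s*"
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s*(.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None for non-matching lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc), match.group(2), match.group(3)


def last_event(lines):
    """Return the parsed line with the latest timestamp, or None."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    return max(parsed, key=lambda p: p[0]) if parsed else None
```

Comparing the timestamp from last_event() against the current time gives
the length of the stall. Wrapped continuation lines are skipped rather than
rejoined.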

Also, I couldn't see your diagram (it's possible my company's email filters
attachments); can I see that on the web anywhere?

Cheers,
Malcolm


-- 
Malcolm McFarland
Cavulus
1-800-760-6915
mmcfarl...@cavulus.com



Re: Tracing the Samza+YARN startup process

2019-05-22 Thread rayman preet
Hi Malcolm,

This figure (attached) gives an overview of the flow. Is
this something you were looking for?

Also, by "don't fully start up", do you mean that the applications are
missing some containers (but the ApplicationMaster is running), or that
the application is missing entirely?

--
thanks
rayman
[image: Samza Job Launch Sequence.png]


-- 
thanks
rayman


Tracing the Samza+YARN startup process

2019-05-21 Thread Malcolm McFarland
Hey Folks,

I'm still trying to pin down why these applications are sometimes not
starting. Everything looks fine in the YARN web UI and in the
immediately available logs, but the applications don't always fully
start up. Does anybody have a rundown of how to trace the Samza
startup process on a YARN cluster, from Accepted status, to
localization, to the application master startup, to the actual
application's startup?
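
For the YARN side of that trace, the ResourceManager's REST API reports an
application's state (for example ACCEPTED or RUNNING) along with a link to
the AM container logs. Below is a minimal sketch, assuming the RM web
endpoint is reachable. The helper names, the base URL, and the application
ID are hypothetical, and this covers only what YARN reports, not what
happens inside the Samza containers.

```python
import json
import urllib.request


def app_url(rm_base, app_id):
    """Build the Cluster Application API URL for one application."""
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def summarize(payload):
    """Pick the fields most useful for tracing startup from the response."""
    app = payload.get("app", {})
    keys = ("state", "finalStatus", "amContainerLogs", "diagnostics")
    return {key: app.get(key) for key in keys}


def fetch_summary(rm_base, app_id, timeout=5.0):
    """Query the ResourceManager and return the summarized app record."""
    request = urllib.request.Request(
        app_url(rm_base, app_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(json.load(response))
```

Polling fetch_summary() every few seconds after submission shows how long
the application sits in each state and where to find the AM logs once it is
running.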

Cheers,
Malcolm

-- 
Malcolm McFarland
Cavulus

