Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Otto Fowler
I think the difference is between maintaining the core of Metron, which
*has* to be done, and other things that may still be done but will be worked
on for their merits or by community need, and will not be required for
everything.

On April 21, 2020 at 10:29:24, Justin Leet (justinjl...@gmail.com) wrote:

How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level. This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope. It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen  wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to
> leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual workarounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step up in my opinion.
>
>
> > perhaps a dedicated docker image that is designed to connect with other
> dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
>
>
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution that
> > would work? I don't have much experience with docker, perhaps a dedicated
> > docker image that is designed to connect with other dockerized applications
> > such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with in my
> > own mind yet. I'll have more to add, but I just want to chime in on the
> > topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.). To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now
> > effectively dead. I am not sure if legacy versions of Kafka, HBase, Storm,
> > etc will continue to be freely available, but even if so, we cannot
> > continue to rely on this mechanism if new versions and security updates
> > will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or packages
> > either. We do make the code freely available to allow users to build and
> > publish their own Metron packages. But even with this capability, unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts contributors
> > also. Our primary development environment relies on that Ambari MPack. To
> > continue development on any of the components of Apache Metron, we would
> > need to build an alternative development environment that can function
> > despite the paywall. That could take many shapes, but in my opinion it
> > would be a blocker for continuing any development on Apache Metron,
> > unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Justin Leet
How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level.  This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope.  It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen  wrote:

> Hi Tom -
>
> >  Do you or anyone have enough experience to judge if it is possible to
> leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual workarounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step up in my opinion.
>
>
> >  perhaps a dedicated docker image that is designed to connect with other
> dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
>
>
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution that
> > would work? I don't have much experience with docker, perhaps a dedicated
> > docker image that is designed to connect with other dockerized applications
> > such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with in my
> > own mind yet. I'll have more to add, but I just want to chime in on the
> > topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.). To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now
> > effectively dead. I am not sure if legacy versions of Kafka, HBase, Storm,
> > etc will continue to be freely available, but even if so, we cannot
> > continue to rely on this mechanism if new versions and security updates
> > will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or packages
> > either. We do make the code freely available to allow users to build and
> > publish their own Metron packages. But even with this capability, unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts contributors
> > also. Our primary development environment relies on that Ambari MPack. To
> > continue development on any of the components of Apache Metron, we would
> > need to build an alternative development environment that can function
> > despite the paywall. That could take many shapes, but in my opinion it
> > would be a blocker for continuing any development on Apache Metron,
> > unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >
> > On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov wrote:
> >
> > >   - Dropping Ambari.
> > >
> > > 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Nick Allen
Hi Tom -

>  Do you or anyone have enough experience to judge if it is possible to
leverage Ansible as a replacement to deploy a working cluster?

Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
This was the primary deployment mechanism before we had the Ambari MPack.

We found it very difficult to use Ansible to create a one-size-fits-all
deployment solution. It's possible, but very difficult to get a solution
that doesn't take close monitoring and manual workarounds when attempting
to use it across environments of different sizes and shapes. In terms of
usability, the Ambari MPack was a big step up in my opinion.


>  perhaps a dedicated docker image that is designed to connect with other
dockerized applications such as Storm, Kafka, etc..?

Yes, I think that would be the way to go for a dev environment. We would be
able to use community supported containers for most of our underlying
platform needs. Unfortunately, this alone would not help anyone deploy
Metron on a cluster.




On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:

> Hi Nick,
>
> I see there is a lot of work done using Ansible in the repository. Do you
> or anyone have enough experience to judge if it is possible to leverage
> Ansible as a replacement to deploy a working cluster?
>
> Now that I am typing this out, I wonder if docker might be a solution that
> would work? I don't have much experience with docker, perhaps a dedicated
> docker image that is designed to connect with other dockerized applications
> such as Storm, Kafka, etc..?
>
> --Tom.
>
> On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
>
> This is a good discussion and one that I haven't fully grappled with in my
> own mind yet. I'll have more to add, but I just want to chime in on the
> topic of Ambari at this point.
>
> ### Ambari and the Paywall
>
> The problem with Ambari is that its installation mechanism requires a
> repository of compiled packages (RPMs, DEBs, etc.). To install the
> underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
> relied on binary packages that were made freely available by
> Cloudera/Hortonworks. As of this past January, those packages are now
> behind a paywall.
>
> Due to the paywall, installing your own HDP cluster with Ambari is now
> effectively dead. I am not sure if legacy versions of Kafka, HBase, Storm,
> etc will continue to be freely available, but even if so, we cannot
> continue to rely on this mechanism if new versions and security updates
> will not be made available.
>
> The Apache Metron project does not publish compiled binaries or packages
> either. We do make the code freely available to allow users to build and
> publish their own Metron packages. But even with this capability, unless
> you have a means to install the underlying platform dependencies via
> Ambari, installing Metron with Ambari has little value.
>
> Unfortunately, I don't see a feasible path forward for Metron's Ambari
> MPack.
>
> ### Dev Environment
>
> This not only impacts the users of Apache Metron, this impacts contributors
> also. Our primary development environment relies on that Ambari MPack. To
> continue development on any of the components of Apache Metron, we would
> need to build an alternative development environment that can function
> despite the paywall. That could take many shapes, but in my opinion it
> would be a blocker for continuing any development on Apache Metron,
> unfortunately.
>
> Please do let me know if anyone disagrees or can think of an alternative
> approach that would allow the current Ambari MPack to remain viable.
>
> On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov wrote:
>
> >   - Dropping Ambari.
> >
> > I like the progress that Apache did with Ambari in 2.7. And I don't know a
> > better installer/manager for all the services (we use other Hadoop eco
> > services besides Metron).
> >
> > Sometimes it's buggy, agents get stuck or server needs reboot from time to
> > time, mpacks break some functionality. But overall I feel this is the
> > direction for central management and orchestration.
> >
> > - Dima
> >
> > On Wed, Apr 15, 2020, 12:45 Justin Leet wrote:
> >
> > > This is a bit off the top of my head, but I agree with pretty much all
> > > of the points on what's bringing a lot of overhead. There's probably also a
> > > worthwhile discussion about what value we're shooting for the project to
> > > provide to people that influences what stays/goes.
> > >
> > > Thinking out loud a bit
> > >
> > >- Dropping Storm and moving to Spark drops the very hard to
> > >tune/manage/troubleshoot Storm.
> > >- Dropping the UIs (and making SQL the external interface)
> 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Yerex, Tom
Hi Nick,

I see there is a lot of work done using Ansible in the repository. Do you or 
anyone have enough experience to judge if it is possible to leverage Ansible as 
a replacement to deploy a working cluster?

Now that I am typing this out, I wonder if docker might be a solution that 
would work? I don't have much experience with docker, perhaps a dedicated 
docker image that is designed to connect with other dockerized applications 
such as Storm, Kafka, etc..?

--Tom.

On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:

This is a good discussion and one that I haven't fully grappled with in my
own mind yet. I'll have more to add, but I just want to chime in on the
topic of Ambari at this point.

### Ambari and the Paywall

The problem with Ambari is that its installation mechanism requires a
repository of compiled packages (RPMs, DEBs, etc.). To install the
underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
relied on binary packages that were made freely available by
Cloudera/Hortonworks. As of this past January, those packages are now
behind a paywall.

Due to the paywall, installing your own HDP cluster with Ambari is now
effectively dead.  I am not sure if legacy versions of Kafka, HBase, Storm,
etc will continue to be freely available, but even if so, we cannot
continue to rely on this mechanism if new versions and security updates
will not be made available.

The Apache Metron project does not publish compiled binaries or packages
either.  We do make the code freely available to allow users to build and
publish their own Metron packages.   But even with this capability, unless
you have a means to install the underlying platform dependencies via
Ambari, installing Metron with Ambari has little value.

Unfortunately, I don't see a feasible path forward for Metron's Ambari
MPack.

### Dev Environment

This not only impacts the users of Apache Metron, this impacts contributors
also. Our primary development environment relies on that Ambari MPack.  To
continue development on any of the components of Apache Metron, we would
need to build an alternative development environment that can function
despite the paywall.  That could take many shapes, but in my opinion it
would be a blocker for continuing any development on Apache Metron,
unfortunately.

Please do let me know if anyone disagrees or can think of an alternative
approach that would allow the current Ambari MPack to remain viable.

On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov  wrote:

>   - Dropping Ambari.
>
> I like the progress that Apache did with Ambari in 2.7. And I don't know a
> better installer/manager for all the services (we use other Hadoop eco
> services besides Metron).
>
> Sometimes it's buggy, agents get stuck or server needs reboot from time to
> time, mpacks break some functionality. But overall I feel this is the
> direction for central management and orchestration.
>
> - Dima
>
> On Wed, Apr 15, 2020, 12:45 Justin Leet  wrote:
>
> > This is a bit off the top of my head, but I agree with pretty much all
> > of the points on what's bringing a lot of overhead. There's probably also a
> > worthwhile discussion about what value we're shooting for the project to
> > provide to people that influences what stays/goes.
> >
> > Thinking out loud a bit
> >
> > - Dropping Storm and moving to Spark drops the very hard to
> >   tune/manage/troubleshoot Storm.
> > - Dropping the UIs (and making SQL the external interface) pretty much
> >   implies dropping the REST APIs and ES/Solr. ES/Solr have been a giant
> >   source of dev heartache on the project and they exist primarily for the
> >   real time use case. People can build whatever UIs or use existing tools
> >   against Parquet/Hive/whatever.
> > - Dropping Ambari. It's a complex beast to install because of how many
> >   components we have. Dropping the above makes our install much easier and
> >   should alleviate the need for a complex installer.
> >
> > At that point, we're basically left with
> >
> > - Some Spark for parse -> enrich -> output
> > - The profiler
> > - Stellar
> > - Probably some other misc stuff (sensors, Bro Kafka plugin, etc.)
> >
> > At a glance, that seems almost an order of magnitude smaller than what we
> > currently try to handle.
> >
> > I'm not really sure what an appropriate way to handle the profiler is. I've
> > barely touched the code for it, so anything I say is a vague guess.
> >
> > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom  wrote:
> >
> > > To me Metron is big and broad in the scope of technology required to
> get
> > > it