RE: Development Activity has dropped to effectively 0, what should we do?

2020-04-23 Thread Yerex, Tom
(This is only brainstorming.)

I like Metron's documentation. There has been effort and care taken there.

Ambari is nice, but given that mpack is moving behind a paywall, it seems 
that the groups benefiting from the paywall can chip in to build Metron mpacks 
at their leisure.

Elasticsearch is popular so I can see an argument to keep it. On the flip side, 
Elasticsearch is not really trivial to run. Replacing that with a simple 
template that writes data to a file as an example of how to write an IndexDAO 
(pardon if my terminology is incorrect), and splitting Elasticsearch into another 
repo to be maintained by ELK enthusiasts, would reduce the core workload further.
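
To make the file-writing idea a little more concrete, here is a rough Python sketch. It is only an illustration and not Metron's actual IndexDao interface (which is Java): the class name `FileIndexWriter`, its `index` and `search` methods, and the sample messages are all made up for this example.

```python
import json
import tempfile
from pathlib import Path


class FileIndexWriter:
    """Hypothetical index backend that appends each message as one JSON line."""

    def __init__(self, path):
        self.path = Path(path)

    def index(self, message: dict) -> None:
        # Append-only write: one JSON document per line.
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(message, sort_keys=True) + "\n")

    def search(self, field: str, value) -> list:
        # Naive full scan; good enough for an example backend, not for production.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as handle:
            records = [json.loads(line) for line in handle if line.strip()]
        return [record for record in records if record.get(field) == value]


def demo_file_index() -> list:
    # Index two illustrative messages in a temporary file and query one back.
    with tempfile.TemporaryDirectory() as tmp:
        writer = FileIndexWriter(Path(tmp) / "index.jsonl")
        writer.index({"guid": "a1", "source.type": "bro", "ip_src_addr": "10.0.0.1"})
        writer.index({"guid": "b2", "source.type": "snort", "ip_src_addr": "10.0.0.2"})
        return writer.search("source.type", "bro")
```

Something this small would mostly serve as a template showing where a real backend plugs in, with Elasticsearch living in its own repo.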

HBase might be another piece that can be moved into a separate project and 
replaced in core by a simpler example that relies on something like SQLite. 
SQLite is relatively trivial to set up and run.
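
As a sketch of what a SQLite-backed stand-in might look like, assuming the store is used as a simple keyed lookup (for example, enrichment data), here is a minimal version using Python's standard `sqlite3` module. The class `SqliteLookupStore`, the table layout, and the column names are assumptions for illustration, not anything that exists in Metron.

```python
import json
import sqlite3


class SqliteLookupStore:
    """Hypothetical key/value lookup store backed by a single SQLite table."""

    def __init__(self, database=":memory:"):
        # ":memory:" keeps the example self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(database)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS lookup ("
            " kind TEXT NOT NULL,"
            " row_key TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (kind, row_key))"
        )

    def put(self, kind: str, row_key: str, payload: dict) -> None:
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO lookup (kind, row_key, payload) VALUES (?, ?, ?)",
                (kind, row_key, json.dumps(payload)),
            )

    def get(self, kind: str, row_key: str):
        row = self.conn.execute(
            "SELECT payload FROM lookup WHERE kind = ? AND row_key = ?",
            (kind, row_key),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Usage would be along the lines of `store.put("whitelist", "10.0.0.1", {"owner": "lab"})` followed by `store.get("whitelist", "10.0.0.1")`; a missing key returns `None`.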

Deployment could be handled in Ansible, along with maintaining the development 
build, with some documentation work on how to add modules (for example, 
Elasticsearch and HBase) for "bigger" development work.

--T.


On 2020-04-21 10:12:52-07:00 Otto Fowler wrote:

 I think the difference is the maintenance of the core of metron that *has*
to be, and other things that may still be done, but will be worked on for
their merits or by community need and not be required for everything

On April 21, 2020 at 10:29:24, Justin Leet (justinjl...@gmail.com) wrote:

How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level. This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope. It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen n...@nickallen.org wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to
> > leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual work arounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step-up in my opinion.
>
> > perhaps a dedicated docker image that is designed to connect with other
> > dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom tom.ye...@ubc.ca wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution that
> > would work? I don't have much experience with docker, perhaps a dedicated
> > docker image that is designed to connect with other dockerized applications
> > such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen" n...@nickallen.org wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with
> > in my own mind yet. I'll have more to add, but I just want to chime in
> > on the topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.) To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now
> > effectively dead. I am not sure if

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Otto Fowler
 I think the difference is the maintenance of the core of metron that *has*
to be, and other things that may still be done, but will be worked on for
their merits or by community need and not be required for everything

On April 21, 2020 at 10:29:24, Justin Leet (justinjl...@gmail.com) wrote:

How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level. This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope. It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen  wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to
> leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual work arounds when
attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step-up in my opinion.
>
>
> > perhaps a dedicated docker image that is designed to connect with other
> dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would
be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
>
>
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do
you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution
> that
> > would work? I don't have much experience with docker, perhaps a
dedicated
> > docker image that is designed to connect with other dockerized
> applications
> > such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with
> > in my
> > own mind yet. I'll have more to add, but I just want to chime in on
> the
> > topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.) To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc)
> we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is
> now
> > effectively dead. I am not sure if legacy versions of Kafka, HBase,
> > Storm,
> > etc will continue to be freely available, but even if so, we cannot
> > continue to rely on this mechanism if new versions and security
> updates
> > will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or
> > packages
> > either. We do make the code freely available to allow users to build
> > and
> > publish their own Metron packages. But even with this capability,
> > unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's
> Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts
> > contributors
> > also. Our primary development environment relies on that Ambari
> > MPack. To
> > continue development on any of the components of Apache Metron, we
> > would
> > need to build an alternative development environment that can
> function
> > despite the paywall. That could take many shapes, but in my opinion
> it
> > would be a blocker for continuing any development on Apache Metron,
> > unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an
> > alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Justin Leet
How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level.  This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope.  It's putting the cart before the horse. I want to
emphasize this, because we're having a discussion about how to install
something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure
we need a big installer to manage everything. Plenty of projects trust
people to be able to run convenience scripts and shell commands. Again, I
think this is an academic discussion until we figure out our overall
project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen  wrote:

> Hi Tom -
>
> >  Do you or anyone have enough experience to judge if it is possible to
> leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual work arounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step-up in my opinion.
>
>
> >  perhaps a dedicated docker image that is designed to connect with other
> dockerized applications such as Storm, Kafka, etc..?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
>
>
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution
> that
> > would work? I don't have much experience with docker, perhaps a dedicated
> > docker image that is designed to connect with other dockerized
> applications
> > such as Storm, Kafka, etc..?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with
> > in my
> > own mind yet. I'll have more to add, but I just want to chime in on
> the
> > topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.) To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc)
> we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks. As of this past January, those packages are now
> > behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is
> now
> > effectively dead.  I am not sure if legacy versions of Kafka, HBase,
> > Storm,
> > etc will continue to be freely available, but even if so, we cannot
> > continue to rely on this mechanism if new versions and security
> updates
> > will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or
> > packages
> > either.  We do make the code freely available to allow users to build
> > and
> > publish their own Metron packages.   But even with this capability,
> > unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's
> Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts
> > contributors
> > also. Our primary development environment relies on that Ambari
> > MPack.  To
> > continue development on any of the components of Apache Metron, we
> > would
> > need to build an alternative development environment that can
> function
> > despite the paywall.  That could take many shapes, but in my opinion
> it
> > would be a blocker for continuing any development on Apache Metron,
> > unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an
> > alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >
> > On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov 
> > wrote:
> >
> > >   - Dropping Ambari.
> > >
> > > 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Nick Allen
Hi Tom -

>  Do you or anyone have enough experience to judge if it is possible to
leverage Ansible as a replacement to deploy a working cluster?

Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
This was the primary deployment mechanism before we had the Ambari MPack.

We found it very difficult to use Ansible to create a one-size-fits-all
deployment solution. It's possible, but very difficult to get a solution
that doesn't take close monitoring and manual workarounds when attempting
to use it across environments of different sizes and shapes. In terms of
usability, the Ambari MPack was a big step up in my opinion.


>  perhaps a dedicated docker image that is designed to connect with other
dockerized applications such as Storm, Kafka, etc..?

Yes, I think that would be the way to go for a dev environment. We would be
able to use community supported containers for most of our underlying
platform needs. Unfortunately, this alone would not help anyone deploy
Metron on a cluster.




On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom  wrote:

> Hi Nick,
>
> I see there is a lot of work done using Ansible in the repository. Do you
> or anyone have enough experience to judge if it is possible to leverage
> Ansible as a replacement to deploy a working cluster?
>
> Now that I am typing this out, I wonder if docker might be a solution that
> would work? I don't have much experience with docker, perhaps a dedicated
> docker image that is designed to connect with other dockerized applications
> such as Storm, Kafka, etc..?
>
> --Tom.
>
> On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:
>
> This is a good discussion and one that I haven't fully grappled with
> in my
> own mind yet. I'll have more to add, but I just want to chime in on the
> topic of Ambari at this point.
>
> ### Ambari and the Paywall
>
> The problem with Ambari is that its installation mechanism requires a
> repository of compiled packages (RPMs, DEBs, etc.) To install the
> underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
> relied on binary packages that were made freely available by
> Cloudera/Hortonworks. As of this past January, those packages are now
> behind a paywall.
>
> Due to the paywall, installing your own HDP cluster with Ambari is now
> effectively dead.  I am not sure if legacy versions of Kafka, HBase,
> Storm,
> etc will continue to be freely available, but even if so, we cannot
> continue to rely on this mechanism if new versions and security updates
> will not be made available.
>
> The Apache Metron project does not publish compiled binaries or
> packages
> either.  We do make the code freely available to allow users to build
> and
> publish their own Metron packages.   But even with this capability,
> unless
> you have a means to install the underlying platform dependencies via
> Ambari, installing Metron with Ambari has little value.
>
> Unfortunately, I don't see a feasible path forward for Metron's Ambari
> MPack.
>
> ### Dev Environment
>
> This not only impacts the users of Apache Metron, this impacts
> contributors
> also. Our primary development environment relies on that Ambari
> MPack.  To
> continue development on any of the components of Apache Metron, we
> would
> need to build an alternative development environment that can function
> despite the paywall.  That could take many shapes, but in my opinion it
> would be a blocker for continuing any development on Apache Metron,
> unfortunately.
>
> Please do let me know if anyone disagrees or can think of an
> alternative
> approach that would allow the current Ambari MPack to remain viable.
>
> On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov 
> wrote:
>
> >   - Dropping Ambari.
> >
> > I like the progress that Apache did with Ambari in 2.7. And I don't
> know a
> > better installer/manager for all the services (we use other Hadoop
> eco
> > services besides Metron).
> >
> > Sometimes it's buggy, agents get stuck or server needs reboot from time to
> > time, mpacks break some functionality. But overall I feel this is the
> > direction for central management and orchestration.
> >
> > - Dima
> >
> > On Wed, Apr 15, 2020, 12:45 Justin Leet 
> wrote:
> >
> > > This is a bit off the top of my head, but I'd agree with pretty much all
> > > of points on what's bringing a lot of overhead.  There's probably
> also a
> > > worthwhile discussion about what value we're shooting for the
> project to
> > > provide to people that influences what stays/goes.
> > >
> > > Thinking out loud a bit
> > >
> > >- Dropping Storm and moving to Spark drops the very hard to
> > >tune/manage/troubleshoot Storm.
> > >- Dropping the UIs (and making SQL the external interface)
> 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-21 Thread Yerex, Tom
Hi Nick,

I see there is a lot of work done using Ansible in the repository. Do you or 
anyone have enough experience to judge if it is possible to leverage Ansible as 
a replacement to deploy a working cluster?

Now that I am typing this out, I wonder if docker might be a solution that 
would work? I don't have much experience with docker, perhaps a dedicated 
docker image that is designed to connect with other dockerized applications 
such as Storm, Kafka, etc..?

--Tom.

On 2020-04-17, 11:27 AM, "Nick Allen"  wrote:

This is a good discussion and one that I haven't fully grappled with in my
own mind yet. I'll have more to add, but I just want to chime in on the
topic of Ambari at this point.

### Ambari and the Paywall

The problem with Ambari is that its installation mechanism requires a
repository of compiled packages (RPMs, DEBs, etc.) To install the
underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
relied on binary packages that were made freely available by
Cloudera/Hortonworks. As of this past January, those packages are now
behind a paywall.

Due to the paywall, installing your own HDP cluster with Ambari is now
effectively dead.  I am not sure if legacy versions of Kafka, HBase, Storm,
etc will continue to be freely available, but even if so, we cannot
continue to rely on this mechanism if new versions and security updates
will not be made available.

The Apache Metron project does not publish compiled binaries or packages
either.  We do make the code freely available to allow users to build and
publish their own Metron packages.   But even with this capability, unless
you have a means to install the underlying platform dependencies via
Ambari, installing Metron with Ambari has little value.

Unfortunately, I don't see a feasible path forward for Metron's Ambari
MPack.

### Dev Environment

This not only impacts the users of Apache Metron, this impacts contributors
also. Our primary development environment relies on that Ambari MPack.  To
continue development on any of the components of Apache Metron, we would
need to build an alternative development environment that can function
despite the paywall.  That could take many shapes, but in my opinion it
would be a blocker for continuing any development on Apache Metron,
unfortunately.

Please do let me know if anyone disagrees or can think of an alternative
approach that would allow the current Ambari MPack to remain viable.

On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov  wrote:

>   - Dropping Ambari.
>
> I like the progress that Apache did with Ambari in 2.7. And I don't know a
> better installer/manager for all the services (we use other Hadoop eco
> services besides Metron).
>
> Sometimes it's buggy, agents get stuck or server needs reboot from time to
> time, mpacks break some functionality. But overall I feel this is the
> direction for central management and orchestration.
>
> - Dima
>
> On Wed, Apr 15, 2020, 12:45 Justin Leet  wrote:
>
> > This is a bit off the top of my head, but I'd agree with pretty much all
> > of points on what's bringing a lot of overhead.  There's probably also a
> > worthwhile discussion about what value we're shooting for the project to
> > provide to people that influences what stays/goes.
> >
> > Thinking out loud a bit
> >
> >- Dropping Storm and moving to Spark drops the very hard to
> >tune/manage/troubleshoot Storm.
> >- Dropping the UIs (and making SQL the external interface) pretty 
much
> >implies dropping the REST APIs and ES/Solr.  ES/Solr have been a 
giant
> >source of dev heartache on the project and they exist primarily for
> the
> >real time use case.  People can build whatever UIs or use existing
> tools
> >against Parquet/Hive/whatever.
> >- Dropping Ambari. It's a complex beast to install because of how 
many
> >components we have. Dropping the above makes our install much easier
> and
> >should alleviate the need for a complex installer.
> >
> > At that point, we're basically left with
> >
> >- Some Spark for parse -> enrich -> output
> >- The profiler
> >- Stellar
> >- Probably some other misc stuff (sensors, bro kafka plugging, etc.)
> >
> > At a glance, that seems almost an order of magnitude smaller than what 
we
> > currently try to handle.
> >
> > I'm not really sure what an appropriate way to handle the profiler is. I've
> > barely touched the code for it, so anything I say is a vague guess.
> >
> > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom  wrote:
> >
> > > To me Metron is big and broad in the scope of technology required to
> get
> > > it 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-17 Thread Nick Allen
This is a good discussion and one that I haven't fully grappled with in my
own mind yet. I'll have more to add, but I just want to chime in on the
topic of Ambari at this point.

### Ambari and the Paywall

The problem with Ambari is that its installation mechanism requires a
repository of compiled packages (RPMs, DEBs, etc.) To install the
underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc) we
relied on binary packages that were made freely available by
Cloudera/Hortonworks. As of this past January, those packages are now
behind a paywall.

Due to the paywall, installing your own HDP cluster with Ambari is now
effectively dead.  I am not sure if legacy versions of Kafka, HBase, Storm,
etc will continue to be freely available, but even if so, we cannot
continue to rely on this mechanism if new versions and security updates
will not be made available.

The Apache Metron project does not publish compiled binaries or packages
either.  We do make the code freely available to allow users to build and
publish their own Metron packages.   But even with this capability, unless
you have a means to install the underlying platform dependencies via
Ambari, installing Metron with Ambari has little value.

Unfortunately, I don't see a feasible path forward for Metron's Ambari
MPack.

### Dev Environment

This not only impacts the users of Apache Metron, this impacts contributors
also. Our primary development environment relies on that Ambari MPack.  To
continue development on any of the components of Apache Metron, we would
need to build an alternative development environment that can function
despite the paywall.  That could take many shapes, but in my opinion it
would be a blocker for continuing any development on Apache Metron,
unfortunately.

Please do let me know if anyone disagrees or can think of an alternative
approach that would allow the current Ambari MPack to remain viable.

On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov  wrote:

>   - Dropping Ambari.
>
> I like the progress that Apache did with Ambari in 2.7. And I don't know a
> better installer/manager for all the services (we use other Hadoop eco
> services besides Metron).
>
> Sometimes it's buggy, agents get stuck or server needs reboot from time to
> time, mpacks break some functionality. But overall I feel this is the
> direction for central management and orchestration.
>
> - Dima
>
> On Wed, Apr 15, 2020, 12:45 Justin Leet  wrote:
>
> > This is a bit off the top of my head, but I'd agree with pretty much all
> > of points on what's bringing a lot of overhead.  There's probably also a
> > worthwhile discussion about what value we're shooting for the project to
> > provide to people that influences what stays/goes.
> >
> > Thinking out loud a bit
> >
> >- Dropping Storm and moving to Spark drops the very hard to
> >tune/manage/troubleshoot Storm.
> >- Dropping the UIs (and making SQL the external interface) pretty much
> >implies dropping the REST APIs and ES/Solr.  ES/Solr have been a giant
> >source of dev heartache on the project and they exist primarily for
> the
> >real time use case.  People can build whatever UIs or use existing
> tools
> >against Parquet/Hive/whatever.
> >- Dropping Ambari. It's a complex beast to install because of how many
> >components we have. Dropping the above makes our install much easier
> and
> >should alleviate the need for a complex installer.
> >
> > At that point, we're basically left with
> >
> >- Some Spark for parse -> enrich -> output
> >- The profiler
> >- Stellar
> >- Probably some other misc stuff (sensors, bro kafka plugging, etc.)
> >
> > At a glance, that seems almost an order of magnitude smaller than what we
> > currently try to handle.
> >
> > I'm not really sure what an appropriate way to handle the profiler is. I've
> > barely touched the code for it, so anything I say is a vague guess.
> >
> > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom  wrote:
> >
> > > To me Metron is big and broad in the scope of technology required to
> get
> > > it running. If things were more modular that would go a long way to
> > > reducing the learning curve or at least putting it into smaller bites
> > (and
> > > it might encourage more people to get involved).
> > >
> > > If the UI were an add-on module in another project, it would have made
> it
> > > easier for me and it could also encourage my hypothetical buddy who is
> a
> > > web developer expert to get involved since he could focus on the web-ui
> > > module instead of trying to tackle all the other pieces that are
> probably
> > > not part of his bailiwick.
> > >
> > > Stellar is very intriguing, maybe that is not unique to Metron? The
> > > architecture of Metron with respect to parsing, enriching, etc., makes
> a
> > > lot of sense to anyone I talk with. These two aspects of Metron seem
> like
> > > standout examples that make for a powerful platform to develop on.
> 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-16 Thread Dima Kovalyov
  - Dropping Ambari.

I like the progress that Apache made with Ambari in 2.7. And I don't know a
better installer/manager for all the services (we use other Hadoop eco
services besides Metron).

Sometimes it's buggy, agents get stuck or server needs reboot from time to
time, mpacks break some functionality. But overall I feel this is the
direction for central management and orchestration.

- Dima

On Wed, Apr 15, 2020, 12:45 Justin Leet  wrote:

> This is a bit off the top of my head, but I'd agree with pretty much all
> of points on what's bringing a lot of overhead.  There's probably also a
> worthwhile discussion about what value we're shooting for the project to
> provide to people that influences what stays/goes.
>
> Thinking out loud a bit
>
>- Dropping Storm and moving to Spark drops the very hard to
>tune/manage/troubleshoot Storm.
>- Dropping the UIs (and making SQL the external interface) pretty much
>implies dropping the REST APIs and ES/Solr.  ES/Solr have been a giant
>source of dev heartache on the project and they exist primarily for the
>real time use case.  People can build whatever UIs or use existing tools
>against Parquet/Hive/whatever.
>- Dropping Ambari. It's a complex beast to install because of how many
>components we have. Dropping the above makes our install much easier and
>should alleviate the need for a complex installer.
>
> At that point, we're basically left with
>
>- Some Spark for parse -> enrich -> output
>- The profiler
>- Stellar
>- Probably some other misc stuff (sensors, bro kafka plugging, etc.)
>
> At a glance, that seems almost an order of magnitude smaller than what we
> currently try to handle.
>
> I'm not really sure what an appropriate way to handle the profiler is. I've
> barely touched the code for it, so anything I say is a vague guess.
>
> On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom  wrote:
>
> > To me Metron is big and broad in the scope of technology required to get
> > it running. If things were more modular that would go a long way to
> > reducing the learning curve or at least putting it into smaller bites
> (and
> > it might encourage more people to get involved).
> >
> > If the UI were an add-on module in another project, it would have made it
> > easier for me and it could also encourage my hypothetical buddy who is a
> > web developer expert to get involved since he could focus on the web-ui
> > module instead of trying to tackle all the other pieces that are probably
> > not part of his bailiwick.
> >
> > Stellar is very intriguing, maybe that is not unique to Metron? The
> > architecture of Metron with respect to parsing, enriching, etc., makes a
> > lot of sense to anyone I talk with. These two aspects of Metron seem like
> > standout examples that make for a powerful platform to develop on.
> >
> > Thanks for continuing this discussion,
> >
> > Tom.
> >
> >
> > On 2020-04-08 15:32:46-07:00 Casey Stella wrote:
> >
> > As far as I know there is no minimum bar of development activity to keep
> a
> > project open.  I think we would all be grateful for any investment that
> you
> > or your organization would want to make.
> > It also occurs to me that your observation is absolutely spot on: we have
> > a LOT of moving parts.
> > I see some deficiencies here:
> >
> >   *   We depend on a lot of the various hadoop ecosystem projects and
> they
> > have to work together very precisely:
> >  *   This makes for a system that is hard to install.
> >  *   This also makes for a system which is hard to tune/manage
> >   *   We have a large surface area of coverage
> >  *   We have an installer, backend system and front-end UI, which
> > stretches our developers a bit thin, especially since there isn't even
> > interest in those systems
> >
> > Perhaps a reconsideration of the scope and technologies that we use would
> > be merited?  If we were to decide to, for instance:
> >
> >   *   Consolidate scope: focus on a viable backend/API rather than a UI
> >   *   Consolidate technology: reposition ourselves on top of Spark as a
> > consolidated streaming/batch system
> >   *   Make SQL our external interface: write out to parquet + the Hive
> > metastore and let users pin up presto tables or hive tables as they see
> fit
> >
> > This might reduce some of our surface area and make it more viable to get
> > started?
> > Anyway, just some thoughts.
> > Casey
> >
> > On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom  > tom.ye...@ubc.ca>> wrote:
> > Hi Casey,
> >
> > I'm new here and new to contributing to an open source project. Thus far
> > my contribution has been questions, however the steep learning curve has
> > had me working to understand all the moving parts for the last 18 months
> > and I see that as a big investment by my organization.
> >
> > What is a level that would be viable?
> >
> > If my organization were to contribute I don't know that it would be soon
> > enough or at the volume that 

Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-15 Thread Justin Leet
This is a bit off the top of my head, but I'd agree with pretty much all
of the points on what's bringing a lot of overhead.  There's probably also a
worthwhile discussion about what value we're shooting for the project to
provide to people that influences what stays/goes.

Thinking out loud a bit

   - Dropping Storm and moving to Spark drops the very hard to
   tune/manage/troubleshoot Storm.
   - Dropping the UIs (and making SQL the external interface) pretty much
   implies dropping the REST APIs and ES/Solr.  ES/Solr have been a giant
   source of dev heartache on the project and they exist primarily for the
   real time use case.  People can build whatever UIs or use existing tools
   against Parquet/Hive/whatever.
   - Dropping Ambari. It's a complex beast to install because of how many
   components we have. Dropping the above makes our install much easier and
   should alleviate the need for a complex installer.

At that point, we're basically left with

   - Some Spark for parse -> enrich -> output
   - The profiler
   - Stellar
   - Probably some other misc stuff (sensors, Bro Kafka plugin, etc.)

At a glance, that seems almost an order of magnitude smaller than what we
currently try to handle.

I'm not really sure what an appropriate way to handle the profiler is. I've
barely touched the code for it, so anything I say is a vague guess.


RE: Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Yerex, Tom
To me Metron is big and broad in the scope of technology required to get it 
running. If things were more modular that would go a long way to reducing the 
learning curve or at least putting it into smaller bites (and it might 
encourage more people to get involved).

If the UI were an add-on module in another project, it would have made it 
easier for me and it could also encourage my hypothetical buddy who is a web 
developer expert to get involved since he could focus on the web-ui module 
instead of trying to tackle all the other pieces that are probably not part of 
his bailiwick.

Stellar is very intriguing, maybe that is not unique to Metron? The 
architecture of Metron with respect to parsing, enriching, etc., makes a lot of 
sense to anyone I talk with. These two aspects of Metron seem like standout 
examples that make for a powerful platform to develop on.

Thanks for continuing this discussion,

Tom.





Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Casey Stella
As far as I know there is no minimum bar of development activity to keep a
project open.  I think we would all be grateful for any investment that you
or your organization would want to make.

It also occurs to me that your observation is absolutely spot on: we have a
LOT of moving parts.
I see some deficiencies here:

   - We depend on a lot of the various hadoop ecosystem projects and they
   have to work together very precisely:
      - This makes for a system that is hard to install.
      - This also makes for a system which is hard to tune/manage
   - We have a large surface area of coverage
      - We have an installer, backend system and front-end UI, which stretches
      our developers a bit thin, especially since there isn't even interest in
      those systems

Perhaps a reconsideration of the scope and technologies that we use would
be merited?  If we were to decide to, for instance:

   - Consolidate scope: focus on a viable backend/API rather than a UI
   - Consolidate technology: reposition ourselves on top of Spark as a
   consolidated streaming/batch system
   - Make SQL our external interface: write out to parquet + the Hive
   metastore and let users pin up presto tables or hive tables as they see fit

This might reduce some of our surface area and make it more viable to get
started?

Anyway, just some thoughts.

Casey



Re: Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Prashant Bhalesain
Hi Casey,

It is still possible to keep this project alive and take it forward. The
major challenge I am facing at the moment is deploying machine learning
models.
I see the project still has potential to be made better.
There is an opportunity to change the way we deploy and serve models.
There is of course an opportunity to make it more cloud native, especially
by having an S3 plugin for storage.
There is an opportunity to make it more scalable by making it Docker and
Kubernetes friendly.
The user interface could be made more stable and secure by integrating it
with AD.
The overall architecture could be better with fewer service dependencies;
the learning system could be separated from the detection system, etc.
Maybe we can review the overall technologies involved and rearchitect a few
things.

I see it as a chicken-and-egg problem: clients think that since there is no
activity they cannot depend on the project, while the project team thinks
that since there are no clients we cannot continue the project.
If a community conference call is organised, we could discuss further and
come up with a better roadmap for the project.
Let me know your thoughts.

Best Regards
Prashant



RE: Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Yerex, Tom
Hi Casey,

I'm new here and new to contributing to an open source project. Thus far my 
contribution has been questions; however, the steep learning curve has had me 
working to understand all the moving parts for the last 18 months, and I see 
that as a big investment by my organization.

What is a level that would be viable?

If my organization were to contribute I don't know that it would be soon enough 
or at the volume that is recognized as viable, which is why I ask the question.





Development Activity has dropped to effectively 0, what should we do?

2020-04-08 Thread Casey Stella
Hi all,

When composing the board report today, I realized that we have effectively
had no development in the last quarter on this project.  Please be aware
that I say this without a shred of blame or judgement (especially so
considering I have not contributed in a long time).  That being said, I
would like to pose the question to the community:

Do we feel that this project is viable?  If so, how are we going to spur
new contributions?  If not, then should we begin the process to fold the
project?


Best,

Casey