Re: Metron Release 0.6.1 and/or Plugin release 0.3.0?

I agree, I would suggest moving forward with cutting an apache/metron RC.

Jon

On Wed, Dec 5, 2018 at 3:45 PM Nick Allen wrote:

I would prefer to just go ahead with the release and not wait on intermittent, integration-test-related JIRAs. Just wanting to see if there is support for getting an RC out sooner rather than later.

On Tue, Dec 4, 2018 at 4:06 PM zeo...@gmail.com wrote:

I agree that we should move forward with the apache/metron 0.7.0 release. If 0.3 gets finalized in time we can include it, but otherwise it's no big deal not including it, since the dev environment points to 0.1, which didn't have the issue.

Jon

On Mon, Dec 3, 2018 at 5:09 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

I have one more intermittent failure to add to the list, from a timeout in the profiler integration tests.
https://issues.apache.org/jira/browse/METRON-1918

On Mon, Dec 3, 2018 at 2:54 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

FWIW, I have not been able to reproduce the integration test failure that I logged here: https://issues.apache.org/jira/browse/METRON-1851. Unless anyone else has seen this, either locally or in Travis, I recommend we close it out as unable to reproduce. If it ever shows up again, the closed Jira will be out there as a record of it.

On Mon, Dec 3, 2018 at 2:29 PM Justin Leet wrote:

I'm inclined to move forward with the core repo release. Having said that, there are a few test bugs and such I'd like to see addressed, either as "won't fix" or preferably with PRs, before creating an RC (as noted earlier in the thread). It's probably a good opportunity to ask again if there's anything outstanding we'd like to see in the release. Is there anything we'd like taken care of that's relatively close to going in?

If the plugin gets fixed before we're set to move forward with a core release (or we choose not to fix it, given the current version is affected), I'm happy to put out a new RC.

On Mon, Dec 3, 2018 at 4:12 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

+1 Nick

On Mon, Dec 3, 2018 at 2:04 PM Nick Allen wrote:

OK, well either way, I see no need to hold up Metron 0.6.1.

On Mon, Dec 3, 2018 at 3:51 PM zeo...@gmail.com <zeo...@gmail.com> wrote:

I believe that 0.2.0 is impacted by the bug.

Jon

On Mon, Dec 3, 2018 at 3:50 PM Nick Allen wrote:

In light of this comment [1], I propose that we move forward with another Metron release and forgo the Metron Bro Plugin 0.3.0 release until we can resolve METRON-1910 [2]. There is no need to rush the fix, as the current 0.2.0 release of the Bro Plugin is not impacted by the bug. We do have a good amount of other Metron functionality to release, though. I do not see a need to hold up the release.

[1] https://github.com/apache/metron-bro-plugin-kafka/pull/20#issuecomment-443481440
[2] https://github.com/apache/metron-bro-plugin-kafka/pull/20

On Thu, Nov 29, 2018 at 10:29 AM Justin Leet <justinjl...@gmail.com> wrote:

There's a few issues I would like to see at least triaged and preferably addressed prior to the release of the main repo. In Jira, we have a "test-failures" label that has a few things attached to it. If we know of any other Jiras that should have this label attached, please do so, and I'd appreciate it if you replied to the thread. See test-failures <https://issues.apache.org/jira/browse/METRON-1851?jql=project%20%3D%20METRON%20AND%20labels%20%3D%20test-failure>.

The Jiras are:
METRON-1810 <https://issues.apache.org/jira/browse/METRON-1810>
METRON-1814 <https://issues.apache.org/jira/browse/METRON-1814>
METRON-1851 <https://issues.apache.org/jira/browse/METRON-1851>

On Wed, Nov 21, 2018 at 2:20 PM zeo...@gmail.com <zeo...@gmail.com> wrote:

A metron-bro-plugin-kafka 0.3 release is good to go from my side. Thanks
Re: [DISCUSS] Recurrent Large Indexing Error Messages

Why not have a second indexing topology configured just for errors? We can load the same code with two different configurations in two topologies.

On December 5, 2018 at 03:55:59, Ali Nazemian (alinazem...@gmail.com) wrote:

I think if you look at indexing error management, it is pretty much similar to the parser and enrichment error use cases. It is even more common to expect something to end up in error topics. I think a wider, independent job can be used to take care of error management. It could be decided later on to add a separate topology to manage error logs and create alerts/notifications separately. It could even be integrated with log feeder and log search.

The scenario of sending a solution's own operational logs into that same solution is a bit weird and not enterprise friendly. Normally the platform operations team is a separate team with different objectives, and they probably have a separate monitoring/notification solution in place already. I don't think it is the end of the world if this part is left to be managed by users. So I prefer option 2 as a short-term fix; a long-term solution can be discussed separately.

Cheers,
Ali

On Sat, 20 Oct. 2018, 05:20 Nick Allen wrote:

I want to discuss solutions for the problem that I have described in METRON-1832: Recurrent Large Indexing Error Messages. I feel this is a very easy trap to fall into when using the default settings that currently come with Metron.

## Problem

https://issues.apache.org/jira/browse/METRON-1832

If any index destination, like HDFS, Elasticsearch, or Solr, goes down while the Indexing topology is running, an error message is created and sent back to the user-defined error topic. By default, this is defined to also be the 'indexing' topic.

The Indexing topology then consumes this error message and attempts to write it again. If the index destination is still down, another error occurs and another error message is created that encapsulates the original error message. That message is then sent to the 'indexing' topic, which is later consumed, yet again, by the Indexing topology.

These error messages will continue to be recycled and grow larger and larger, as each new error message encapsulates all previous error messages in the "raw_message" field.

Once the index destination recovers, one giant error message will finally be written that contains massively duplicated, useless information, which can further negatively impact performance of the index destination.

Also, the escape characters '\' continually compound one another, leading to long strings of '\' characters in the error message.

## Background

There was some discussion on how to handle this on the original PR #453: https://github.com/apache/metron/pull/453.

## Solutions

(1) The first, easiest option is to just do nothing. There was already a discussion around this, and this is the solution that we landed on in #453.

Pros: Really easy; do nothing.

Cons: Intermittent problems with ES/Solr can easily create very large error messages that can significantly impact both search and ingest performance.

(2) Change the default indexing error topic to 'indexing_errors' to avoid recycling error messages. Nothing will consume from the 'indexing_errors' topic, thus preventing a cycle.

Pros: Simple, easy change that prevents the cycle.

Cons: Recoverable indexing errors are not visible to users, as they will never be indexed in ES/Solr.

(3) Add logic to limit the number of times a message can be 'recycled' through the Indexing topology. This effectively sets a maximum number of retry attempts. If a message fails N times, then write the message to a separate, unrecoverable-error topic.

Pros: Recoverable errors are visible to users in ES/Solr.

Cons: More complex. Users still need to check the unrecoverable-error topic for potential problems.

(4) Do not further encapsulate error messages in the 'raw_message' field. If an error message fails, don't encapsulate it in another error message; just push it to the error topic as-is. Could add a field that indicates how many times the message has failed.

Pros: Prevents giant error messages from being created from recoverable errors.

Cons: Extended outages would still cause the Indexing topology to repeatedly recycle these error messages, which would ultimately exhaust resources in Storm.

What other ways can we solve this?
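The message-growth behavior described in METRON-1832 is easy to reproduce outside of Metron: each time a failed message is wrapped in a new error message and serialized, the previous JSON payload becomes a quoted string inside the new message and gets re-escaped, so both the message size and the runs of backslashes grow on every trip through the topology. A minimal sketch in plain Python (not Metron code; only the `raw_message` field name comes from the thread, the rest is illustrative):

```python
import json

def wrap_in_error(message: str) -> str:
    """Simulate the Indexing topology wrapping a failed message in a
    new error message, with the old payload in 'raw_message'."""
    return json.dumps({"error_type": "indexing_error", "raw_message": message})

msg = json.dumps({"ip_src_addr": "10.0.0.1"})
for cycle in range(3):
    # Each failed write attempt re-wraps and re-escapes the payload.
    msg = wrap_in_error(msg)
    print(f"cycle {cycle}: {len(msg)} chars, {msg.count(chr(92))} backslashes")
```

Running this shows the backslash count compounding each cycle, which is exactly the long runs of '\' characters the thread describes accumulating during an extended outage.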
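The retry-cap and don't-re-encapsulate proposals both come down to carrying a failure counter on the message instead of wrapping it in a new error message. A rough sketch of that routing decision in plain Python (not the actual Storm bolt; the `failed_count` field and `indexing_unrecoverable` topic names are assumptions for illustration, not Metron's real configuration):

```python
MAX_RETRIES = 3  # illustrative cap on failed write attempts

def route_failed_message(message: dict) -> tuple[str, dict]:
    """Decide where a failed message goes. Rather than nesting the
    message inside a new error message (which compounds escaping),
    increment a failure counter and re-emit the message as-is.
    After MAX_RETRIES failures, route it to a dead-letter topic
    that nothing consumes, which breaks the cycle."""
    message["failed_count"] = message.get("failed_count", 0) + 1
    if message["failed_count"] >= MAX_RETRIES:
        return "indexing_unrecoverable", message
    return "indexing", message  # retry via the normal error topic

# Example: a message failing repeatedly while the index destination is down.
msg = {"source.type": "bro", "ip_src_addr": "10.0.0.1"}
topic = "indexing"
while topic == "indexing":
    topic, msg = route_failed_message(msg)

print(topic, msg["failed_count"])  # → indexing_unrecoverable 3
```

The message stays flat no matter how many times it fails, and a bounded number of retries keeps an extended outage from exhausting resources in Storm, at the cost of users needing to watch the dead-letter topic.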