Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread Casey Stella
The problem with throwing up a warning is that this is a sensor-specific configuration and the Indexing topology does not know at topology start time all of the sensors. Furthermore, you can start a new sensor in the middle of a running topology. I'd suggest a compromise and have the indexing

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread James Sirota
The explicit on/off seems like a good option to have. This way I don't have to completely remove the config block in order for me to test something. I think if the config for the writer is unspecified we should throw up a warning. 16.01.2017, 08:54, "Nick Allen" : >>  To

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread Casey Stella
hahaha :) On Mon, Jan 16, 2017 at 10:49 AM, Nick Allen wrote: > I don't quite support it for #1 and #2, but you absolutely sold me on #3. > Good sell. +1 > > > On Mon, Jan 16, 2017 at 10:46 AM, Casey Stella wrote: > > > Well, I like it for a couple of

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread Casey Stella
Well, I like it for a couple of reasons: - It's explicit and clear that the writer is on or off - It enables people to keep their writer config in the file without having the writer on (so I don't have to adjust the when clause to "false") - It enables us to not have to execute a

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread Casey Stella
Yeah, as far as I'm concerned, it should, at least for the current state of affairs. On Mon, Jan 16, 2017 at 10:39 AM, Nick Allen wrote: > Just one question around "default on." How would that behave when we add > new indexers? Would those also be default on? > > If I have

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-16 Thread Nick Allen
I'm all for a compromise here. Sounds like we're getting close. Just one thing. Can you lay out the reasoning for having 'enabled' and 'when'? I don't follow the reasoning, but maybe I am missing something. On Sat, Jan 14, 2017 at 12:13 PM, Kyle Richardson wrote:

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-14 Thread Kyle Richardson
I'm +1 on the current proposal. I like Nick's syntax and agree with Jon's enabled property. I also like the idea of a path property for HDFS. -Kyle > On Jan 14, 2017, at 10:51 AM, Casey Stella wrote: > > I'm +1 on an explicit enabled property and a filter (or when)

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-14 Thread Casey Stella
I'm +1 on an explicit enabled property and a filter (or when) property. I think we are zeroing in on a decent design, so that is good. To recap, what I am +1 on is Nick's proposed syntax with the following modifications: 1. An explicit enabled field 2. A default on for unspecified to match
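The "explicit enabled field" and "default on for unspecified" points in this recap can be sketched roughly as follows. This is only an illustration, not Metron's implementation: the function name `is_writer_enabled` and the flat per-writer layout of the sensor config are assumptions, and the key names (`hdfs`, `elasticsearch`, `enabled`, `path`, `batchSize`, `index`) are borrowed from examples elsewhere in the thread.

```python
from typing import Any, Dict


def is_writer_enabled(sensor_config: Dict[str, Any], writer: str) -> bool:
    """Return True unless the writer is explicitly switched off.

    Hypothetical helper: a writer with no config block, or with a block
    that omits 'enabled', is treated as on ("default on for unspecified").
    """
    writer_config = sensor_config.get(writer)
    if writer_config is None:
        # No block at all for this writer: default on.
        return True
    # Block present: honour an explicit 'enabled', otherwise default on.
    return bool(writer_config.get("enabled", True))


# The HDFS settings stay in the file while the writer is switched off.
# The path value is a made-up placeholder.
config = {
    "hdfs": {"enabled": False, "path": "/tmp/example", "batchSize": 100},
    "elasticsearch": {"index": "foo"},
}
print(is_writer_enabled(config, "hdfs"))
print(is_writer_enabled(config, "elasticsearch"))
print(is_writer_enabled({}, "hdfs"))
```

Under these assumptions a brand-new sensor with an empty config is still indexed by every writer, which is the behaviour argued for elsewhere in the thread.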

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-14 Thread zeo...@gmail.com
This has the additional benefit of doing something like below when you want to temporarily disable the hdfs writer, but don't want to remove the settings. This removes the need to store the path and batchSize (and many additional settings) somewhere else so they can be brought back in when you

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-14 Thread zeo...@gmail.com
I similarly have a concern there because I prefer being as explicit as possible, which makes things easier to pick up for new users. Using my example from earlier this could look like specifying while(false), but an even better and more obvious approach may be to use enabled(false). So the

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
One thing that I thought of that I very strenuously do not like in Nick's proposal is that if a writer config is not specified then it is turned off (I think; if I misunderstood let me know). In the situation where we have a new sensor, right now if there are no index config and no enrichment

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
One thing that I really like about Nick's suggestion is that it allows writer-specific configs in a clear and simple way. It is more complex for the default case (all writers write to indices named the same thing with a fixed batch size), which I do not like, but maybe it's worth the compromise

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread zeo...@gmail.com
I like the suggestions you made, Nick. The only thing I would add is that it's also nice to see an explicit when(false), as people newer to the platform may not know where to expect configs for the different writers. Being able to do it either way, which I think is already assumed in your model,

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Let me noodle on this over the weekend. Your syntax is looking less onerous to me and I like the following statement from Otto: "In the end, each write destination ‘type’ will need its own configuration. This is an extension point." I may come around to your way of thinking. On Fri, Jan 13,

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread zeo...@gmail.com
Hmm, I'm not sure I agree that in most cases users would accept the default batch size, especially in sizeable environments. In search tiers like ES it is very important, and should be tuned to the specific data that you're sending because it depends on the number of bytes, not necessarily number

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Otto Fowler
In the end, each write destination ‘type’ will need its own configuration. This is an extension point. { HDFS:{ outputAdapters:[ {name: avro, settings:{ avro stuff…. when:{ }, { name: sequence file, ….. or some such. On January 13, 2017 at 11:51:15, Nick Allen (n...@nickallen.org) wrote: I

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread zeo...@gmail.com
I think Simon has a very valid suggestion. Additionally, I have two questions. For the following config: { "index" : "foo" ,"batchSize" : 100 } Are now all logs going to the same index? I read this as a writer-specific override of the sensor-specific defaults to use an index name of foo*

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
> > Nick's concerns about my suggestion were that it was overly complex and > hard to grok and that we could dispense with backwards compatibility and > make people do a bit more work on the default case for the benefits of a > simpler advanced case. (Nick, make sure I don't misstate your

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
I will add also that instead of global overrides, like index, we should use configuration key names that are more appropriate to the output. For example, does 'index' really make sense for HDFS? Or would 'path' be more appropriate? { 'elasticsearch': { 'index': 'foo',

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread David Lyle
Thanks Casey! I think I had the right of it, but wanted to make sure. I'm +1 on defaults in global with overrides in sensor-specific. At least in the first iteration. I (like Otto) suspect we'll have a few go-arounds on this. -D... On Fri, Jan 13, 2017 at 11:09 AM, Otto Fowler

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Otto Fowler
This is an excellent point. On January 13, 2017 at 10:54:07, Simon Elliston Ball (si...@simonellistonball.com) wrote: Something else to consider here is the possibility of multiple indices within a given target technology. For example, if I’m indexing data from a given sensor into, say

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Simon, Great thought. I had considered it, but didn't want to bite off all that as part of a PR. I thought baby-steps for the moment would be best. Perhaps this deserves its own JIRA and discussion? On Fri, Jan 13, 2017 at 10:53 AM, Simon Elliston Ball < si...@simonellistonball.com> wrote: >

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Dave, For the benefit of posterity and people who might not be as deeply entangled in the system as we have been, I'll recap things and hopefully answer your question in the process. Historically the index configuration is split between the enrichment configs and the global configs. - The

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread David Lyle
Casey, Can you give me a level set of what your thinking is now? I think it's global control of all index types + overrides on a per-type basis. Fwiw, I'm totally for that, but I want to make sure I'm not imposing my preconceived notions on your consensus-driven ones. -D On Fri, Jan 13,

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Simon Elliston Ball
Something else to consider here is the possibility of multiple indices within a given target technology. For example, if I’m indexing data from a given sensor into, say solr, I may want it filtered differently into two different indices. This would enable me to create different ‘views’ which

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
Yep, that makes sense, Casey. I understand multiline is still just the same when statement. I was more responding to Otto's concern about dealing with 50 whens. In regards to multiline, I don't know if adding that is worth the potential confusion. I prefer very simple configs that are stupid

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
I am suggesting that, yes. The configs are essentially the same as yours, except there is an override specified at the top level. Without that, in order to specify both HDFS and ES have batch sizes of 100, you have to explicitly configure each. It's less that I'm trying to have backwards

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
Are you saying we support all of these variants? I realize you are trying to have some backwards compatibility, but this also makes it harder for a user to grok (for me at least). Personally I like my original example as there are fewer sub-structures, like 'writerConfig', which makes the whole

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Nick, Yep, that's the example I showed. I'm just suggesting that the when use the multiline JSON trick here: a single "when" statement with a couple of "or"s. So: "when" : [ "exists(field1) or" , "exists(field2) or" ,

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
I was thinking there would only be one 'when' for each output. So if we have Elasticsearch and HDFS, you would have only 2 'when's. Each when statement could be as simple or complex as you need. On Fri, Jan 13, 2017 at 10:08 AM, Otto Fowler wrote: > How does it look

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Otto Fowler
We also need to account for the complexity of the statements On January 13, 2017 at 10:27:51, Otto Fowler (ottobackwa...@gmail.com) wrote: Like most things, we are best off to try something and iterate. I just think we should be aware from the beginning ( have tests etc ) of how it works when

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Otto Fowler
Like most things, we are best off to try something and iterate. I just think we should be aware from the beginning ( have tests etc ) of how it works when there are many filters. On January 13, 2017 at 10:11:47, Casey Stella (ceste...@gmail.com) wrote: I imagined one stellar statement and if

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
I imagined one Stellar statement and if you wanted an "or" in there, you could put it there. I was also planning on doing the JSON trick of accepting either a string or list of strings to let you do multiline. e.g. "when" : [ "exists(field1) or" , "exists(field2) or" ,
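The "string or list of strings" trick described here could look something like the sketch below. It is an assumption-laden illustration, not the code that was merged: `normalize_when` is a hypothetical name, and joining the fragments with a single space is a guess at how a multiline value would be collapsed into one statement. The sketch does not parse or evaluate Stellar; it only normalizes the JSON value.

```python
from typing import List, Union


def normalize_when(when: Union[str, List[str]]) -> str:
    """Collapse a 'when' value into one statement string.

    Hypothetical helper: accepts either a single string or a list of
    string fragments (the multiline form) and returns one string.
    """
    if isinstance(when, str):
        return when.strip()
    # Assumed joining rule: fragments are concatenated with single spaces.
    return " ".join(part.strip() for part in when)


# Both spellings should describe the same filter.
single_line = normalize_when("exists(field1) or exists(field2)")
multi_line = normalize_when(["exists(field1) or", "exists(field2)"])
print(single_line == multi_line)
```

Normalizing up front means the rest of the indexing code only ever sees one statement per writer, however the user chose to lay it out in the JSON.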

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Ok, so here's what I'm thinking based on the discussion: - Keeping the configs that we have now (batchSize and index) as defaults for the unspecified writer-specific case - Adding the config Nick suggested *Base Case*: { } - all writers write all messages - index named the same
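The idea of keeping `batchSize` and `index` as defaults for writers that do not specify their own values might be sketched like this. The flat layout (top-level defaults next to per-writer blocks) and the name `resolve_writer_config` are assumptions made for illustration; the snippet above is truncated, so the exact structure proposed in the message is not known.

```python
from typing import Any, Dict

# Keys that, per the message, keep acting as sensor-wide defaults.
DEFAULT_KEYS = ("index", "batchSize")


def resolve_writer_config(sensor_config: Dict[str, Any], writer: str) -> Dict[str, Any]:
    """Merge top-level defaults with a writer-specific block.

    Hypothetical helper: values in the writer's own block win over the
    top-level defaults; a writer with no block just gets the defaults.
    """
    defaults = {key: sensor_config[key] for key in DEFAULT_KEYS if key in sensor_config}
    overrides = sensor_config.get(writer, {})
    return {**defaults, **overrides}


# One shared default, overridden only for Elasticsearch.
config = {
    "index": "foo",
    "batchSize": 100,
    "elasticsearch": {"batchSize": 500},
}
print(resolve_writer_config(config, "elasticsearch"))
print(resolve_writer_config(config, "hdfs"))
```

With this shape, setting the same batch size for both HDFS and Elasticsearch takes one top-level entry instead of one entry per writer.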

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread zeo...@gmail.com
Darn it Nick, you beat me to the punch. =) YES, please. I think I discussed this a while back in my ES tuning conversations, but that's _super_ important. I have this documented here under Elasticsearch > On

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Carolyn Duby
For larger installations you need to control what is indexed so you don’t end up with a nasty elastic search situation and so you can mine the data later for reports and training ml models. Thanks Carolyn On 1/13/17, 9:40 AM, "Casey Stella" wrote: >OH that's a good

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Otto Fowler
I prefer option1 with stellar, although I’m concerned that in a real world scenario the amount of filters and rules might be large, and some thought about the structure of the rule expressions for maintainability etc will need to be considered. On January 12, 2017 at 15:52:03, Casey Stella

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
OH that's a good idea! On Fri, Jan 13, 2017 at 9:39 AM, Nick Allen wrote: > I like the "Index Filtering" option based on the flexibility that it > provides. Should each output (HDFS, ES, etc) have its own configuration > settings? For example, aren't things like batching

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Nick Allen
I like the "Index Filtering" option based on the flexibility that it provides. Should each output (HDFS, ES, etc) have its own configuration settings? For example, aren't things like batching handled separately for HDFS versus Elasticsearch? Something along the lines of... { "hdfs" : {

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-13 Thread Casey Stella
Yeah, I tend to like the first option too. Any opposition to that from anyone? The points brought up are good ones and I think that it may be worth a broader discussion of the requirements of indexing in a separate dev list thread. Maybe a list of desires with coherent use-cases justifying them

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Kyle Richardson
I'll second my preference for the first option. I think the ability to use Stellar filters to customize indexing would be a big win. I'm glad Matt brought up the point about data lake and CEP. I think this is a really important use case that we need to consider. Take a simple example... If I have

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley
Ah, I see. If overriding the default index name allows using the same name for multiple sensors, then the goal can be achieved. Thanks, --Matt On 1/12/17, 3:30 PM, "Casey Stella" wrote: Oh, you could! Let's say you have a syslog parser with data from sources 1 2

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Casey Stella
Oh, you could! Let's say you have a syslog parser with data from sources 1, 2 and 3. You'd end up with one kafka queue with 3 parsers attached to that queue, each picking apart the messages from source 1, 2 and 3. They'd go through separate enrichment and into the indexing topology. In the

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley
Syslog is hell on parsers – I know, I worked at LogLogic in a previous life. It makes perfect sense to route different lines from syslog through different appropriate parsers. But a lot of what the parsers do is identify consistent subsets of metadata and annotate it – eg, src_ip_addr, event

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Casey Stella
yeah, I mean, honestly, I think the approach that we've taken for sources which aggregate different types of data is to provide filters at the parser level and have multiple parser topologies (with different, possibly mutually exclusive filters) running. This would be a completely separate

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley
I’m thinking that CEP (Complex Event Processing) is contrary to the idea of silo-ing data per sensor. Now it’s true that some of those sensors are already aggregating data from multiple sources, so maybe I’m wrong here. But it just seems to me that the “data lake” insights come from being able

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Casey Stella
Hey Matt, Thanks for the comment! 1. At the moment, we only have one index name, the default of which is the sensor name but that's entirely up to the user. This is sensor specific, so it'd be a separate config for each sensor. If we want to build multiple indices per sensor, we'd have to think

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Matt Foley
Casey, good to have controls like this. Couple questions: 1. Regarding the “index” : “squid” name/value pair, is the index name expected to always be a sensor name? Or is the given json structure subordinate to a sensor name in zookeeper? Or can we build arbitrary indexes with this new

Re: [DISCUSS] Turning off indexing writers feature discussion

2017-01-12 Thread Michael Miklavcic
I like the flexibility and expressibility of the first option with Stellar filters. M On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella wrote: > As of METRON-652, we > will have decoupled the indexing configuration from the