Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread zeo...@gmail.com
The field stub also gives something that can potentially be used in the error dashboard (or similar) to graph, allowing failed enrichments to "shout" louder to the end user.

Jon

On Tue, May 16, 2017 at 12:34 PM Nick Allen wrote:
> > but also adds a field stub to indicate

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
> but also adds a field stub to indicate failed enrichment. This is then an indicator to an operator or investigator as well that something is missing, and could drive things like replay of the message to retrospectively enrich when things have calmed down.

Yes, I like the idea of a "field stub".
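
As a rough illustration of the "field stub" idea quoted above, here is a minimal sketch of a join buffer that emits a partially enriched message on timeout and marks which enrichments are missing. This is not Metron's join bolt: the class `JoinBuffer`, the field name `enrichment.failed`, and the enrichment names used with it are all hypothetical, and the clock is passed in explicitly to keep the example deterministic. It is single-threaded and ignores Storm acking entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Metron.
final class JoinBuffer {
    // Assumed name for the stub field that flags a failed enrichment.
    static final String STUB_FIELD = "enrichment.failed";

    // One in-flight message waiting for its enrichment results.
    private static final class Pending {
        final long startedAtMillis;
        final Map<String, Object> message;
        final List<String> missing;

        Pending(long startedAtMillis, Map<String, Object> message, List<String> missing) {
            this.startedAtMillis = startedAtMillis;
            this.message = message;
            this.missing = missing;
        }
    }

    private final List<String> expected;
    private final long timeoutMillis;
    private final Map<String, Pending> pending = new LinkedHashMap<>();

    JoinBuffer(List<String> expected, long timeoutMillis) {
        this.expected = new ArrayList<>(expected);
        this.timeoutMillis = timeoutMillis;
    }

    // Registers the original message when it is split out for enrichment.
    void start(String key, Map<String, Object> original, long nowMillis) {
        pending.put(key, new Pending(nowMillis, new HashMap<>(original), new ArrayList<>(expected)));
    }

    // Merges one enrichment result. Returns the joined message once every
    // expected enrichment has arrived, otherwise null. Results for unknown
    // or already flushed keys are ignored.
    Map<String, Object> add(String key, String enrichment, Map<String, Object> fields) {
        Pending p = pending.get(key);
        if (p == null) {
            return null;
        }
        p.message.putAll(fields);
        p.missing.remove(enrichment);
        if (!p.missing.isEmpty()) {
            return null;
        }
        pending.remove(key);
        return p.message;
    }

    // Instead of silently evicting, emits every timed-out message with a
    // stub field listing the enrichments that never arrived.
    List<Map<String, Object>> flushExpired(long nowMillis) {
        List<Map<String, Object>> flushed = new ArrayList<>();
        Iterator<Pending> it = pending.values().iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (nowMillis - p.startedAtMillis >= timeoutMillis) {
                p.message.put(STUB_FIELD, new ArrayList<>(p.missing));
                flushed.add(p.message);
                it.remove();
            }
        }
        return flushed;
    }
}
```

A downstream consumer could then count or graph messages carrying the stub field, or queue them for replay, which is the use the thread suggests for it.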

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
If we are timing out things from the cache, we have that latency already.

On May 16, 2017 at 12:09:32, Casey Stella (ceste...@gmail.com) wrote:
> We could definitely parallelize within the bolt, but you're right, it does break the Storm model. I also like making things other people's problems

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Nick, I’d tend to agree with you there. How about: If an enrichment fails / effectively times out, the join bolt emits the message before cache eviction (as Nick’s point 2), but also adds a field stub to indicate failed enrichment. This is then an indicator to an operator or investigator as

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
Ah, yes. Makes sense and I can see the value in the parallelism that the split/join provides. Personally, I would like to see the code do the following. (1) Scream and shout when something in the cache expires. We have to make sure that it is blatantly obvious to a user what happened. We also

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
I do want to say here, that I don't mean to sound the alarm and say that everything is broken. I would not characterize the topology as "broken" architecturally, but rather the lack of reporting when things go pear-shaped is a bug in implementation. With logging and documentation about the knobs

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We could definitely parallelize within the bolt, but you're right, it does break the Storm model. I also like making things other people's problems (it's called working "smart" not "hard", right? not laziness, surely. ;), but yeah, using windowing for this seems like it might introduce some

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Would you then parallelise within Stellar to handle things like multiple lookups? This feels like it would be breaking the Storm model somewhat, and could lead to bad things with threads for example. Or would you think of doing something like the grouping Stellar uses today to parallelise

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
I am not sure that you can say we wouldn’t ‘need’ it. But we would not ‘have’ it rather.

On May 16, 2017 at 11:59:42, Nick Allen (n...@nickallen.org) wrote:
> I would like to see us just migrate wholly to Stellar enrichments and remove the separate HBase and Geo enrichment bolts from the

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We still do use split/join even within Stellar enrichments. Take for instance the following enrichment:

{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "parallel-task-1" : {
            "my_field" : "PROFILE_GET()"
          },

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
I would like to see us just migrate wholly to Stellar enrichments and remove the separate HBase and Geo enrichment bolts from the Enrichment topology. Stellar provides a user with much greater flexibility than the existing HBase and Geo enrichment bolts. A side effect of this would be to greatly

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
The problem is that an enrichment type won't necessarily have a fixed performance characteristic. Take Stellar enrichments, for instance. Doing an HBase call for one sensor vs doing simple string munging will have vastly differing performance. Both of them are functioning within the Stellar

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Christian Tramnitz
I’m glad you bring this up. This is a huge architectural difference from the original OpenSOC topology and one that we have been warned to take back then. To be perfectly honest, I don’t see the big performance improvement from parallel processing. If a specific enrichment is a little more i/o

[DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
Hi All, Last week, I encountered some weirdness in the Enrichment topology. Doing some somewhat high-latency enrichment work, I noticed that at some point, data stopped flowing through the enrichment topology. I tracked down the problem to the join bolt. For those who aren't aware, we do a