Otto,

It depends on what we choose to let go...
After all, one could be saving to HDFS as Parquet, ORC or any other format involving schemas, where a type mismatch or a length violation (in the case of JSON schemas, for example) may be an issue... The sacrifices required here illustrate this: https://github.com/fluenda/SecuritySchemas/blame/master/README.md#L11

Logs... logs... logs...

Again, I am not fully opposed to the idea of being more lenient, which is why I gave ParCEFone a method that allows it: https://github.com/fluenda/ParCEFone/blob/master/src/main/java/com/fluenda/parcefone/parser/CEFParser.java#L78

But I suspect we may need to polish the failure handling a bit more.

Cheers

PS: Sigh... Sometimes I miss my old security data engineering life! :-)

On Mon, Jan 7, 2019 at 8:41 AM Otto Fowler wrote:

> Well, in that case they would use strict mode and have to jump through
> hoops with other processors. If they are storing into HDFS, Mongo or ES,
> they would not have that issue, right?
>
>
> On January 6, 2019 at 14:47:43, Andre wrote:
>
> Otto,
>
> Yes, we can, but what if the receiving system relies on char(1023) for one
> of its columns?
>
> In hindsight, I probably should have been even more strict and made the
> processor reject messages that are too large (I believe the message limit
> is 4k chars) before feeding them to the parsers, avoiding "overflows" and
> failing fast.
>
> Having said that, I am not such a CEF zealot and I will be happy to
> consider the implementation. PRs welcome :-)
>
> Cheers
>
>
> On Sun., 6 Jan. 2019, 14:50 Otto Fowler wrote:
>
>> Would you consider a message that is fully formed and parsed, and only
>> fails validation on size constraints, to be not allowable? Could there not
>> be some form of validation that is a compromise between those concerns and
>> an otherwise logically correct message that breaks the size constraint?
>> It is imaginable that a source formats a correct CEF message, but never
>> checks the size of the field.
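Otto's compromise (accept an otherwise well-formed event whose only fault is a length limit) and Andre's fail-fast idea (reject oversized input before any parser sees it) could be combined roughly as below. This is a hypothetical plain-Java sketch, not ParCEFone or NiFi code: the class `CefLengthPolicy`, the `STRICT`/`LENIENT` modes and the 4096-character cap (Andre only recalls "4k chars") are assumptions, and only the 1023-character `msg` rule is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the ParCEFone or NiFi API.
final class CefLengthPolicy {

    // Assumed modes: STRICT rejects any length violation; LENIENT keeps the
    // event but reports which fields broke a length rule.
    enum Mode { STRICT, LENIENT }

    // Assumed limits: 1023 chars for "msg" (from the thread) and a 4k cap on
    // the whole message (taken here as 4096; Andre's recollection, unverified).
    static final int MAX_MESSAGE_CHARS = 4096;
    static final int MAX_MSG_FIELD_CHARS = 1023;

    static final class Outcome {
        final boolean accepted;
        final List<String> violations;

        Outcome(boolean accepted, List<String> violations) {
            this.accepted = accepted;
            this.violations = violations;
        }
    }

    // Fail fast: oversized raw input can be rejected before parsing starts.
    static boolean withinMessageLimit(String rawMessage) {
        return rawMessage.length() <= MAX_MESSAGE_CHARS;
    }

    // Checks already-parsed extension fields; only length rules are covered.
    static Outcome checkFieldLengths(Map<String, String> extensions, Mode mode) {
        List<String> violations = new ArrayList<>();
        String msg = extensions.get("msg");
        if (msg != null && msg.length() > MAX_MSG_FIELD_CHARS) {
            violations.add("msg length " + msg.length() + " exceeds " + MAX_MSG_FIELD_CHARS);
        }
        // A violation only causes rejection in STRICT mode.
        boolean accepted = violations.isEmpty() || mode == Mode.LENIENT;
        return new Outcome(accepted, violations);
    }

    public static void main(String[] args) {
        Map<String, String> ext = Map.of("src", "10.0.0.1", "msg", "x".repeat(1500));
        System.out.println(withinMessageLimit("x".repeat(5000)));         // false
        System.out.println(checkFieldLengths(ext, Mode.STRICT).accepted); // false
        Outcome lenient = checkFieldLengths(ext, Mode.LENIENT);
        // prints: true [msg length 1500 exceeds 1023]
        System.out.println(lenient.accepted + " " + lenient.violations);
    }
}
```

In LENIENT mode the reported violations could travel with the output (for instance as a flow file attribute, which is an assumption here), so a downstream system with a char(1023) column can still decide to reject or truncate the value itself.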
>>
>>
>> On January 4, 2019 at 19:57:36, Andre wrote:
>>
>> Otto,
>>
>> I considered that but never implemented it in NiFi, as I was concerned
>> people would wrongly assume the parser can help in those cases.
>>
>> Off the top of my head, I have seen type mismatches, length overflows and
>> malformed CEF headers (the pipe-delimited section of the CEF message). I
>> wouldn't be surprised if there were many other kinds of poorly formed
>> messages.
>>
>> Cheers
>>
>> On Fri., 4 Jan. 2019, 23:40 Otto Fowler wrote:
>>
>>> Yeah, I went through the code after my mail last night. Would making
>>> validation optional not be a valid setting? How often would the output
>>> feeding this flow be invalid for some other reason?
>>>
>>>
>>>
>>> On January 4, 2019 at 04:33:54, Andre wrote:
>>>
>>> Otto,
>>>
>>> The parser should fail in that case (and many others).
>>>
>>> https://github.com/fluenda/ParCEFone/blob/master/src/main/java/com/fluenda/parcefone/event/CefRev23.java#L327
>>>
>>> Cheers
>>>
>>> On Fri, Jan 4, 2019 at 10:05 AM Otto Fowler wrote:
>>>
>>>> Can I ask how you are sure it is the message size that is causing the
>>>> error? The parser returns null for any parsing error, so the processor
>>>> doesn't know what happened. It could be that the message didn't validate,
>>>> or something else.
>>>>
>>>> If the issue _is_ with the validator, then we could add a property
>>>> to optionally call the parser with its validation flag set to false.
>>>>
>>>> Maybe you can create a JIRA with a sanitized example line that causes
>>>> the error?
>>>>
>>>>
>>>> On January 3, 2019 at 15:13:14, Felix McPherson wrote:
>>>>
>>>> Hi,
>>>> I'm using the ParseCEF processor to parse CEF messages into JSON.
>>>> Unfortunately, the ParseCEF processor fails for messages/events that hold
>>>> a string in the Msg field with more than 1023 characters.
>>>> According to the CEF standard, the Msg field in an event shall not exceed
>>>> 1023 characters. ParseCEF fails with:
>>>>
>>>> "Error
>>>> ParseCEF[id=...] Failed to parse...
>>>> ...as a CEF message; it does not conform to the CEF standard; routing
>>>> to failure."
>>>>
>>>> Any ideas on a workaround for this problem? I would prefer not to have
>>>> to remove characters from the Msg field string.
>>>> Regards, lj
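Otto's two points, that a bare null hides why parsing failed and that validation could be switched off by a property, can be illustrated with a small self-contained parser. This is a hypothetical sketch, not the ParCEFone API: `CefParseSketch`, the `Failure` reasons and the `validate` parameter are invented for illustration. It splits the pipe-delimited header (seven pipe-terminated fields, with `\|` and `\\` as escapes), naively splits the extension on `key=` tokens without unescaping values, and enforces only the 1023-character `msg` limit Felix quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the ParCEFone or NiFi API.
final class CefParseSketch {

    // Failure reasons let a caller route on *why* parsing failed
    // instead of receiving a bare null.
    enum Failure { NONE, NOT_CEF, MALFORMED_HEADER, MSG_TOO_LONG }

    static final class Result {
        final Failure failure;
        final List<String> header;            // "CEF:Version" plus six header fields
        final Map<String, String> extensions; // raw (still escaped) key/value pairs

        Result(Failure failure, List<String> header, Map<String, String> extensions) {
            this.failure = failure;
            this.header = header;
            this.extensions = extensions;
        }
    }

    static final int MAX_MSG_CHARS = 1023; // limit quoted in the thread

    // An extension key: word characters preceded by start of input or whitespace.
    private static final Pattern KEY = Pattern.compile("(?:^|\\s)([A-Za-z0-9_]+)=");

    static Result parse(String line, boolean validate) {
        if (line == null || !line.startsWith("CEF:")) {
            return new Result(Failure.NOT_CEF, List.of(), Map.of());
        }
        // Header: seven pipe-terminated fields; a backslash escapes the next char.
        List<String> header = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < line.length() && header.size() < 7) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                field.append(line.charAt(i + 1));
                i += 2;
            } else if (c == '|') {
                header.add(field.toString());
                field.setLength(0);
                i++;
            } else {
                field.append(c);
                i++;
            }
        }
        if (header.size() < 7) {
            return new Result(Failure.MALFORMED_HEADER, header, Map.of());
        }
        // Extension: each value runs up to the whitespace before the next key.
        String ext = line.substring(i);
        Map<String, String> extensions = new LinkedHashMap<>();
        Matcher m = KEY.matcher(ext);
        String key = null;
        int valueStart = 0;
        while (m.find()) {
            if (key != null) {
                extensions.put(key, ext.substring(valueStart, m.start()));
            }
            key = m.group(1);
            valueStart = m.end();
        }
        if (key != null) {
            extensions.put(key, ext.substring(valueStart));
        }
        // Length rules are only enforced when the caller asks for validation.
        String msg = extensions.get("msg");
        if (validate && msg != null && msg.length() > MAX_MSG_CHARS) {
            return new Result(Failure.MSG_TOO_LONG, header, extensions);
        }
        return new Result(Failure.NONE, header, extensions);
    }

    public static void main(String[] args) {
        String line = "CEF:0|Vendor|Product|1.0|100|Login failed|5|src=10.0.0.1 msg="
                + "x".repeat(1500);
        System.out.println(parse(line, true).failure);  // MSG_TOO_LONG
        System.out.println(parse(line, false).failure); // NONE
    }
}
```

With `validate` set to false the over-long `msg` survives parsing, and every failure carries a reason the processor could log or route on rather than a single "does not conform" branch.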
