I was the one who detected the bug tracked in https://issues.apache.org/jira/browse/NIFI-4893. Do you have a reproducible flow to validate it?
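
To make the suspected cause concrete while we look for a reproducible flow, here is a minimal sketch of the behaviour Mark describes in the quoted thread below: a missing array field gets an empty-array default, so a "required" check that only rejects null lets the record through. This does not use NiFi's classes. The names `defaultForMissingArray`, `toRecord` and `isValid` are hypothetical stand-ins I made up for illustration, and the real logic in AvroTypeUtil and the validator may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class ArrayDefaultSketch {

    // Hypothetical stand-in for schema conversion: decides what a missing
    // array field is filled with when the JSON is turned into a record.
    static Object defaultForMissingArray(boolean emptyArrayAsDefault) {
        return emptyArrayAsDefault ? new ArrayList<Object>() : null;
    }

    // Hypothetical stand-in for the reader: copies the parsed JSON and fills
    // in the default when the "Records" field is absent.
    static Map<String, Object> toRecord(Map<String, Object> json, boolean emptyArrayAsDefault) {
        Map<String, Object> record = new HashMap<>(json);
        if (!record.containsKey("Records")) {
            record.put("Records", defaultForMissingArray(emptyArrayAsDefault));
        }
        return record;
    }

    // Hypothetical stand-in for validation: a required field is rejected
    // only when its value is null.
    static boolean isValid(Map<String, Object> record) {
        return record.get("Records") != null;
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new HashMap<>();
        payload.put("Service", "sssssss"); // note: no "Records" key at all

        // Empty-array default: the missing field looks present, record passes.
        System.out.println(isValid(toRecord(payload, true)));  // prints "true"
        // Null default: the missing field is detected, record is rejected.
        System.out.println(isValid(toRecord(payload, false))); // prints "false"
    }
}
```

If the real code behaves like the first case, that would explain why "payload BAD 1" is routed to valid, and a flow that shows both cases side by side would be a good test.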

On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <[email protected]> wrote:

> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years
> ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
> migrations, Snowflake and lately NiFi), so count on me to detect
> opportunities to improve things, but I'm not able to change base code/tests.
>
> Thanks so much for your time and analysis. Let's wait for the community to
> step up to do the fix and update/run the unit tests 😊
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971 *|* who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
> *From:* Mark Payne <[email protected]>
> *Sent:* Wednesday 11 December 2019 15:25
> *To:* [email protected]
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
> Emanuel,
>
> I looked into this a week or so ago, but haven't had a chance to resolve
> the issue yet. It does appear to be a bug. Specifically, I believe the bug
> is here [1]. When we create a RecordSchema from the Avro Schema, we set
> the default value for the array to an empty array, instead of null. Because
> of this, when the JSON is parsed, we end up creating a Record with an empty
> array for the "Records" field instead of a null. As a result, the Record is
> considered valid because it does have an array (it's just empty). I think
> it *should* be a null value instead.
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change
> it to just return a null value for the default, but that does result in two
> of the unit tests added in NIFI-4893 failing. It may be that those unit
> tests need to be fixed, or it may be that such a change does break
> something. I just haven't had a chance yet to dig that far into it.
>
> If you're someone who is comfortable digging into the code and making the
> updates, then please do and I'm happy to review a PR as soon as I'm able.
>
> Thanks
> -Mark
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <[email protected]>
> wrote:
>
> Can anyone knowledgeable about Avro schemas please confirm/suggest whether
> this inability to invalidate a JSON payload missing an array in the root,
> when Allow Extra Fields is true, is normal?
>
> There are 2 options:
>
> - ValidateRecord.Allow Extra Fields=false -> need to supply the full schema
> - ValidateRecord.Allow Extra Fields=true -> this is what I have been
>   testing/want, a way to supply a schema with only the mandatory fields.
>
> I want 2 mandatory fields, an array with at least 1 element having
> eventVersion, so the minimal JSON should be:
>
> { (..)
>   "Records": [{
>       "eventVersion": "aaa"
>       (..)
>     }
>   ]
>   (..)
> }
>
> The problem is that ValidateRecord considers the FF valid if the “Records”
> array is missing in the root!
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> }
>
> If I supply the “Records” array, then the schema correctly validates that I
> need at least eventVersion on the array element record.
>
> So… maybe my question can be tuned to “is it possible in Avro schema
> syntax to specify cardinalities like in a DB E/R diagram, where a relation
> can be one of the following:
>
> 0..n
> 1..0
> 1 and only 1 ?
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> *From:* Oliveira, Emanuel <[email protected]>
> *Sent:* Friday 6 December 2019 10:15
> *To:* [email protected]
> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
> Hi Mark, I forgot to share the NiFi version we are using:
>
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> *From:* Emanuel Oliveira <[email protected]>
> *Sent:* Thursday 5 December 2019 22:42
> *To:* [email protected]
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
> Hi Mark, be sure you copy and paste "NOK - payload BAD 1 - " into
> GenerateFlowFile, as this is the problem.
>
> Cheers,
> Emanuel
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <[email protected]> wrote:
>
> Emanuel,
>
> What version of NiFi are you using?
>
> I just tested the attached template against the latest, and the FlowFile
> was routed to 'invalid' with the explanation:
>
> Records in this FlowFile were invalid for the following reasons: The
> following 1 fields were missing: [[0]/Records/eventVersion]
>
> Thanks
> -Mark
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <[email protected]>
> wrote:
>
> Hi all,
>
> I have been struggling to find a way for ValidateRecord, using an Avro
> schema, to make the presence of an array in the JSON payload mandatory. The
> problem is that if the array “Records” is missing, ValidateRecord considers
> the FF valid ☹.
>
> --objective - Mandatory to have "Records" array with at least
> "eventVersion"
>
> - using ValidateRecord > Allow Extra Fields
>
> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!
>
> How can I make the Records array mandatory? Is it possible?
>
> I know I can eventually use SplitJson with JsonPath Expression=$.Records to
> get rid of the ARRAY, and also to fail if the array "Records" is not
> present. But I would like to have a clean solution using just the Avro
> schema. Is this possible?
>
> --OK - payload GOOD
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "Records": [{
>       "eventVersion": "aaa"
>     }
>   ]
> }
>
> --NOK - payload BAD 1 - missing "Records" array -> BUT
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to
> “invalid” since it is not compliant with my Avro schema, which needs the
> array “Records” with element “eventVersion” as 2 mandatory things.
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "RecordsXXX": [{
>       "eventVersion": "aaa"
>     }
>   ]
> }
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>
> {
>   "Service": "sssssss",
>   "Event": "eeeee",
>   "Time": "2019-11-25T16:21:53.280Z",
>   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>   "RequestId": "RRRRRRRRRRRRRRRRRR",
>   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>   "Records": [{
>       "eventVersionXX": "aaa"
>     }
>   ]
> }
>
> It's a very simple test flow (attached the XML template
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
> ValidateRecord with a JSON reader/writer:
>
> <image001.png>
>
> Here's the ValidateRecord processor + reader/writer controllers:
>
> - Avro schema with just the array “Records” and “eventVersion” as the
>   minimum tag on the array element.
> - Using Allow Extra Fields true:
>   - So I'm OK having other fields in the root side by side with the array
>     “Records”, and also OK with extra elements inside each array.
> - FYI: in the real use case I'm trying to validate an AWS SQS message (S3
>   trigger) where I will be interested in several fields, but I crafted this
>   simpler example just to ask whether it's possible to force the array to
>   be mandatory and with at least 1 element.
>
> ==========================================================
>
> --ValidateRecord 1.8.0
>
> Record Reader                      JsonTreeReader
> Record Writer                      JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy             Use Reader's Schema
> Schema Registry                    No value set
> Schema Name                        ${schema.name}
> Schema Text                        ${avro.schema}
> Allow Extra Fields                 true
> Strict Type Checking               true
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
> "eventVersion" on each ARRAY element
>
> Schema Access Strategy             Use 'Schema Text' Property
> Schema Registry
> Schema Name                        ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
>
> {
>   "name": "MyName",
>   "type": "record",
>   "namespace": "aa.bb.cc",
>   "fields": [{
>       "name": "Records",
>       "type": {
>         "type": "array",
>         "items": {
>           "name": "Records_record",
>           "type": "record",
>           "fields": [{
>               "name": "eventVersion",
>               "type": "string"
>             }
>           ]
>         }
>       }
>     }
>   ]
> }
>
> Date Format
> Time Format
> Timestamp Format
>
> --JsonRecordSetWriter 1.8.0
>
> Schema Write Strategy              Do Not Write Schema
> Schema Access Strategy             Inherit Record Schema
> Schema Registry
> Schema Name                        ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                        { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                  true
> Suppress Null Values               Never Suppress
> Output Grouping                    Array
>
> Thanks in advance,
> Emanuel Oliveira
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
