Hi Emanuel,

The PR is still under review, so the fix is not included in NiFi 1.10.0 (which has already been released). We recently discussed releasing a new NiFi version (1.10.1 or 1.11.0), and if the PR is merged before such a release, it would certainly be included in that version.

Hope that makes sense,
Pierre

On Mon, Jan 6, 2020 at 22:08, Oliveira, Emanuel <[email protected]> wrote:

> Thanks Matt and Mark!
> We are still on version
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
> Current version is 1.10
>
> Out of curiosity, when could we expect this fix to be available? Would it
> mean we upgrade to 1.10? Thanks.
>
> Thanks//Regards,
> Emanuel Oliveira
>
> -----Original Message-----
> From: Matt Burgess <[email protected]>
> Sent: Friday 20 December 2019 17:52
> To: [email protected]
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> This email is from an external source - exercise caution regarding links
> and attachments.
>
> Mark is spot-on with the diagnosis: a default empty array is being created
> for the missing field even if no default value is specified in the schema.
> All it needs is an extra null check in order to return null as the default
> value; then the record is marked invalid as expected.
>
> I have written up NIFI-6963 [1] to cover this, and issued a PR to fix it
> [2]. Mark, would you kindly do the honors of a review? Please and thanks!
>
> -Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-6963
> [2] https://github.com/apache/nifi/pull/3948
>
> On Wed, Dec 11, 2019 at 10:25 AM Mark Payne <[email protected]> wrote:
> >
> > Emanuel,
> >
> > I looked into this a week or so ago, but haven't had a chance to resolve
> > the issue yet. It does appear to be a bug. Specifically, I believe the bug
> > is here [1]. When we create a RecordSchema from the Avro Schema, we set
> > the default value for the array to an empty array, instead of null. Because
> > of this, when the JSON is parsed, we end up creating a Record with an empty
> > array for the "Records" field instead of a null. As a result, the Record is
> > considered valid because it does have an array (it's just empty). I think
> > it *should* be a null value instead.
> >
> > It looks like this was introduced in NIFI-4893 [2].
> > We can easily change it to just return a null value for the default, but
> > that does result in two of the unit tests added in NIFI-4893 failing. It
> > may be that those unit tests need to be fixed, or it may be that such a
> > change does break something. I just haven't had a chance yet to dig that
> > far into it.
> >
> > If you're someone who is comfortable digging into the code and making
> > the updates, then please do and I'm happy to review a PR as soon as I'm
> > able.
> >
> > Thanks
> > -Mark
> >
> > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
> >
> > [2] https://issues.apache.org/jira/browse/NIFI-4893
> >
> > On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <[email protected]> wrote:
> >
> > Can anyone knowledgeable about Avro schemas please confirm whether this
> > inability to invalidate a JSON payload that is missing the array in the
> > root, when Allow Extra Fields=true, is normal?
> >
> > There are 2 options:
> >
> > ValidateRecord.Allow Extra Fields=false -> need to supply the full schema
> > ValidateRecord.Allow Extra Fields=true -> this is what I have been
> > testing/want, a way to supply a schema with only mandatory fields.
> >
> > I want 2 mandatory fields, an array with at least 1 element having
> > eventVersion, so the minimal JSON should be:
> > { (..)
> >   "Records": [{
> >       "eventVersion": "aaa"
> >       (..)
> >     }
> >   ]
> >   (..)
> > }
> >
> > The problem is that ValidateRecord considers the FF valid if the “Records”
> > array is missing in the root!
> > {
> >   "Service": "sssssss",
> >   "Event": "eeeee",
> >   "Time": "2019-11-25T16:21:53.280Z",
> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
> > }
> >
> > If I supply the array “Records”, then the schema correctly validates that I
> > need at least eventVersion on the array element record.
> >
> > So… maybe my question can be tuned to “is it possible in Avro schema
> > syntax to specify cardinalities like in a DB E/R diagram, where a relation
> > can be one of the following:
> > 0..n
> > 1..n
> > 1 and only 1 ?”
> >
> > Thanks//Regards,
> > Emanuel Oliveira
> > Senior Oracle/Data Engineer | CTG | Galway
> > TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971
> >
> > From: Oliveira, Emanuel <[email protected]>
> > Sent: Friday 6 December 2019 10:15
> > To: [email protected]
> > Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> >
> > Hi Mark, I forgot to share the NiFi version we are using:
> > 1.8.0
> > 10/22/2018 23:48:30 EDT
> > Tagged nifi-1.8.0-RC3
> >
> > Thanks//Regards,
> > Emanuel Oliveira
> >
> > From: Emanuel Oliveira <[email protected]>
> > Sent: Thursday 5 December 2019 22:42
> > To: [email protected]
> > Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> >
> > Hi Mark, be sure you copy and paste "NOK - payload BAD 1 - " into
> > GenerateFlowFile, as this is the problem.
> >
> > Cheers,
> > Emanuel
> >
> > On Thu 5 Dec 2019, 22:03 Mark Payne, <[email protected]> wrote:
> >
> > Emanuel,
> >
> > What version of NiFi are you using?
> >
> > I just tested the attached template against the latest, and the FlowFile
> > was routed to 'invalid' with the explanation:
> >
> > Records in this FlowFile were invalid for the following reasons: The
> > following 1 fields were missing: [[0]/Records/eventVersion]
> >
> > Thanks
> > -Mark
> >
> > On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <[email protected]> wrote:
> >
> > Hi all,
> >
> > I have been struggling to find a way for ValidateRecord, using an Avro
> > schema, to make the presence of an array in the JSON payload mandatory. The
> > problem is that if the array “Records” is missing, ValidateRecord considers
> > the FF valid ☹.
> >
> > --objective - Mandatory to have "Records" array with at least "eventVersion"
> > - using ValidateRecord > Allow Extra Fields
> > - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!
> >
> > How can I make the Records array mandatory? Is it possible?
> >
> > I know I can eventually use a SplitJson with JsonPath Expression=$.Records
> > to get rid of the ARRAY, and also to fail if the array "Records" is not
> > present. But I would like to have a clean solution using just the Avro
> > schema. Is this possible?
> >
> > --OK - payload GOOD
> > {
> >   "Service": "sssssss",
> >   "Event": "eeeee",
> >   "Time": "2019-11-25T16:21:53.280Z",
> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> >   "Records": [{
> >       "eventVersion": "aaa"
> >     }
> >   ]
> > }
> >
> > --NOK - payload BAD 1 - missing "Records" array -> BUT
> > VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to
> > “invalid” since it is not compliant with my Avro schema, which needs the
> > array “Records” with element “eventVersion” as 2 mandatory things.
> > {
> >   "Service": "sssssss",
> >   "Event": "eeeee",
> >   "Time": "2019-11-25T16:21:53.280Z",
> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> >   "RecordsXXX": [{
> >       "eventVersion": "aaa"
> >     }
> >   ]
> > }
> >
> > --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> > {
> >   "Service": "sssssss",
> >   "Event": "eeeee",
> >   "Time": "2019-11-25T16:21:53.280Z",
> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> >   "Records": [{
> >       "eventVersionXX": "aaa"
> >     }
> >   ]
> > }
> >
> > It's a very simple test flow (attached as the XML template
> > ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
> > ValidateRecord with JsonReader/Json Writer:
> > <image001.png>
> >
> > Here's the ValidateRecord processor + reader/writer controllers:
> >
> > Avro schema with just the array “Records” and “eventVersion” as the minimum
> > tag on each array element.
> > Using Allow Extra Fields true:
> >
> > So I'm OK with having other fields in the root side by side with the array
> > “Records”, and also OK with extra elements inside each array.
> > FYI: in the real use case I'm trying to validate an AWS SQS message (S3
> > trigger) where I will be interested in several fields, but I crafted this
> > simpler example just to ask whether it's possible to force an array to be
> > mandatory and with at least 1 element?
> >
> > ==========================================================
> >
> > --ValidateRecord 1.8.0
> > Record Reader                      JsonTreeReader
> > Record Writer                      JsonRecordSetWriter
> > Record Writer for Invalid Records
> > Schema Access Strategy             Use Reader's Schema
> > Schema Registry                    No value set
> > Schema Name                        ${schema.name}
> > Schema Text                        ${avro.schema}
> > Allow Extra Fields                 true
> > Strict Type Checking               true
> >
> > --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
> > "eventVersion" on each ARRAY element
> > Schema Access Strategy             Use 'Schema Text' Property
> > Schema Registry
> > Schema Name                        ${schema.name}
> > Schema Version
> > Schema Branch
> > Schema Text
> > {
> >   "name": "MyName",
> >   "type": "record",
> >   "namespace": "aa.bb.cc",
> >   "fields": [{
> >       "name": "Records",
> >       "type": {
> >         "type": "array",
> >         "items": {
> >           "name": "Records_record",
> >           "type": "record",
> >           "fields": [{
> >               "name": "eventVersion",
> >               "type": "string"
> >             }
> >           ]
> >         }
> >       }
> >     }
> >   ]
> > }
> > Date Format
> > Time Format
> > Timestamp Format
> >
> > --JsonRecordSetWriter 1.8.0
> > Schema Write Strategy              Do Not Write Schema
> > Schema Access Strategy             Inherit Record Schema
> > Schema Registry
> > Schema Name                        ${schema.name}
> > Schema Version
> > Schema Branch
> > Schema Text                        { "name": "eventVersion", "type": "string" }
> > Date Format
> > Time Format
> > Timestamp Format
> > Pretty Print JSON                  true
> > Suppress Null Values               Never Suppress
> > Output Grouping                    Array
> >
> > Thanks in advance,
> > Emanuel Oliveira
> >
> > <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
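
For anyone skimming the thread, the diagnosis from Mark and Matt can be illustrated with a small Python sketch. This is not NiFi code and does not reproduce AvroTypeUtil: the functions `default_for`, `build_record`, and `is_valid`, the `null_check` flag, and the dictionary-based schema are all hypothetical simplifications, assumed only to mimic the behaviour described above (a missing array field silently getting an empty-array default versus a null default).

```python
from typing import Any, Optional

# Hypothetical, simplified schema: field name -> (type name, declared default).
# This only imitates the behaviour discussed in the thread; it is not NiFi's API.
SCHEMA = {"Records": ("array", None)}  # required array, no default declared


def default_for(field_type: str, declared_default: Optional[Any], null_check: bool) -> Any:
    """Return the value used when a field is absent from the incoming JSON."""
    if field_type == "array":
        if null_check and declared_default is None:
            # Behaviour with the extra null check: no declared default, no value.
            return None
        # Behaviour described as the bug: an array is always produced.
        return list(declared_default or [])
    return declared_default


def build_record(payload: dict, schema: dict, null_check: bool) -> dict:
    """Populate every schema field, falling back to defaults for missing ones."""
    record = {}
    for name, (field_type, declared_default) in schema.items():
        if name in payload:
            record[name] = payload[name]
        else:
            record[name] = default_for(field_type, declared_default, null_check)
    return record


def is_valid(record: dict, schema: dict) -> bool:
    """Treat a required field as satisfied by any non-null value."""
    return all(record[name] is not None for name in schema)


# Payload "BAD 1" from the thread, shortened: the "Records" array is missing.
bad_payload = {"Service": "sssssss", "Event": "eeeee"}

buggy = build_record(bad_payload, SCHEMA, null_check=False)
fixed = build_record(bad_payload, SCHEMA, null_check=True)

print(is_valid(buggy, SCHEMA))  # True: the empty array hides the missing field
print(is_valid(fixed, SCHEMA))  # False: the default is null, so the record is invalid
```

With `null_check=False` the missing field becomes `[]`, which passes a "field is present" check; with `null_check=True` it stays `None` and the record is rejected, which matches the outcome Matt describes for the fix.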
