Thanks Pierre!

On Mon 6 Jan 2020, 17:06 Pierre Villard, <[email protected]> wrote:
> Hi Emanuel,
>
> The PR is currently under review, so it will not be included in NiFi 1.10.0 (which is already released). We recently discussed releasing a new NiFi version (1.10.1 or 1.11.0), and if the PR is merged before such a release, it would certainly be included in that version.
>
> Hope that makes sense,
> Pierre
>
>
> On Mon, 6 Jan 2020 at 22:08, Oliveira, Emanuel <[email protected]> wrote:
>
>> Thanks Matt and Mark!
>> We are still on version
>> 1.8.0
>> 10/22/2018 23:48:30 EDT
>> Tagged nifi-1.8.0-RC3
>>
>> Current version is 1.10
>>
>> Out of curiosity, when could we expect this fix to be available? Would it mean we upgrade to 1.10? Thanks.
>>
>> Thanks//Regards,
>> Emanuel Oliveira
>>
>>
>>
>> -----Original Message-----
>> From: Matt Burgess <[email protected]>
>> Sent: Friday 20 December 2019 17:52
>> To: [email protected]
>> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>>
>> This email is from an external source - exercise caution regarding links and attachments.
>>
>>
>> Mark is spot-on with the diagnosis: a default empty array is being created for the missing field even if no default value is specified in the schema. All it needs is an extra null check in order to return null as the default value; then the record is marked invalid as expected.
>>
>> I have written up NIFI-6963 [1] to cover this, and issued a PR to fix it [2]. Mark, would you kindly do the honors of a review? Please and thanks!
>>
>> -Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-6963
>> [2] https://github.com/apache/nifi/pull/3948
>>
>> On Wed, Dec 11, 2019 at 10:25 AM Mark Payne <[email protected]> wrote:
>> >
>> > Emanuel,
>> >
>> > I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1].
>> > When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.
>> >
>> > It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.
>> >
>> > If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.
>> >
>> > Thanks
>> > -Mark
>> >
>> >
>> > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>> >
>> > [2] https://issues.apache.org/jira/browse/NIFI-4893
>> >
>> >
>> >
>> > On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <[email protected]> wrote:
>> >
>> > Can anyone knowledgeable on Avro schemas please confirm/suggest whether this inability to invalidate a JSON payload missing an array in the root, when Allow Extra Fields=true, is normal?
>> >
>> > There are 2 options:
>> >
>> > ValidateRecord.Allow Extra Fields=false -> need to supply full schema
>> > ValidateRecord.Allow Extra Fields=true -> this is what I have been testing/want, a way to supply a schema with only mandatory fields.
>> >
>> >
>> > I want 2 mandatory fields, an array with at least 1 element having eventVersion, so the minimal JSON should be:
>> > { (..)
>> >   "Records": [{
>> >       "eventVersion": "aaa"
>> >       (..)
>> >     }
>> >   ]
>> >   (..)
>> > }
>> >
>> > Problem is ValidateRecord considers the FF valid if the “Records” array is missing in the root!
>> > {
>> >   "Service": "sssssss",
>> >   "Event": "eeeee",
>> >   "Time": "2019-11-25T16:21:53.280Z",
>> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
>> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
>> > }
>> >
>> > If I supply the array “Records”, then the schema correctly validates that I need at least eventVersion on the array element record.
>> >
>> >
>> > So… maybe my question can be tuned to “is it possible in Avro schema syntax to specify cardinalities like in a db e/r diagram, where a relation can be one of the following:
>> > 0..n
>> > 1..0
>> > 1 and only 1 ?”
>> >
>> >
>> > Thanks//Regards,
>> > Emanuel Oliveira
>> > Senior Oracle/Data Engineer | CTG | Galway TEL ext: 353 – (0)91-74 4971 | int: 8-737 4971
>> >
>> > From: Oliveira, Emanuel <[email protected]>
>> > Sent: Friday 6 December 2019 10:15
>> > To: [email protected]
>> > Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>> >
>> > Hi Mark, forgot to share the NiFi version we are using:
>> > 1.8.0
>> > 10/22/2018 23:48:30 EDT
>> > Tagged nifi-1.8.0-RC3
>> >
>> >
>> > Thanks//Regards,
>> > Emanuel Oliveira
>> >
>> > From: Emanuel Oliveira <[email protected]>
>> > Sent: Thursday 5 December 2019 22:42
>> > To: [email protected]
>> > Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>> >
>> > Hi Mark, be sure you copy-paste "NOK - payload BAD 1 - " into GenerateFlowFile, as this is the problem.
>> >
>> > Cheers,
>> > Emanuel
>> >
>> > On Thu 5 Dec 2019, 22:03 Mark Payne, <[email protected]> wrote:
>> >
>> > Emanuel,
>> >
>> > What version of NiFi are you using?
>> >
>> > I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:
>> >
>> > Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]
>> >
>> >
>> > Thanks
>> > -Mark
>> >
>> >
>> > On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > I have been struggling to find a way for ValidateRecord, using an Avro schema, to make the presence of an array in a JSON payload mandatory. The problem is that if the array “Records” is missing, ValidateRecord considers the FF valid ☹.
>> > --objective - Mandatory to have "Records array" with at least "eventVersion"
>> > - using ValidateRecord > Allow Extra Fields
>> > - problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!
>> >
>> > How can I make the Records array mandatory? Is it possible?
>> >
>> > I know I can eventually use a SplitJson with JsonPath Expression=$.Records to get rid of the ARRAY, and also to fail if the array "Records" is not present. But I would like to have a clean solution using just the Avro schema. Is this possible?
>> >
>> >
>> >
>> > --OK - payload GOOD
>> > {
>> >   "Service": "sssssss",
>> >   "Event": "eeeee",
>> >   "Time": "2019-11-25T16:21:53.280Z",
>> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
>> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>> >   "Records": [{
>> >       "eventVersion": "aaa"
>> >     }
>> >   ]
>> > }
>> >
>> > --NOK - payload BAD 1 - missing "Records" array -> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to “invalid” since it is not compliant with my Avro schema, which needs the array “Records” with element “eventVersion” as 2 mandatory things.
>> > {
>> >   "Service": "sssssss",
>> >   "Event": "eeeee",
>> >   "Time": "2019-11-25T16:21:53.280Z",
>> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
>> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>> >   "RecordsXXX": [{
>> >       "eventVersion": "aaa"
>> >     }
>> >   ]
>> > }
>> >
>> > --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>> > {
>> >   "Service": "sssssss",
>> >   "Event": "eeeee",
>> >   "Time": "2019-11-25T16:21:53.280Z",
>> >   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>> >   "RequestId": "RRRRRRRRRRRRRRRRRR",
>> >   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>> >   "Records": [{
>> >       "eventVersionXX": "aaa"
>> >     }
>> >   ]
>> > }
>> >
>> > It's a very simple test flow (attached the xml template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonReader/Json Writer:
>> > <image001.png>
>> >
>> >
>> > Here's the ValidateRecord processor + reader/writer controllers:
>> >
>> > Avro schema with just the array “Records” and “eventVersion” as the minimum tag on each array element.
>> > Using Allow Extra Fields true:
>> >
>> > So I'm OK with having other fields in the root side by side with the array “Records”, and also OK with extra elements inside each array.
>> > FYI: in the real use case I'm trying to validate an AWS SQS message (s3 trigger) where I will be interested in several fields, but I crafted this simpler example just to ask if it's possible to force the array to be mandatory and to have at least 1 element?
>> >
>> > ==========================================================
>> >
>> > --ValidateRecord 1.8.0
>> > Record Reader                       JsonTreeReader
>> > Record Writer                       JsonRecordSetWriter
>> > Record Writer for Invalid Records
>> > Schema Access Strategy              Use Reader's Schema
>> > Schema Registry                     No value set
>> > Schema Name                         ${schema.name}
>> > Schema Text                         ${avro.schema}
>> > Allow Extra Fields                  true
>> > Strict Type Checking                true
>> >
>> > --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
>> > Schema Access Strategy              Use 'Schema Text' Property
>> > Schema Registry
>> > Schema Name                         ${schema.name}
>> > Schema Version
>> > Schema Branch
>> > Schema Text
>> > {
>> >   "name": "MyName",
>> >   "type": "record",
>> >   "namespace": "aa.bb.cc",
>> >   "fields": [{
>> >       "name": "Records",
>> >       "type": {
>> >         "type": "array",
>> >         "items": {
>> >           "name": "Records_record",
>> >           "type": "record",
>> >           "fields": [{
>> >               "name": "eventVersion",
>> >               "type": "string"
>> >             }
>> >           ]
>> >         }
>> >       }
>> >     }
>> >   ]
>> > }
>> > Date Format
>> > Time Format
>> > Timestamp Format
>> >
>> > --JsonRecordSetWriter 1.8.0
>> > Schema Write Strategy               Do Not Write Schema
>> > Schema Access Strategy              Inherit Record Schema
>> > Schema Registry
>> > Schema Name                         ${schema.name}
>> > Schema Version
>> > Schema Branch
>> > Schema Text                         { "name": "eventVersion", "type": "string" }
>> > Date Format
>> > Time Format
>> > Timestamp Format
>> > Pretty Print JSON                   true
>> > Suppress Null Values                Never Suppress
>> > Output Grouping                     Array
>> >
>> > Thanks in advance,
>> > Emanuel Oliveira
>> >
>> > <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
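
Note for readers of the archive: the diagnosis in this thread (a missing array field silently receives an empty-array default, so the record passes validation) can be illustrated with a small model. This is only a sketch in Python, not NiFi code. Every name in it (REQUIRED_FIELDS, default_for, build_record, is_valid, has_at_least_one) is hypothetical, and the "buggy" flag simply switches between the behaviour Mark describes and the null default Matt proposes. The last helper reflects the fact that an Avro array schema declares only its item type and no minimum length, so an "at least 1 element" rule would have to be checked separately.

```python
# Simplified model of the behaviour discussed in this thread.
# This is NOT NiFi code; every name here is hypothetical.

REQUIRED_FIELDS = {"Records": "array"}  # mirrors the schema in the thread


def default_for(field_type, buggy):
    """Default used when a field is absent and the schema declares none."""
    if buggy and field_type == "array":
        return []   # behaviour Mark describes: empty array instead of null
    return None     # behaviour after the extra null check Matt describes


def build_record(payload, buggy):
    """Fill in every schema field, using the default when it is missing."""
    return {
        name: payload.get(name, default_for(field_type, buggy))
        for name, field_type in REQUIRED_FIELDS.items()
    }


def is_valid(record):
    """A non-nullable field is invalid only when its value is null."""
    return all(record[name] is not None for name in REQUIRED_FIELDS)


def has_at_least_one(record, name="Records"):
    """Separate cardinality check: the array must hold one or more items."""
    value = record.get(name)
    return isinstance(value, list) and len(value) >= 1


bad_payload = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}
good_payload = {"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}

print(is_valid(build_record(bad_payload, buggy=True)))    # True  (the bug)
print(is_valid(build_record(bad_payload, buggy=False)))   # False (expected)
print(is_valid(build_record(good_payload, buggy=False)))  # True
```

With the empty-array default, "payload BAD 1" is reported as valid. With a null default it is rejected, which matches the behaviour Emanuel expected.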
