I detected the bug https://issues.apache.org/jira/browse/NIFI-4893 myself.
Do you have a reproducible flow to validate it?

On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <[email protected]>
wrote:

> Oh, I see; your analysis makes sense. Sorry, though: I last did Java 20
> years ago, and nowadays I'm mostly a data engineer (Oracle DB, ETL tools,
> custom migrations, Snowflake and, lately, NiFi). So count on me to spot
> opportunities to improve things, but I'm not able to change the base code
> or tests.
>
>
>
> Thanks so much for your time and analysis. Let's wait for the community to
> step up to make the fix and update/run the unit tests 😊
>
>
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
>
>
> *From:* Mark Payne <[email protected]>
> *Sent:* Wednesday 11 December 2019 15:25
> *To:* [email protected]
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
> Emanuel,
>
>
>
> I looked into this a week or so ago, but haven't had a chance to resolve
> the issue yet. It does appear to be a bug. Specifically, I believe the bug
> is here [1]. When we create a RecordSchema from the Avro Schema, we set
> the default value for the array to an empty array, instead of null. Because
> of this, when the JSON is parsed, we end up creating a Record with an empty
> array for the "Records" field instead of a null. As a result, the Record is
> considered valid because it does have an array (it's just empty). I think
> it *should* be a null value instead.
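A minimal sketch of why an empty-array default can hide a missing field. This is a hypothetical, simplified model in plain Java, not NiFi's AvroTypeUtil or ValidateRecord code: a parsed JSON object is represented as a `Map`, `buildRecord` and `isRequiredFieldPresent` are illustrative names, and it assumes the validator treats a required field as missing only when its value is null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described above; this is NOT
// NiFi's AvroTypeUtil or ValidateRecord code. A JSON object is modelled as a Map.
final class DefaultValueDemo {

    // Copies the parsed JSON fields and, if the schema field is absent, fills in
    // the schema-derived default value (as a record reader applying defaults might).
    static Map<String, Object> buildRecord(Map<String, Object> json,
                                           String field,
                                           Object defaultValue) {
        Map<String, Object> record = new LinkedHashMap<>(json);
        if (!record.containsKey(field)) {
            record.put(field, defaultValue);
        }
        return record;
    }

    // A required, non-nullable field passes this check only if its value is non-null.
    static boolean isRequiredFieldPresent(Map<String, Object> record, String field) {
        return record.get(field) != null;
    }

    public static void main(String[] args) {
        // Payload BAD 1: the array is named "RecordsXXX", so "Records" is missing.
        Map<String, Object> payload = new HashMap<>();
        payload.put("Service", "sssssss");
        payload.put("RecordsXXX", List.of(Map.of("eventVersion", "aaa")));

        // Empty-array default: the missing field is masked and the check passes.
        Map<String, Object> emptyDefault =
                buildRecord(payload, "Records", Collections.emptyList());
        System.out.println(isRequiredFieldPresent(emptyDefault, "Records")); // true

        // Null default: the missing field is detected and the check fails.
        Map<String, Object> nullDefault = buildRecord(payload, "Records", null);
        System.out.println(isRequiredFieldPresent(nullDefault, "Records")); // false
    }
}
```

In this model, only the null default lets the missing "Records" field be reported, which matches the suggestion that the default *should* be null.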
>
>
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change
> it to just return a null value for the default, but that causes two of the
> unit tests added in NIFI-4893 to fail. It may be that those unit tests need
> to be fixed, or it may be that such a change does break something. I just
> haven't had a chance yet to dig that far into it.
>
>
>
> If you're someone who is comfortable digging into the code and making the
> updates, then please do and I'm happy to review a PR as soon as I'm able.
>
>
>
> Thanks
>
> -Mark
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>
>
>
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <[email protected]>
> wrote:
>
>
>
> Can anyone knowledgeable about Avro schemas please confirm or suggest
> whether this inability to invalidate a JSON payload that is missing the
> root-level array, when Allow Extra Fields=true, is normal?
>
>
>
> There are two options:
>
> · ValidateRecord.Allow Extra Fields=false → need to supply the full
> schema
>
> · ValidateRecord.Allow Extra Fields=true → this is what I have been
> testing/want: a way to supply a schema with only the mandatory fields.
>
>
>
> I want two mandatory things: the array, with at least one element having
> eventVersion, so the minimal JSON should be:
>
> { (..)
>    "Records": [{
>          "eventVersion": "aaa"
>          (..)
>       }
>    ]
>    (..)
> }
>
>
>
> The problem is that ValidateRecord considers the FF valid if the “Records”
> array is missing from the root!
>
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
> }
>
>
>
> If I supply the “Records” array, then the schema correctly validates that
> each array element record needs at least eventVersion.
>
> So… maybe my question can be refined to: “Is it possible, in Avro schema
> syntax, to specify cardinalities like in a DB E/R diagram, where a
> relationship can be one of the following:
>
> 0..n
> 1..0
> 1 and only 1?”
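As far as we know, the Avro specification gives an `array` type only an `items` attribute, so a schema can make the field itself required (no default, not a union with `"null"`) or optional (a union with `"null"`), but it has no keyword for a minimum element count such as 1..n. A check like that would have to happen outside the schema. The sketch below is a hypothetical post-check in plain Java over an already-parsed payload (modelled as `Map`/`List`); `hasCardinality` and `everyElementHas` are illustrative names, not NiFi or Avro APIs.

```java
import java.util.List;
import java.util.Map;

// Hypothetical post-validation helper; not part of NiFi or Avro.
// A parsed JSON object is modelled as a Map, a JSON array as a List.
final class CardinalityCheck {

    // True if `field` holds an array whose size lies within [min, max].
    // A missing, null, or non-array value always fails, so min = 0 means
    // "array required, may be empty". Use Integer.MAX_VALUE for an open upper bound ("n").
    static boolean hasCardinality(Map<String, Object> record, String field, int min, int max) {
        Object value = record.get(field);
        if (!(value instanceof List)) {
            return false;
        }
        int size = ((List<?>) value).size();
        return size >= min && size <= max;
    }

    // True if `field` is an array and every element is an object with a non-null `key`.
    static boolean everyElementHas(Map<String, Object> record, String field, String key) {
        Object value = record.get(field);
        if (!(value instanceof List)) {
            return false;
        }
        for (Object element : (List<?>) value) {
            if (!(element instanceof Map) || ((Map<?, ?>) element).get(key) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Payload GOOD: "Records" holds one element with "eventVersion".
        Map<String, Object> good = Map.of("Records", List.of(Map.of("eventVersion", "aaa")));
        // Payload BAD 1: the array is under "RecordsXXX", so "Records" is missing.
        Map<String, Object> bad1 = Map.of("RecordsXXX", List.of(Map.of("eventVersion", "aaa")));

        // "1..n" on Records plus a mandatory eventVersion in each element.
        System.out.println(hasCardinality(good, "Records", 1, Integer.MAX_VALUE)
                && everyElementHas(good, "Records", "eventVersion")); // true
        System.out.println(hasCardinality(bad1, "Records", 1, Integer.MAX_VALUE)); // false
    }
}
```

Keeping the bounds as plain `min`/`max` parameters lets the same helper express 0..n, 1..n, or exactly 1 (`min = max = 1`).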
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
>
>
>
> *From:* Oliveira, Emanuel <[email protected]>
> *Sent:* Friday 6 December 2019 10:15
> *To:* [email protected]
> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
> Hi Mark, I forgot to share the NiFi version we are using:
>
> 1.8.0
>
> 10/22/2018 23:48:30 EDT
>
> Tagged nifi-1.8.0-RC3
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
>
>
>
> *From:* Emanuel Oliveira <[email protected]>
> *Sent:* Thursday 5 December 2019 22:42
> *To:* [email protected]
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
> Hi Mark, make sure you copy and paste the "NOK - payload BAD 1" payload into
> GenerateFlowFile, as that is the one that shows the problem.
>
>
>
> Cheers,
>
> Emanuel
>
>
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <[email protected]> wrote:
>
> Emanuel,
>
>
>
> What version of NiFi are you using?
>
>
>
> I just tested the attached template against the latest, and the FlowFile
> was routed to 'invalid' with the explanation:
>
>
>
> Records in this FlowFile were invalid for the following reasons: The
> following 1 fields were missing: [[0]/Records/eventVersion]
>
> Thanks
>
> -Mark
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <[email protected]>
> wrote:
>
>
>
> Hi all,
>
>
>
> I have been struggling to find a way for ValidateRecord, using an Avro
> schema, to make the presence of an array in the JSON payload mandatory. The
> problem is that if the “Records” array is missing, ValidateRecord considers
> the FF valid ☹.
>
> --objective - the "Records" array is mandatory, with at least "eventVersion"
> in each element
>
> - using ValidateRecord > Allow Extra Fields
>
> - the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!
>
>
>
> How can I make the Records array mandatory? Is it possible?
>
>
>
> I know I could use SplitJson with JsonPath Expression=$.Records to get rid
> of the array, which would also fail if the "Records" array is not present.
> But I would like a clean solution using just the Avro schema. Is this
> possible?
>
> --OK - payload GOOD
>
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
>
>
> --NOK - payload BAD 1 - missing "Records" array → BUT
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent to
> “invalid”, since it does not comply with my Avro schema, which requires the
> “Records” array with element “eventVersion” as two mandatory things.
>
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
>
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>
>
>
> It's a very simple test flow (the XML template
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml is attached),
> just using ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
>
> <image001.png>
>
> Here are the ValidateRecord processor and reader/writer controller service
> settings:
>
>    - Avro schema with just the “Records” array and “eventVersion” as the
>    minimum tag on the array element.
>    - Using Allow Extra Fields = true:
>       - So I'm OK having other fields at the root alongside the “Records”
>       array, and also OK having extra elements inside each array element.
>    - FYI: the real use case I'm trying to validate is an AWS SQS message (S3
>    trigger), where I will be interested in several fields, but I crafted this
>    simpler example just to ask: is it possible to force the array to be
>    mandatory and to have at least one element?
>
> ==========================================================
>
>
>
> --ValidateRecord 1.8.0
>
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>
>
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
> "eventVersion" on each ARRAY element
>
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format
> Time Format
> Timestamp Format
>
>
>
> --JsonRecordSetWriter 1.8.0
>
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>
>
>
> Thanks in advance,
>
> Emanuel Oliveira
>
>
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>