Mark Payne created NIFI-6986:
--------------------------------

             Summary: ValidateRecord should optionally validate that nullable 
fields are present
                 Key: NIFI-6986
                 URL: https://issues.apache.org/jira/browse/NIFI-6986
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
            Reporter: Mark Payne


Currently, if a field is nullable according to the schema, ValidateRecord 
considers the record to be valid, even if the field is missing completely. For 
some use cases, this is desirable. For example, it is common to drop fields in 
JSON when the field's value is null, because it can drastically reduce the size 
of the JSON.

However, in other use cases, this is not desirable. For example, in a CSV file, 
we may want to require that each Record contains the expected number of 
fields. It may be acceptable, for instance, to have a line like "1234, John 
Smith, , , ," but not to have a line like "1234, John Smith".

ValidateRecord should be updated with a new Property: "Allow Missing Null 
Values". If the value is `true` (the default, to avoid changing behavior 
between versions), the Processor should behave as it does now, where the 
absence of the field is synonymous with a null value. In this case, a line like 
"1234, John Smith" would be valid when the CSV is expecting 6 fields, as long 
as the last 4 fields are nullable.

But if the value of this new property is `false`, the Processor should require 
that all fields be present in the data, even if the field has a null value. In 
this case, a line like "1234, John Smith" would be invalid if the CSV were 
expected to contain 6 fields.
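
The two modes described above can be sketched in a few lines of plain Java. 
This is only an illustration of the proposed semantics, not NiFi code: the 
class `MissingNullCheck`, its `isValid` method, the naive comma split, and the 
treatment of a blank value as null are all assumptions made for the example.

```java
// Hypothetical illustration of the proposed property; none of these names exist in NiFi.
final class MissingNullCheck {

    /**
     * @param line                   a raw CSV line (naively split on commas, no quoting support)
     * @param nullable               per-column nullability, standing in for the schema
     * @param allowMissingNullValues models the proposed "Allow Missing Null Values" property
     */
    static boolean isValid(String line, boolean[] nullable, boolean allowMissingNullValues) {
        // A limit of -1 keeps trailing empty strings, so "a, , ," yields 4 values.
        String[] values = line.split(",", -1);
        if (values.length > nullable.length) {
            return false; // more values than the schema has columns
        }
        for (int i = 0; i < nullable.length; i++) {
            boolean present = i < values.length;
            if (!present) {
                // An absent column is only tolerated when it is nullable AND absence is allowed.
                if (!nullable[i] || !allowMissingNullValues) {
                    return false;
                }
                continue;
            }
            // Assumption for this sketch: a blank value represents null.
            boolean isNull = values[i].trim().isEmpty();
            if (isNull && !nullable[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a schema of two required columns followed by four nullable ones, 
"1234, John Smith, , , ," passes in both modes, while "1234, John Smith" 
passes only when `allowMissingNullValues` is `true`.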

The `WriteJsonResult` class has a method: `private boolean 
isFieldPresent(RecordField field, Record record)`. This method should really 
exist on `Record` itself with a slightly different signature: `boolean 
isFieldPresent(RecordField field)`. `Record` should provide a default 
implementation, akin to the one in `WriteJsonResult`, and `WriteJsonResult` 
should then simply call that method.
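
A rough sketch of what such a default method could look like follows. The 
types `SketchField`, `SketchRecord`, and `MapSketchRecord` are simplified 
stand-ins invented for this example, not the NiFi `RecordField` and `Record` 
types, and the name-or-alias lookup is an assumption about how presence might 
be decided; the issue itself does not spell out the logic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a schema field: a name plus optional aliases.
final class SketchField {
    final String name;
    final List<String> aliases;

    SketchField(String name, List<String> aliases) {
        this.name = name;
        this.aliases = aliases;
    }
}

// Hypothetical stand-in for a record interface carrying the proposed default method.
interface SketchRecord {
    Set<String> getRawFieldNames();

    Object getValue(String fieldName);

    // Default implementation: the field is present if its name, or any alias,
    // appears among the raw field names, regardless of whether the value is null.
    default boolean isFieldPresent(SketchField field) {
        Set<String> rawNames = getRawFieldNames();
        if (rawNames.contains(field.name)) {
            return true;
        }
        for (String alias : field.aliases) {
            if (rawNames.contains(alias)) {
                return true;
            }
        }
        return false;
    }
}

// Minimal map-backed record; it inherits isFieldPresent without overriding it.
final class MapSketchRecord implements SketchRecord {
    private final Map<String, Object> values;

    MapSketchRecord(Map<String, Object> values) {
        this.values = new HashMap<>(values); // HashMap permits null values
    }

    @Override
    public Set<String> getRawFieldNames() {
        return values.keySet();
    }

    @Override
    public Object getValue(String fieldName) {
        return values.get(fieldName);
    }
}
```

Placing the check on the record type lets every caller, writer or validator, 
share one definition of "present", which distinguishes a key mapped to null 
from a key that is absent altogether.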

`StandardSchemaValidator` should then be updated to use this method to verify 
that records contain all required fields, as configured. 
`SchemaValidationContext` should also be updated to indicate whether the 
presence of nullable fields should be validated.
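
One way the validator and its context could fit together is sketched below. 
`SketchValidationContext` and `SketchSchemaValidator` are hypothetical names 
used only for illustration, the record is modeled as a plain `Map`, and the 
error messages are invented; the real `SchemaValidationContext` and 
`StandardSchemaValidator` APIs are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical context: the schema (field name -> nullable) plus the new flag.
final class SketchValidationContext {
    private final Map<String, Boolean> nullableByField;
    private final boolean allowMissingNullValues;

    SketchValidationContext(Map<String, Boolean> nullableByField, boolean allowMissingNullValues) {
        this.nullableByField = nullableByField;
        this.allowMissingNullValues = allowMissingNullValues;
    }

    Map<String, Boolean> getNullableByField() {
        return nullableByField;
    }

    boolean isAllowMissingNullValues() {
        return allowMissingNullValues;
    }
}

// Hypothetical validator: returns a list of problems, empty when the record is valid.
final class SketchSchemaValidator {
    static List<String> validate(Map<String, Object> record, SketchValidationContext context) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : context.getNullableByField().entrySet()) {
            String name = entry.getKey();
            boolean nullable = entry.getValue();

            // Presence is about the key existing, not about the value being non-null.
            if (!record.containsKey(name)) {
                if (!nullable || !context.isAllowMissingNullValues()) {
                    errors.add("Missing field: " + name);
                }
                continue;
            }
            if (record.get(name) == null && !nullable) {
                errors.add("Null value for non-nullable field: " + name);
            }
        }
        return errors;
    }
}
```

In this sketch, a record that lacks a nullable field yields no errors when the 
flag is `true` and a single "Missing field" error when it is `false`, while a 
record that carries the field with an explicit null is valid in both modes.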



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
