We have a lot of the same issues where I work, and our solution is to use ExecuteStreamCommand to pass CSVs off to Python scripts that read stdin line by line and check whether the export is malformed. Some of our sources are clean and we don't have to do that, but others are minefields in terms of upstream data quality, and that's the only way we've found to handle such things predictably.
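
A minimal sketch of that kind of stdin check might look like the following. It is an illustration only, not the script described above: the function name `filter_csv`, the choice to take the expected field count from the header row, and the routing of short rows to stderr are all assumptions. It also assumes ExecuteStreamCommand is configured to feed the FlowFile content to the script on stdin and to take stdout as the output.

```python
import csv
import sys


def filter_csv(src, good_out, bad_out, expected=None):
    """Copy well-formed CSV rows from src to good_out and report the rest.

    src is any iterable of text lines (for example sys.stdin).
    expected is the required number of fields; when None, the field
    count of the first row (assumed to be the header) is used.
    Returns the number of rejected rows.
    """
    reader = csv.reader(src)
    writer = csv.writer(good_out, lineterminator="\n")
    rejected = 0
    for row in reader:
        if expected is None:
            expected = len(row)
        if len(row) == expected:
            writer.writerow(row)
        else:
            # Too few (or too many) delimiters: an empty trailing field
            # still counts as a field, a missing one does not.
            rejected += 1
            print(
                f"line {reader.line_num}: expected {expected} fields, "
                f"got {len(row)}: {row!r}",
                file=bad_out,
            )
    return rejected
```

Run as a script, this would be called as `filter_csv(sys.stdin, sys.stdout, sys.stderr)`, with a nonzero exit status when the return value is nonzero so the flow can route the failure. Against the "Bad CSV" quoted below, the rows `hello,world` and `hello` would be rejected, while `hello,world,` and `hello,,` in the "Good CSV" pass because the delimiters are present even though the values are empty.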

On Mon, Jan 6, 2020 at 4:57 PM Shawn Weeks <[email protected]> wrote:

> That's the challenge, the values can be null but I want to know the fields
> are missing(aka not enough delimiters). I run into a common scenario where
> line feeds end up in the data making a short row. Currently the reader just
> ignores the fact that there aren't enough delimiters and makes them null.
>
> On 1/6/20, 3:50 PM, "Matt Burgess" <[email protected]> wrote:
>
> > Shawn,
> >
> > Your schema indicates that the fields are optional because of the
> > "type" : ["null", "string"] , so IIRC they won't be marked as invalid
> > because they are treated as null (I'm not sure there's a difference in
> > the code between missing and null fields).
> >
> > You can try "type": "string" in ValidateRecord to see if that fixes
> > it, or there's a "StrNotNullOrEmpty" operator in ValidateCSV.
> >
> > Regards,
> > Matt
> >
> > On Mon, Jan 6, 2020 at 4:35 PM Shawn Weeks <[email protected]> wrote:
> >
> > > I’m trying to validate that a csv file has the number of fields
> > > defined in it’s Avro schema. Consider the following schema and CSVs. I
> > > would like to be able to reject the invalid csv as missing fields.
> > >
> > > {
> > >   "type" : "record",
> > >   "namespace" : "nifi",
> > >   "name" : "nifi",
> > >   "fields" : [
> > >     { "name" : "c1" , "type" : ["null", "string"] },
> > >     { "name" : "c2" , "type" : ["null", "string"] },
> > >     { "name" : "c3" , "type" : ["null", "string"] }
> > >   ]
> > > }
> > >
> > > Good CSV
> > > c1,c2,c3
> > > hello,world,1
> > > hello,world,
> > > hello,,
> > >
> > > Bad CSV
> > > c1,c2,c3
> > > hello,world,1
> > > hello,world
> > > hello
