What about ValidateCsv, could that do what you want?

> On Jan 6, 2020, at 6:10 PM, Shawn Weeks <[email protected]> wrote:
>
> I'm poking around to see if I can make the csv parsers fail on a schema
> mismatch like that. A stream command would be a good option though.
>
> Thanks
> Shawn
>
> From: Mike Thomsen <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, January 6, 2020 at 4:35 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Validating CSV File
>
> We have a lot of the same issues where I work, and our solution is to use
> ExecuteStreamCommand to pass CSVs off to Python scripts that will read stdin
> line by line to check to see if the export isn't screwed up. Some of our
> sources are good and we don't have to do that, but others are minefields in
> terms of the quality of the upstream data source, and that's the only way
> we've found where we can predictably handle such things.
>
> On Mon, Jan 6, 2020 at 4:57 PM Shawn Weeks <[email protected]> wrote:
> That's the challenge, the values can be null but I want to know the fields
> are missing (aka not enough delimiters). I run into a common scenario where
> line feeds end up in the data making a short row. Currently the reader just
> ignores the fact that there aren't enough delimiters and makes them null.
>
> On 1/6/20, 3:50 PM, "Matt Burgess" <[email protected]> wrote:
>
>     Shawn,
>
>     Your schema indicates that the fields are optional because of the
>     "type" : ["null", "string"] , so IIRC they won't be marked as invalid
>     because they are treated as null (I'm not sure there's a difference in
>     the code between missing and null fields).
>
>     You can try "type": "string" in ValidateRecord to see if that fixes
>     it, or there's a "StrNotNullOrEmpty" operator in ValidateCsv.
>
>     Regards,
>     Matt
>
>     On Mon, Jan 6, 2020 at 4:35 PM Shawn Weeks <[email protected]> wrote:
>     >
>     > I'm trying to validate that a csv file has the number of fields defined
>     > in its Avro schema. Consider the following schema and CSVs. I would like
>     > to be able to reject the invalid csv as missing fields.
>     >
>     > {
>     >   "type" : "record",
>     >   "namespace" : "nifi",
>     >   "name" : "nifi",
>     >   "fields" : [
>     >     { "name" : "c1" , "type" : ["null", "string"] },
>     >     { "name" : "c2" , "type" : ["null", "string"] },
>     >     { "name" : "c3" , "type" : ["null", "string"] }
>     >   ]
>     > }
>     >
>     > Good CSV
>     > c1,c2,c3
>     > hello,world,1
>     > hello,world,
>     > hello,,
>     >
>     > Bad CSV
>     > c1,c2,c3
>     > hello,world,1
>     > hello,world
>     > hello
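
For reference, the stdin-based check Mike describes could look roughly like the sketch below. It is only an illustration, not the script anyone in this thread actually runs: the function names `find_bad_rows` and `check_stream` are made up here, and it assumes the first row is a header whose field count every later row must match. Rows are numbered by CSV record, header = 1.

```python
import csv


def find_bad_rows(lines):
    """Return (row_number, field_count) for each row whose field count
    differs from the header row's. `lines` is any iterable of strings."""
    reader = csv.reader(lines)
    try:
        header = next(reader)
    except StopIteration:
        return []  # empty input: nothing to compare against
    expected = len(header)
    bad = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != expected:
            bad.append((row_number, len(row)))
    return bad


def check_stream(inp, out, err):
    """Copy inp to out if every row matches the header's field count.
    Otherwise report the short/long rows on err. Returns an exit status."""
    lines = inp.readlines()
    bad = find_bad_rows(lines)
    if bad:
        for row_number, count in bad:
            err.write("row %d has %d field(s)\n" % (row_number, count))
        return 1
    out.writelines(lines)
    return 0


# To run as a script (hypothetical wiring for ExecuteStreamCommand):
#   import sys
#   sys.exit(check_stream(sys.stdin, sys.stdout, sys.stderr))
```

With the two samples from the original message, the good CSV passes through untouched, and the bad CSV is reported as having a 2-field row and a 1-field row. How the non-zero exit status gets routed inside the flow depends on how ExecuteStreamCommand is configured, so treat that part as an assumption.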
