Bas,

Sorry for the late reply; I should have mentioned sooner that I am looking into this issue. From your description, it seems like ConvertJSONToAvro should be able to handle this kind of input. If I can't find a schema that fits and instead confirm that it is a bug or a needed improvement, I will write up a Jira; either way, I will report back to this list. Thanks for your question; IMO this is indeed a valid use case that should be supported.
Regards,
Matt

On Tue, Jan 31, 2017 at 9:10 AM, Bas van Kortenhof <bas.vankorten...@sanoma.com> wrote:
> Hi all,
>
> Not completely sure if this is a developer or user question, but I'm posting
> it here for now as at this moment it is related to flow design.
>
> So what I'm trying to achieve is to get a JSON response from an API, extract
> the relevant values, validate this data and convert it to Avro. I am able to
> complete the first two steps with InvokeHTTP and JoltTransformJSON, after
> which my data is an array of objects in JSON, so my flowfile looks like
> this:
>
> [
>   {"key1": "val1", "key2": "val2"},
>   {"key1": "val3", "key2": "val4"}
> ]
>
> My idea was now to put this JSON in a ConvertJSONToAvro together with the
> appropriate Avro schema. However, ConvertJSONToAvro cannot apply schema
> validation to the individual elements of an array. It can, however, apply
> schema validation to records that are not contained in an array but are
> separated by newlines, so it can handle the following flowfile (note that
> this, on a file level, is basically invalid JSON):
>
> {"key1": "val1", "key2": "val2"}
> {"key1": "val3", "key2": "val4"}
>
> I can achieve this in NiFi by splitting the JSON flowfile with SplitJson and
> merging it back together immediately with a MergeContent processor with '\n'
> as demarcator. Both have to be applied before the ConvertJSONToAvro,
> because otherwise invalid records would cause the merge step to fail. This
> means the splitting can't even be used to redistribute files in a cluster
> setting, so I don't really like this workaround.
>
> I was wondering if anyone knows a way to produce the second example format
> of JSON using a JOLT transformation, which would be an elegant fix. If not,
> I'd like to ask if there is a reason that ConvertJSONToAvro can only handle
> newline-separated objects and not objects in an array (which, in my opinion,
> is the closest representation in JSON of the concept of records in Avro). If
> there is no such reason, I think it can be considered a bug, and I would like
> to propose adding an option to the ConvertJSONToAvro processor to apply the
> schema validation to the whole file, to objects separated by newlines, or to
> objects in an array.
>
> Please let me know what you think!
>
> Regards,
> Bas
>
> --
> View this message in context:
> http://apache-nifi-users-list.2361937.n4.nabble.com/Validating-an-array-of-objects-using-ConvertJSONToAvro-tp832.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
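For reference, the reshaping that the SplitJson + MergeContent workaround performs (a JSON array of objects turned into one object per line), plus the per-record check proposed for arrays, can be sketched outside NiFi. This is a minimal Python sketch, not NiFi code: the function names `array_to_ndjson` and `split_valid_invalid` are hypothetical, and the field check is a deliberately simplified stand-in for Avro schema validation that only tests whether each required key holds a string.

```python
import json


def array_to_ndjson(text):
    """Convert a top-level JSON array of objects into newline-delimited JSON,
    one object per line (the layout the SplitJson + MergeContent workaround yields)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.loads/json.dumps preserve each object's key order, so every line
    # mirrors the corresponding element of the input array.
    return "\n".join(json.dumps(record) for record in records)


def split_valid_invalid(records, required_fields):
    """Hypothetical per-record check applied to each array element separately.
    A record counts as valid if it is an object whose required fields are all
    strings. This is a simplified stand-in, NOT real Avro schema validation."""
    valid, invalid = [], []
    for record in records:
        if isinstance(record, dict) and all(
            isinstance(record.get(field), str) for field in required_fields
        ):
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid


sample = '[{"key1": "val1", "key2": "val2"}, {"key1": "val3", "key2": "val4"}]'
print(array_to_ndjson(sample))
# prints:
# {"key1": "val1", "key2": "val2"}
# {"key1": "val3", "key2": "val4"}
```

Checking each element on its own is what would let an invalid record be set aside without failing the valid ones, which is the behavior the proposed array option would need; how ConvertJSONToAvro itself routes failing records is not assumed here.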