I’m having a lot of difficulty getting complex Avro schemas to work as needed.
I have a ConvertRecord processor going from CSV to JSON. This works well.
Using the Alexa dataset, I have the CSV schema:
fqdn,rank
which is output as simple JSON:
[ { "fqdn":"google.com", "rank":"1" }, { "fqdn":"youtube.com", "rank":"2" }, …
]
So far so good. Then, because of the desired output format, I need to convert
a filtered subset of the records to a nested XML structure. For that I use the
QueryRecord processor.
The desired output is:
<root_tag>
  <array_tag>
    <record_tag>
      <domain>google.com</domain>
    </record_tag>
    <record_tag>
      <domain>youtube.com</domain>
    </record_tag>
    .
    .
    .
  </array_tag>
</root_tag>
I use a JOLT transform to rename the fqdn field and to nest my array:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "fqdn": "array_tag[].domain"
      }
    }
  }
]
This produces:
{ "array_tag": [ { "domain":"google.com" }, { "domain":"youtube.com" }, … ] }
How do I get from here to XML using QueryRecord? I want to use a SQL statement
to limit the number of records returned.
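To make the target concrete, here is a minimal Python sketch of the end-to-end result I'm after, done outside NiFi. It only illustrates the shape of the data, not how QueryRecord or the XML writer work; the function name to_nested_xml and the limit parameter are made up for this example.

```python
import xml.etree.ElementTree as ET


def to_nested_xml(records, limit):
    """Keep the first `limit` records, rename fqdn -> domain, nest under array_tag."""
    root = ET.Element("root_tag")
    array = ET.SubElement(root, "array_tag")
    for rec in records[:limit]:
        # One <record_tag> per record, holding only the renamed field.
        item = ET.SubElement(array, "record_tag")
        ET.SubElement(item, "domain").text = rec["fqdn"]
    return ET.tostring(root, encoding="unicode")


# Flat records as produced by the CSV-to-JSON step.
records = [
    {"fqdn": "google.com", "rank": "1"},
    {"fqdn": "youtube.com", "rank": "2"},
]

xml_out = to_nested_xml(records, limit=1)
print(xml_out)
```

The slice stands in for the SQL LIMIT I want QueryRecord to apply; the rest mirrors the JOLT rename and the nesting I expect from the XML writer.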
I’ve tried Infer Schema with the JsonTreeReader, writing XML with the
XMLRecordSetWriter and its Array Tag Name property. I can’t find a combination
that works. It either writes too many nested structures or reads the entire
named array as a single record, and I have no idea what schema it’s inferring.
I’ve tried the following AvroSchema instead of Infer Schema:
{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.test.avro",
  "fields" : [ {
    "name" : "array_tag",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "array_tag",
        "fields" : [ {
          "name" : "domain",
          "type" : "string"
        } ]
      }
    }
  } ]
}
But the QueryRecord processor doesn’t seem able to interpret this schema, and I
get no output values.
What’s the magic Avro schema and XML Writer configuration that can read and
write as desired?
