I’m having a lot of difficulty getting complex Avro schemas to work as needed.

 

I have a ConvertRecord processor going from CSV to JSON.  This works well.

 

Using the Alexa dataset, I have CSV schema:

 

fqdn,rank

 

which is output as simple JSON:

 

[ { "fqdn":"google.com", "rank":"1" }, { "fqdn":"youtube.com", "rank":"2" }, … ]

 

So far so good.  Then, because of the output format desired, I need to convert 
to a nested XML structure, but only a filtered subset.  So I use the 
QueryRecord processor.

 

The desired output is:

 

<root_tag>
     <array_tag>
          <record_tag>
               <domain>google.com</domain>
          </record_tag>
          <record_tag>
               <domain>youtube.com</domain>
          </record_tag>
          .
          .
          .
     </array_tag>
</root_tag>

 

I use a JOLT transform to rename the fqdn field and to nest my array:

 

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "fqdn": "array_tag[].domain"
      }
    }
  }
]

 

This produces:

 

{ "array_tag": [ { "domain":"google.com" }, { "domain":"youtube.com" }, … ] }
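
To make the intended reshaping concrete, here is a minimal plain-Python sketch of what I expect the shift step to do: rename fqdn to domain, drop rank, and nest everything under array_tag. This is only an illustration of the data shape, not NiFi or JOLT code; the function name shift_to_array_tag is made up for the example.

```python
import json


def shift_to_array_tag(records):
    """Rename each record's fqdn to domain and nest the results under array_tag.

    Hypothetical helper for illustration only; it mirrors the result I expect
    from the shift spec, not how JOLT evaluates it.
    """
    return {"array_tag": [{"domain": record["fqdn"]} for record in records]}


if __name__ == "__main__":
    rows = [
        {"fqdn": "google.com", "rank": "1"},
        {"fqdn": "youtube.com", "rank": "2"},
    ]
    print(json.dumps(shift_to_array_tag(rows)))
```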

 

How do I get from here to XML using QueryRecord?  I want to use a SQL statement 
to limit the number of records returned.
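
In case it helps to pin down the target, the end result I want can be sketched in plain Python as follows. This is not a NiFi configuration, just the nested JSON above turned into the desired XML with a record limit applied; the function name to_xml and the limit parameter are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def to_xml(nested, limit):
    """Write at most `limit` entries of nested["array_tag"] as record_tag elements.

    Hypothetical helper for illustration only; the tag names come from the
    desired output shown earlier in this message.
    """
    root = ET.Element("root_tag")
    array = ET.SubElement(root, "array_tag")
    for item in nested["array_tag"][:limit]:
        record = ET.SubElement(array, "record_tag")
        ET.SubElement(record, "domain").text = item["domain"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    data = {"array_tag": [{"domain": "google.com"}, {"domain": "youtube.com"}]}
    print(to_xml(data, limit=1))
```

The slice stands in for whatever row limit the SQL statement would apply; the open question is how to get QueryRecord to treat each element of array_tag as its own record so that such a limit has something to act on.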

 

I’ve tried Infer Schema with the JsonTreeReader, writing XML with the 
XMLRecordSetWriter and its Array Tag Name property.  I can’t find a combination 
that works: it either writes too many nested structures or reads the entire 
named array as a single record, and I have no idea what schema it is inferring.

 

I’ve tried the following AvroSchema instead of Infer Schema:

 

{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.test.avro",
  "fields" : [ {
    "name" : "array_tag",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "array_tag",
        "fields" : [ {
          "name" : "domain",
          "type" : "string"
        } ]
      }
    }
  } ]
}

 

But the QueryRecord processor doesn’t appear to be able to interpret this 
schema, and I get no output values.

 

What’s the magic Avro schema and XML Writer configuration that can read and 
write as desired?

 

 

 

 
