Avro supplies the `aliases` keyword.

So, for the example below, the following schema works for the namespaced attributes:

{
  "type":"record",
  "name":"example",
  "fields": [
    {"name":"attr1","type":"int" },
    {"name":"attr2","type":"int" },
    {"name":"attr3","type":"int" },
    {
      "name":"third",
      "type": {"type":"record","name":"thirdType","fields": [
          {"name":"att1","type":"string","aliases": ["{urn:us:gov:ic:ism:v2}att1" ] },
          {"name":"att2","type":"string","aliases": ["{urn:us:gov:ic:ism:v2}att2" ] },
          {"name":"att3","type":"string","aliases": ["{urn:us:gov:ic:ism:v2}att3" ] }
        ] }
    }
  ]
}
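To illustrate what the aliases refer to, here is a minimal Python sketch using only the standard library. It does not show how NiFi's XMLReader itself matches fields. Python's `xml.etree.ElementTree` reports a prefixed attribute such as `ICISM:att1` under its expanded name `{urn:us:gov:ic:ism:v2}att1`, the same form the aliases use, so a lookup built from the `thirdType` aliases maps the attributes back to `att1`, `att2` and `att3`. The helpers `alias_map` and `third_record` and the attribute values `a`, `b`, `c` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# The inner "thirdType" record from the schema above.
THIRD_TYPE = json.loads("""
{"type": "record", "name": "thirdType", "fields": [
  {"name": "att1", "type": "string", "aliases": ["{urn:us:gov:ic:ism:v2}att1"]},
  {"name": "att2", "type": "string", "aliases": ["{urn:us:gov:ic:ism:v2}att2"]},
  {"name": "att3", "type": "string", "aliases": ["{urn:us:gov:ic:ism:v2}att3"]}
]}
""")

# Well-formed version of the sample document (hypothetical values).
XML = """<root xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <data attr1="val" attr2="val" attr3="val">
    <third ICISM:att1="a" ICISM:att2="b" ICISM:att3="c"/>
  </data>
</root>"""


def alias_map(record_schema):
    """Map every field name and every alias to the canonical field name."""
    mapping = {}
    for field in record_schema["fields"]:
        mapping[field["name"]] = field["name"]
        for alias in field.get("aliases", []):
            mapping[alias] = field["name"]
    return mapping


def third_record(xml_text, record_schema):
    """Return the <third> attributes keyed by schema field name."""
    lookup = alias_map(record_schema)
    third = ET.fromstring(xml_text).find("./data/third")
    # ElementTree keys a namespaced attribute as "{namespace-uri}local-name".
    return {lookup[key]: value for key, value in third.attrib.items() if key in lookup}


print(third_record(XML, THIRD_TYPE))
```

Whether NiFi 1.12.1 resolves the aliases in the same way is an assumption that this sketch does not test.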

On 9/19/22 17:07, Andrew McDonald wrote:
Somehow the formatting got squished for my `third` level:

<root xmlns:ICISM="urn:us:gov:ic:ism:v2" >
  <data attr1="val" attr2="val" attr3="val">
      <third  ICISM:att1="cannot_get_val" ICISM:att2="cannot_get_val"  ICISM:att3="cannot_get_val"/>
  </data>
</root>

Also, sorry, D. Palmatier: I see you wrote "3rd level", but judging by the example you provided you meant the 4th level. I don't know whether the 4th level is possible.

Regards, Andrew

On 9/19/22 16:56, Andrew McDonald wrote:
Yes, you can get the 3rd-level fields; at least, I have been able to with 1.12.1.

The TestXMLReader uses:

https://github.com/apache/nifi/blob/rel/nifi-1.12.1/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/resources/xml/people.xml

with the schema:

https://github.com/apache/nifi/blob/rel/nifi-1.12.1/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/resources/xml/testschema

What I found out just minutes ago, and would like some help with, is how to deal with namespaced attributes on the 3rd level.

In my situation:

<root xmlns:ICISM="urn:us:gov:ic:ism:v2" >
  <data attr1="val" attr2="val" attr3="val">
      <third  ICISM:att1="cannot_get_val" ICISM:att2="cannot_get_val"  ICISM:att3="cannot_get_val"/>
  </data>
</root>


The namespace-decorated attributes in the `third` tag are not being populated. In my test XML, if I remove the namespace prefix from att{1,2,3}, the JSON data is populated.
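If the namespace information is not needed downstream, one possible workaround that mirrors this observation is to drop the namespace from attribute names before the XML reaches the reader. Below is a minimal Python sketch of such a pre-processing step, assuming the document fits in memory; `strip_attribute_namespaces` is a hypothetical helper, not a NiFi feature.

```python
import xml.etree.ElementTree as ET


def strip_attribute_namespaces(xml_text):
    """Rename every "{uri}local" attribute to "local" and re-serialize."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Iterate over a copy of the keys because the dict is modified below.
        for key in list(elem.attrib):
            if key.startswith("{"):
                local = key.split("}", 1)[1]
                elem.attrib[local] = elem.attrib.pop(key)
    # ElementTree only writes namespace declarations that are still used,
    # so the now-unused xmlns:ICISM declaration is dropped.
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<root xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <data attr1="val" attr2="val" attr3="val">
    <third ICISM:att1="x" ICISM:att2="y" ICISM:att3="z"/>
  </data>
</root>"""

print(strip_attribute_namespaces(SAMPLE))
```

Attributes that share a local name across namespaces would collide after stripping (one value overwrites the other), so this only fits documents where that cannot happen.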

I do see a people_namespace.xml that is used in TestXMLRecordReader, but it only covers namespaced tags, not attributes.

I'm hoping there is a patch I could apply to 1.12.1, because we are bound to this version for a while.

Regards, Andrew


On 8/31/22 12:33, D. Palmatier wrote:
Hello.

I'm trying to query the records within a large (~15 GB) XML file. The format of the file is:

<xmlfeed version="1" generated="2022-08-11 13:00:00">
    <records>
        <record>
            <field1></field1>
            <field2></field2>
        </record>
        <record>
            <field1></field1>
            <field2></field2>
        </record>
    </records>
</xmlfeed>

Unfortunately, the records I want to query are at the third level, and the XMLReader expects records at the second level.

I don't have any control over the format of the source file. Is there a way I can get to these inner records for my queries without having to load the entire file?
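Outside NiFi, one way to reach third-level records without loading the whole file is a streaming parse. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`, assuming each record's children are simple text fields; `iter_records` and its `record_tag`/`depth` parameters are hypothetical names chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="record", depth=3):
    """Stream <record> elements found at the given nesting depth (>= 2)
    as dicts of child tag -> text, without building the whole tree."""
    stack = []  # currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        # "end": elem is now complete and sits on top of the stack.
        if len(stack) == depth and elem.tag == record_tag:
            yield {child.tag: (child.text or "") for child in elem}
            # Detach the finished record from its parent so it can be freed.
            stack[-2].remove(elem)
        stack.pop()


SAMPLE = b"""<xmlfeed version="1" generated="2022-08-11 13:00:00">
    <records>
        <record><field1>a</field1><field2>b</field2></record>
        <record><field1>c</field1><field2>d</field2></record>
    </records>
</xmlfeed>"""

for rec in iter_records(io.BytesIO(SAMPLE)):
    print(rec)
```

`iterparse` also accepts a file path, so the same generator could be pointed at the real feed. Removing each finished record from its parent keeps memory use from growing with the size of the file.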

Thank you for your time.
David
