Somehow the formatting of my `third` level example got squished:

<root xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <data attr1="val" attr2="val" attr3="val">
    <third ICISM:att1="cannot_get_val" ICISM:att2="cannot_get_val" ICISM:att3="cannot_get_val"/>
  </data>
</root>

And sorry, D. Palmatier: I see you wrote 3rd level, but judging by the example you provided you meant the 4th level. I don't know whether the 4th level is possible.

Regards, Andrew

On 9/19/22 16:56, Andrew McDonald wrote:
Yes, you can get the 3rd level fields; at least, I have been able to with 1.12.1.

The TestXMLReader uses:

https://github.com/apache/nifi/blob/rel/nifi-1.12.1/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/resources/xml/people.xml

With the schema,

https://github.com/apache/nifi/blob/rel/nifi-1.12.1/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/resources/xml/testschema

What I've just found out, only minutes ago, and would like some help with, is how to deal with namespaced attributes on the 3rd level.

For my situation

<root xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <data attr1="val" attr2="val" attr3="val">
    <third ICISM:att1="cannot_get_val" ICISM:att2="cannot_get_val" ICISM:att3="cannot_get_val"/>
  </data>
</root>


The namespace-decorated attributes in the third tag are not being populated. In my test XML, if I remove the namespace prefix from att{1,2,3}, then the JSON data is populated.
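To make the difference between the two cases concrete, here is a minimal sketch outside of NiFi, using Python's standard-library ElementTree rather than the XMLReader. It only illustrates how a namespace-aware parser exposes prefixed attributes: each one is keyed by its namespace URI plus local name, so a lookup by the bare name `att1` finds nothing. Whether this is the reason the XMLReader in 1.12.1 skips them is an assumption, not something confirmed here. The function name `namespaced_attributes` and the values `v1`, `v2`, `v3` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Same shape as the sample above; attribute values are made up for illustration.
SAMPLE = """<root xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <data attr1="val" attr2="val" attr3="val">
    <third ICISM:att1="v1" ICISM:att2="v2" ICISM:att3="v3"/>
  </data>
</root>"""

ISM_NS = "urn:us:gov:ic:ism:v2"


def namespaced_attributes(xml_text):
    """Return the attributes of the first <third> element,
    keyed by (namespace URI, local name)."""
    root = ET.fromstring(xml_text)
    third = root.find("./data/third")
    result = {}
    for qualified_name, value in third.attrib.items():
        # ElementTree stores a prefixed attribute as "{namespace-uri}local".
        if qualified_name.startswith("{"):
            uri, local = qualified_name[1:].split("}", 1)
        else:
            uri, local = "", qualified_name
        result[(uri, local)] = value
    return result


attrs = namespaced_attributes(SAMPLE)
third = ET.fromstring(SAMPLE).find("./data/third")
# A lookup by bare name misses; the namespace-qualified lookup succeeds.
bare_lookup = third.get("att1")
qualified_lookup = third.get("{%s}att1" % ISM_NS)
```

In this sketch `bare_lookup` is `None` while `qualified_lookup` holds the value, which mirrors the symptom described above: the data is present in the document, but a reader that matches attributes by unqualified name will not see it.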

I do see a people_namespace.xml that is used in the TestXMLRecordReader but that is only for tags.

I'm hoping there is a patch I could apply to 1.12.1 b/c we are bound to this version for a while.

Regards, Andrew


On 8/31/22 12:33, D. Palmatier wrote:
Hello.

I'm trying to query the records within a large, ~15GB, XML file. The format of the file is:

<xmlfeed version="1" generated="2022-08-11 13:00:00">
    <records>
        <record>
            <field1></field1>
            <field2></field2>
        </record>
        <record>
            <field1></field1>
            <field2></field2>
        </record>
    </records>
</xmlfeed>

Unfortunately the records I want to query are at the third level and the XMLReader expects records at the second level.

I don't have any control over the format of the source file. Is there a way I can get to these inner records for my queries without having to load the entire file?
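For reference, the kind of access being asked about can be sketched outside of NiFi with a streaming parser. The example below uses `iterparse` from Python's standard-library ElementTree; it is an illustration of incremental parsing under that assumption, not a description of how the XMLReader works. The function name `iter_records` and the field values `a` through `d` are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Small stand-in for the real feed; field values are made up for illustration.
SAMPLE = b"""<xmlfeed version="1" generated="2022-08-11 13:00:00">
    <records>
        <record><field1>a</field1><field2>b</field2></record>
        <record><field1>c</field1><field2>d</field2></record>
    </records>
</xmlfeed>"""


def iter_records(source, record_tag="record"):
    """Yield one dict per <record> element, at any nesting depth,
    without building the whole document tree in memory."""
    stack = []  # currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem has just been closed
        if elem.tag == record_tag:
            yield {child.tag: (child.text or "") for child in elem}
            if stack:
                # Detach the finished record from its parent so that
                # processed records do not accumulate in memory.
                stack[-1].remove(elem)


records = list(iter_records(io.BytesIO(SAMPLE)))
```

Passing a file path or an open binary file instead of `io.BytesIO` works the same way. Because each record is detached from its parent once it has been consumed, memory use stays roughly proportional to one record plus the parser's read buffer rather than to the size of the file.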

Thank you for your time.
David
