You are correct that Daffodil builds an internal infoset and then serializes it to some other format (e.g. XML, EXI, JSON). One way to get a decent approximation of how much time the former takes is to use the "null" infoset outputter, e.g.

  daffodil parse -I null ...

This still parses the data and builds the internal infoset but turns infoset serialization into a no-op, so the difference from a normal run approximates the serialization cost.
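
As a rough sketch of that comparison, the script below times one run with the null outputter and one run that writes XML, then subtracts. The file names records.parser (a saved parser) and records.bin (the input) are hypothetical, and the elapsed helper is only an illustration that measures whole seconds of wall-clock time; JVM startup and disk caching will add some noise to both numbers.

```shell
# Print the wall-clock seconds (whole seconds) that a command takes.
elapsed() {
  elapsed_start=$(date +%s)
  "$@" >/dev/null
  elapsed_end=$(date +%s)
  echo $(( elapsed_end - elapsed_start ))
}

# Only attempt the comparison when the daffodil CLI is on the PATH.
if command -v daffodil >/dev/null 2>&1; then
  # Parse and build the infoset; serialization is a no-op.
  parse_only=$(elapsed daffodil parse -P records.parser -I null records.bin)
  # Same parse, but serialize the infoset to an XML file.
  parse_and_xml=$(elapsed daffodil parse -P records.parser -I xml -o records.xml records.bin)

  echo "parse + infoset build:      ${parse_only}s"
  echo "XML serialization (approx): $(( parse_and_xml - parse_only ))s"
fi
```

Running each case a few times and comparing the typical values should give a steadier estimate than a single pair of runs.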

On 2023-12-26 02:04 PM, Roger L Costello wrote:
Hi Folks,

My input file contains 5 million 132-character records.

I have done everything that I can think of to make the parsing faster:

 1. I precompiled the schema and used it to do the parsing
 2. I set Java -Xmx40960m
 3. I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer

And yet it still takes 4 minutes before the (4 GB) XML file is produced. Waiting 4 minutes is not acceptable for my clients.

A couple of questions:

 1. Is there anything else that I can do to speed things up?
 2. I believe there is time needed to do the parsing and generate an
    in-memory parse tree, and there is time needed to serialize the
    in-memory parse tree to an XML file. Is there a way to find those
    two times? I suspect the former is a lot quicker than the latter.

/Roger

