You are correct that Daffodil builds an internal infoset and then
serializes it to some output format (e.g. XML, EXI, JSON). One way to get
a decent approximation of how much time is spent on the former is to
use the "null" infoset outputter, e.g.
    daffodil parse -I null ...
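This still parses the data and builds the internal infoset, but turns
infoset serialization into a no-op. The difference between that run time
and the run time of your normal XML run is then a rough estimate of the
serialization cost.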
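To put numbers on this, one could time a run with the null outputter and a
run with the XML outputter and subtract. The following is only a rough
sketch, not something I have run against your data. It assumes you parse
with a saved parser via -P and write output with -o (check
"daffodil parse --help" for your version), and the function names and
arguments (saved parser, input file, XML output file) are hypothetical
placeholders. Both timings include JVM startup and parser loading, so
those costs roughly cancel in the difference.

```shell
#!/bin/sh
# Print the wall-clock seconds taken by the command given as arguments.
# Resolution is one second, which is fine for runs lasting minutes.
elapsed_seconds() {
  es_start=$(date +%s)
  "$@" >/dev/null 2>&1
  es_end=$(date +%s)
  echo $((es_end - es_start))
}

# Compare a null-outputter run against an XML run.
#   $1 = saved parser      (hypothetical, e.g. records.bin)
#   $2 = input data file   (hypothetical, e.g. records.dat)
#   $3 = XML output file   (hypothetical, e.g. records.xml)
compare_outputters() {
  # Parse and build the infoset, but skip serialization.
  null_secs=$(elapsed_seconds daffodil parse -P "$1" -I null -o /dev/null "$2")
  # Parse, build the infoset, and serialize it to XML.
  xml_secs=$(elapsed_seconds daffodil parse -P "$1" -I xml -o "$3" "$2")
  echo "parse + build infoset:       ${null_secs}s"
  echo "parse + build + XML output:  ${xml_secs}s"
  echo "approx. XML serialization:   $((xml_secs - null_secs))s"
}
```

Usage would be something like "compare_outputters records.bin records.dat
records.xml" after sourcing the script. Running each case a few times and
taking the typical value should smooth out disk caching effects.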
On 2023-12-26 02:04 PM, Roger L Costello wrote:
Hi Folks,
My input file contains 5 million 132-character records.
I have done everything that I can think of to make the parsing faster:
1. I precompiled the schema and used it to do the parsing
2. I set Java -Xmx40960m
3. I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer
And yet it still takes 4 minutes before the (4 GB) XML file is produced.
Waiting 4 minutes is not acceptable for my clients.
A couple of questions:
1. Is there anything else that I can do to speed things up?
2. I believe there is time needed to do the parsing and generate an
in-memory parse tree, and there is time needed to serialize the
in-memory parse tree to an XML file. Is there a way to find those
two times? I suspect the former is a lot quicker than the latter.
/Roger