Re: Parsing 5 million lines of input is taking 4 minutes - too slow!

Steve Lawrence Tue, 02 Jan 2024 06:18:33 -0800

You are correct that Daffodil builds an internal infoset and then serializes that to something else (e.g. XML, EXI, JSON). One way to get a decent approximation for how much time is used for the former is to use the "null" infoset outputter, e.g.


  daffodil parse -I null ...

This still parses data and builds the internal infoset but turns infoset serialization into a no-op.
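
If it helps, here is a rough sketch of how the two timings could be compared: run the parse once with the normal XML output and once with the null outputter, then subtract. The file names (records.bin, records.dat, records.xml) and the helper function names are made up for illustration, and apart from "-I null" the flags shown (-P for a saved parser, -o for the output file) should be checked against "daffodil parse --help" for your version. Both runs include JVM startup and loading of the saved parser, so the difference is only an approximation of serialization cost.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def daffodil_commands(parser, data, xml_out):
    """Build the two command lines to compare (all file names are hypothetical)."""
    base = ["daffodil", "parse", "-P", parser]
    full_cmd = base + ["-I", "xml", "-o", xml_out, data]  # parse + serialize to XML
    null_cmd = base + ["-I", "null", data]                # parse only, no serialization
    return full_cmd, null_cmd


def split_times(full_cmd, null_cmd):
    """Time both commands and estimate serialization time as the difference."""
    total = time_command(full_cmd)
    parse_only = time_command(null_cmd)
    # Clamp at zero: run-to-run noise can make the difference slightly negative.
    return {
        "total": total,
        "parse": parse_only,
        "serialize": max(total - parse_only, 0.0),
    }


# Example use (requires daffodil on the PATH and the files to exist):
#   full_cmd, null_cmd = daffodil_commands("records.bin", "records.dat", "records.xml")
#   print(split_times(full_cmd, null_cmd))
```

Running each command a few times and taking the median would give a steadier estimate, since disk caching tends to favor whichever run goes second.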


On 2023-12-26 02:04 PM, Roger L Costello wrote:

Hi Folks,

My input file contains 5 million 132-character records.

I have done everything that I can think of to make the parsing faster:

 1. I precompiled the schema and used it to do the parsing
 2. I set Java -Xmx40960m
 3. I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer

And yet it still takes 4 minutes before the (4 GB) XML file is produced. Waiting 4 minutes is not acceptable for my clients.


A couple of questions:

 1. Is there anything else that I can do to speed things up?
 2. I believe there is time needed to do the parsing and generate an
    in-memory parse tree, and there is time needed to serialize the
    in-memory parse tree to an XML file. Is there a way to find those
    two times? I suspect the former is a lot quicker than the latter.

/Roger
