Parsing 5 million lines of input is taking 4 minutes - too slow!

Roger L Costello Tue, 26 Dec 2023 11:04:46 -0800

Hi Folks,

My input file contains 5 million 132-character records.


I have done everything that I can think of to make the parsing faster:


  1.  I precompiled the schema and used it to do the parsing
  2.  I set the Java maximum heap size to 40 GB (-Xmx40960m)
  3.  I used dfdl:choiceDispatchKey on a number of choices to divide and conquer

Even so, it still takes 4 minutes to produce the 4 GB XML output file, and a 
4-minute wait is not acceptable to my clients.

A couple of questions:


  1.  Is there anything else that I can do to speed things up?
  2.  I believe the total time has two parts: the time to parse the input and 
build an in-memory parse tree, and the time to serialize that tree to an XML 
file. Is there a way to measure those two times separately? I suspect the 
former is much quicker than the latter.
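One generic way to get the two numbers asked about in question 2 is to time each phase on its own with System.nanoTime(). The sketch below does not use Daffodil at all: the 10 + 22 + 100 field layout, the Record class, and the makeLines/parse/serialize methods are hypothetical stand-ins for the real schema and processor. It also assumes the whole tree is built before any XML is written, and a single run includes JIT warm-up, so treat the timings as rough.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PhaseTiming {

    // Hypothetical layout: a 132-character record split into three fixed-width
    // fields (10 + 22 + 100). This is NOT the real format from the post.
    static final class Record {
        final String id;
        final String name;
        final String payload;

        Record(String id, String name, String payload) {
            this.id = id;
            this.name = name;
            this.payload = payload;
        }
    }

    // Builds n synthetic 132-character lines so the sketch runs without an input file.
    static List<String> makeLines(int n) {
        List<String> lines = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String id = String.format("%010d", i);           // 10 chars
            String name = String.format("%-22s", "name" + i); // 22 chars, space-padded
            String payload = "x".repeat(100);                 // 100 chars
            lines.add(id + name + payload);
        }
        return lines;
    }

    // Phase 1: parse every line into an in-memory tree (here, a list of records).
    static List<Record> parse(List<String> lines) {
        List<Record> records = new ArrayList<>(lines.size());
        for (String line : lines) {
            records.add(new Record(
                    line.substring(0, 10),
                    line.substring(10, 32).trim(),
                    line.substring(32, 132)));
        }
        return records;
    }

    // Phase 2: serialize the in-memory tree as XML text to the given Writer.
    // No XML escaping is done because the synthetic data contains no markup characters.
    static void serialize(List<Record> records, Writer out) {
        try {
            out.write("<records>\n");
            for (Record r : records) {
                out.write("  <record><id>" + r.id + "</id><name>" + r.name
                        + "</name><payload>" + r.payload + "</payload></record>\n");
            }
            out.write("</records>\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        List<String> lines = makeLines(n);

        long t0 = System.nanoTime();
        List<Record> tree = parse(lines);              // time to build the tree only
        long t1 = System.nanoTime();

        Path outFile = Files.createTempFile("records", ".xml");
        try (BufferedWriter w = Files.newBufferedWriter(outFile)) {
            serialize(tree, w);                        // time to write XML to disk
        }
        long t2 = System.nanoTime();

        System.out.printf("parse:     %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("serialize: %d ms%n", (t2 - t1) / 1_000_000);
        Files.delete(outFile);
    }
}
```

With Java 11 or later it can be run directly as `java PhaseTiming.java 5000000` to approximate the record count from the post; holding 5 million records in memory may itself need a large -Xmx.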

/Roger
