I’ve read that the Streaming API for XML (StAX) is good for this sort of thing,
but I haven't tried it myself.

  *   https://en.wikipedia.org/wiki/StAX
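
For readers unfamiliar with it, here is a minimal sketch of StAX-style pull parsing in Java, using only the javax.xml.stream classes that ship with the JDK. It walks the XML one event at a time instead of building a tree, so memory use stays roughly flat however large the file is. The class name StaxRecordCounter, the method countElements, and the element name "record" are made up for illustration; they are not taken from Roger's schema or from Daffodil.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxRecordCounter {

    // Counts start tags whose local name matches, reading one event at a
    // time so that no in-memory tree is ever built.
    static long countElements(Reader xml, String localName) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Defensive settings: do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        long count = 0;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    // next() advances the cursor and returns the event type.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped as unchecked to keep the sketch easy to call.
            throw new IllegalStateException("Malformed XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; a real run would pass a FileReader
        // or use createXMLStreamReader(InputStream) on the large file.
        String sample = "<records><record>a</record><record>b</record></records>";
        System.out.println(countElements(new StringReader(sample), "record"));
    }
}
```

The same loop shape applies to whatever per-record work is needed: handle each START_ELEMENT as it arrives and discard it, rather than holding the whole document.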

Recommended for compactness and high-performance decompression of XML into
memory: Efficient XML Interchange (EXI).

  *   Nagasena OpenEXI, https://openexi.sourceforge.net
  *   Exificient, https://exificient.github.io

I have often thought that implementing EXI together with XSLT would make a
powerful high-performance combination.


all the best, Don

--

Don Brutzman  Naval Postgraduate School, Code USW/Br        brutz...@nps.edu

Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA    +1.831.656.2149

X3D graphics, virtual worlds, Navy robotics https://faculty.nps.edu/brutzman
________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, December 26, 2023 11:04:29 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Parsing 5 million lines of input is taking 4 minutes - too slow!




Hi Folks,



My input file contains 5 million 132-character records.



I have done everything that I can think of to make the parsing faster:



  1.  I precompiled the schema and used it to do the parsing
  2.  I set Java -Xmx40960m
  3.  I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer



And yet it still takes 4 minutes before the (4 GB) XML file is produced. 
Waiting 4 minutes is not acceptable for my clients.



A couple of questions:



  1.  Is there anything else that I can do to speed things up?
  2.  I believe there is time needed to do the parsing and generate an
      in-memory parse tree, and there is time needed to serialize the
      in-memory parse tree to an XML file. Is there a way to find those
      two times? I suspect the former is a lot quicker than the latter.



/Roger
