I’ve read that the Streaming API for XML (StAX) is good for this sort of thing, but I haven't tried it myself.
* https://en.wikipedia.org/wiki/StAX
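
To illustrate the streaming idea: a StAX parser pulls events off a cursor instead of building a tree, so memory use stays flat even for multi-gigabyte XML. A minimal sketch using the standard javax.xml.stream API (the element name "record" is just a placeholder, not anything from Roger's schema):

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxScan {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            try (FileInputStream in = new FileInputStream(args[0])) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                long records = 0;
                while (reader.hasNext()) {
                    // Pull one event at a time; only the cursor state is held in memory.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals("record")) { // placeholder element name
                        records++;
                    }
                }
                reader.close();
                System.out.println("records seen: " + records);
            }
        }
    }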
Recommended for compactness and high-performance decompression of XML into memory: EXI.
* Nagasena OpenEXI, https://openexi.sourceforge.net
* Exificient, https://exificient.github.io

I have often thought that someone implementing EXI together with XSLT would be a powerful high-performance combination.

all the best, Don
--
Don Brutzman  Naval Postgraduate School, Code USW/Br  brutz...@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA  +1.831.656.2149
X3D graphics, virtual worlds, Navy robotics  https://faculty.nps.edu/brutzman

________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, December 26, 2023 11:04:29 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Parsing 5 million lines of input is taking 4 minutes - too slow!

Hi Folks,

My input file contains 5 million 132-character records. I have done everything that I can think of to make the parsing faster:

1. I precompiled the schema and used it to do the parsing
2. I set Java -Xmx40960m
3. I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer

And yet it still takes 4 minutes before the (4 GB) XML file is produced. Waiting 4 minutes is not acceptable for my clients.

A couple of questions:

1. Is there anything else that I can do to speed things up?
2. I believe there is time needed to do the parsing and generate an in-memory parse tree, and there is time needed to serialize the in-memory parse tree to an XML file. Is there a way to find those two times? I suspect the former is a lot quicker than the latter.

/Roger
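
Regarding question 2: one rough way to separate the DFDL parse time from the XML-serialization time is to run the same parse twice, once with an outputter that discards the infoset and once with the XML text outputter, and compare wall-clock times. The sketch below assumes the Daffodil 3.x Java API (org.apache.daffodil.japi); the file names are placeholders, and the class names and constructor signatures should be checked against the version of Daffodil you are using:

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.daffodil.japi.Daffodil;
    import org.apache.daffodil.japi.DataProcessor;
    import org.apache.daffodil.japi.ParseResult;
    import org.apache.daffodil.japi.infoset.NullInfosetOutputter;
    import org.apache.daffodil.japi.infoset.XMLTextInfosetOutputter;
    import org.apache.daffodil.japi.io.InputSourceDataInputStream;

    public class ParseTiming {
        public static void main(String[] args) throws Exception {
            // Reload a precompiled parser (placeholder file name); check the reload
            // method name against your Daffodil version's japi documentation.
            DataProcessor dp = Daffodil.compiler().reload(new File("savedParser.bin"));

            // Pass 1: parse only, discarding the infoset. This approximates pure parse time.
            long t0 = System.nanoTime();
            try (FileInputStream in = new FileInputStream("input.dat")) {
                ParseResult r = dp.parse(new InputSourceDataInputStream(in), new NullInfosetOutputter());
                if (r.isError()) throw new RuntimeException("parse failed");
            }
            long parseOnlyMs = (System.nanoTime() - t0) / 1_000_000;

            // Pass 2: parse and write the XML infoset to a file.
            long t1 = System.nanoTime();
            try (FileInputStream in = new FileInputStream("input.dat");
                 BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("output.xml"))) {
                ParseResult r = dp.parse(new InputSourceDataInputStream(in),
                                         new XMLTextInfosetOutputter(out, false /* no pretty-printing */));
                if (r.isError()) throw new RuntimeException("parse failed");
            }
            long parseAndXmlMs = (System.nanoTime() - t1) / 1_000_000;

            System.out.println("parse only:      " + parseOnlyMs + " ms");
            System.out.println("parse + XML out: " + parseAndXmlMs + " ms");
        }
    }

If the second pass is much slower than the first, most of the time is going into building and writing the 4 GB of XML rather than into the DFDL parse itself, which would point toward streaming or a more compact infoset representation as the thing to optimize.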