Steve wrote:
> One way to get a decent approximation for how much time is used for
> the former [build an internal infoset] is to use the "null" infoset
> outputter, e.g.
>
>     daffodil parse -I null ...
>
> This still parses data and builds the internal infoset but turns
> infoset serialization into a no-op.

Thanks Steve. I did as you suggested and here’s the result:

  * 130 seconds

That is super surprising. I would have expected it to take much, much less time. So that means it takes 130 seconds to parse the 5-million-line input file and build an internal infoset, but only 96 seconds to create the 4 GB XML file. That makes no sense.

/Roger

From: Steve Lawrence <slawre...@apache.org>
Sent: Tuesday, January 2, 2024 9:18 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Parsing 5 million lines of input is taking 4 minutes - too slow!

You are correct that daffodil builds an internal infoset and then serializes that to something else (e.g. XML, EXI, JSON). One way to get a decent approximation for how much time is used for the former is to use the "null" infoset outputter, e.g.

    daffodil parse -I null ...

This still parses data and builds the internal infoset but turns infoset serialization into a no-op.

On 2023-12-26 02:04 PM, Roger L Costello wrote:
> Hi Folks,
>
> My input file contains 5 million 132-character records.
>
> I have done everything that I can think of to make the parsing faster:
>
> 1. I precompiled the schema and used it to do the parsing
> 2. I set Java -Xmx40960m
> 3. I used a bunch of dfdl:choiceDispatchKey to divide-and-conquer
>
> And yet it still takes 4 minutes before the (4 GB) XML file is produced.
> Waiting 4 minutes is not acceptable for my clients.
>
> A couple of questions:
>
> 1. Is there anything else that I can do to speed things up?
> 2. I believe there is time needed to do the parsing and generate an
>    in-memory parse tree, and there is time needed to serialize the
>    in-memory parse tree to an XML file. Is there a way to find those
>    two times? I suspect the former is a lot quicker than the latter.
>
> /Roger