If you want to see how long it takes to parse a single file, you can add
the -v flag to the normal parse/unparse command and it will output the
time to compile the schema and the time to parse. For example:

  $ daffodil -v parse -s foo.dfdl.xsd foo.dat
  [info] Time (compiling): 1631ms
  <?xml version="1.0" encoding="UTF-8"?>
  <foo>bar</foo>
  [info] Time (parsing): 59ms

Note that the "Time (parsing)" value includes reading the file, since
Daffodil reads it in as a stream, so a slow hard drive could affect this
number. Also note that Java takes some time to just-in-time compile and
optimize the bytecode, so a single parse is not going to be
representative of the fastest possible speed you get once Java is
warmed up.

For these reasons, we've added the "performance" subcommand [1], which
tries to mitigate some of these issues, and can be run like this:

  daffodil performance -s foo.dfdl.xsd -N 100 foo.dat

This will read foo.dat into memory to avoid any overhead caused by
reading from a hard drive. Then it will parse that data 100 times (or
whatever you set -N to), record the time for each individual parse, and
then output some stats, something like this:

  total parse time (sec): 0.159370
  min rate (files/sec): 13.985179
  max rate (files/sec): 167.186282
  avg rate (files/sec): 62.746911
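
To make the relationship between those four numbers concrete, here is a
small Python sketch. It is not Daffodil's code. It assumes each rate is
simply 1 / (time of one parse) and that the average rate is
(number of parses) / (total time), which is at least consistent with the
sample above (62.746911 times 0.159370 is about 10). The function name
and the timings are made up for illustration.

```python
def summarize(times_sec):
    """Summarize per-parse durations (in seconds) as a total and rates.

    Assumption (not taken from Daffodil's source): a rate is the
    reciprocal of a single parse time, and the average rate is the
    number of parses divided by the total time.
    """
    if not times_sec or any(t <= 0 for t in times_sec):
        raise ValueError("need at least one positive duration")
    total = sum(times_sec)
    return {
        "total parse time (sec)": total,
        "min rate (files/sec)": 1.0 / max(times_sec),  # slowest parse
        "max rate (files/sec)": 1.0 / min(times_sec),  # fastest parse
        "avg rate (files/sec)": len(times_sec) / total,
    }


# Made-up timings: the first parses are slow while the JVM warms up.
example_times = [0.070, 0.030, 0.012, 0.008, 0.006, 0.006]
for label, value in summarize(example_times).items():
    print(f"{label}: {value:.6f}")
```

With timings shaped like these, the slow first parses pull the min rate
and the average down, while the max rate reflects only the fastest
parse.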

Notice how there is a big difference between the min rate (the slowest
individual parse) and the max rate (the fastest individual parse). This
is because of the just-in-time compilation and optimization that Java
does during the first several parses. To get an accurate number for the
fastest Daffodil can parse (hardware dependent, of course), I
usually bump up the -N option until max rate stops increasing. This
allows Java to finish all the compilation/optimization.
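
If you'd rather not eyeball that, the "bump -N until max rate stops
increasing" check could be scripted roughly like this in Python. This is
only a sketch: it assumes you have saved the text output of several runs
yourself, that the "max rate (files/sec): ..." line looks like the
sample above, and that a 5% threshold is a reasonable meaning of "stops
increasing". The function names, the threshold, and the numbers are all
hypothetical.

```python
import re

# Matches a line such as "max rate (files/sec): 167.186282".
MAX_RATE = re.compile(r"max rate \(files/sec\):\s*([0-9.]+)")


def max_rate(output_text):
    """Extract the 'max rate (files/sec)' value from saved output text."""
    match = MAX_RATE.search(output_text)
    if match is None:
        raise ValueError("no 'max rate' line found")
    return float(match.group(1))


def first_stable_n(results, tolerance=0.05):
    """Return the first N whose max rate grew by no more than `tolerance`
    (as a fraction) over the previous run, or None if it never settles.

    `results` is a list of (n, max_rate) pairs in increasing order of N.
    """
    for (_, prev_rate), (n, rate) in zip(results, results[1:]):
        if rate <= prev_rate * (1.0 + tolerance):
            return n
    return None


# Made-up results from runs with increasing -N values.
runs = [(10, 167.2), (100, 410.5), (1000, 612.0), (10000, 618.3)]
print(max_rate("max rate (files/sec): 167.186282"))
print(first_stable_n(runs))
```

If first_stable_n returns None, the max rate was still climbing at your
largest -N, so it is worth trying a bigger one.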

You can also add "--unparse" to the performance command to test
unparsing (the data needs to be an XML file). And you can use the
"--threads" option to increase the number of threads if you're interested
in how threading improves things. Also, the "-v" option mentioned at the
top will show all the individual times if you're interested in that.

- Steve

[1] https://daffodil.apache.org/cli/#performance-subcommand

On 8/19/21 5:58 PM, Roger L Costello wrote:
> Hi Folks,
> 
> I want to measure the time it takes Daffodil to read in the input file, parse 
> it, and write the XML to a file. How do I do that from the command line?
> 
> Something like this, I imagine:
> 
>       daffodil.bat -performance ???
> 
> /Roger
> 
