Re: How to run performance tests?

Steve Lawrence Fri, 20 Aug 2021 04:41:20 -0700

One additional clarification: the times reported by the performance
command do not include reading from disk, but they do include reading
from the in-memory storage.


When testing parse performance, the times include both parsing and
outputting the infoset to XML. Different infoset types will have
different overheads, so if you're curious about just the time to parse
and not about the infoset output, you can use the "-I null" option (same
as with the "parse" subcommand). For example:

  daffodil performance -s foo.dfdl.xsd -N 10 -I null foo.dat
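
To see roughly how much of the time goes to infoset output, one option is
to run the same test with and without "-I null" and compare the average
rates. The following is only a sketch: the function names avg_rate and
compare_infoset are made up for illustration, and it assumes the stats are
printed to standard output in the layout shown in the quoted message below.

```shell
# Pull the "avg rate (files/sec)" number out of performance output on stdin.
# Assumes the stats layout shown in the quoted message below.
avg_rate() {
  awk -F': ' '/avg rate \(files\/sec\)/ { print $2 }'
}

# Hypothetical helper: same schema and data, with and without infoset output.
# $1 = schema, $2 = data file, $3 = number of iterations (defaults to 100).
compare_infoset() {
  schema=$1; data=$2; n=${3:-100}
  printf 'with infoset output: %s files/sec\n' \
    "$(daffodil performance -s "$schema" -N "$n" "$data" | avg_rate)"
  printf 'with -I null:        %s files/sec\n' \
    "$(daffodil performance -s "$schema" -N "$n" -I null "$data" | avg_rate)"
}
```

Calling "compare_infoset foo.dfdl.xsd foo.dat" would then print the two
average rates one above the other.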

When testing unparse performance, the time includes both reading the
infoset from memory and writing the unparsed data. Note that although we
"write" all the unparsed data, we don't actually store it in memory or
write it to a file. It's essentially writing to /dev/null, which
minimizes the overhead of writing data, since that overhead can be very
hardware dependent.
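
The tuning approach from the quoted message below (raise -N until max rate
stops increasing) could be scripted along these lines. This is a rough
sketch, not part of daffodil: sweep_n is a made-up name, the list of -N
values is arbitrary, and it assumes the stats go to standard output.

```shell
# Hypothetical helper: rerun the performance test with increasing -N and
# print the "max rate" line for each run, so you can see where it levels off.
# $1 = schema, $2 = data file.
sweep_n() {
  for n in 10 100 1000 10000; do
    printf 'N=%s ' "$n"
    daffodil performance -s "$1" -N "$n" "$2" | grep 'max rate'
  done
}
```

For example, "sweep_n foo.dfdl.xsd foo.dat" prints one line per -N value;
once the max rate is about the same from one line to the next, a larger -N
is unlikely to help.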

On 8/20/21 7:22 AM, Steve Lawrence wrote:
> If you want to see how long it takes to parse a single file, you can add
> the -v flag to the normal parse/unparse command and it will output time
> to compile the schema and time to parse. For example:
> 
>   $ daffodil -v parse -s foo.dfdl.xsd foo.dat
>   [info] Time (compiling): 1631ms
>   <?xml version="1.0" encoding="UTF-8"?>
>   <foo>bar</foo>
>   [info] Time (parsing): 59ms
> 
> Note that the "Time (parsing)" value includes reading the file since
> daffodil reads it in as a stream, so a slow hard drive could affect this
> number. Also note that Java takes some time to just-in-time compile and
> optimize the Java bytecode, so a single parse is not going to be
> representative of the fastest possible speed, which is only reached
> once Java is warmed up.
> 
> For these reasons, we've added the "performance" subcommand [1], which
> tries to mitigate some of these issues, and can be run like this:
> 
>   daffodil performance -s foo.dfdl.xsd -N 100 foo.dat
> 
> This will read foo.dat into memory to avoid any overhead caused by
> reading from a hard drive. Then it will parse that data 100 times (or
> whatever you set -N to), record the time for each individual parse, and
> then output some stats, something like this:
> 
>   total parse time (sec): 0.159370
>   min rate (files/sec): 13.985179
>   max rate (files/sec): 167.186282
>   avg rate (files/sec): 62.746911
> 
> Notice how there is a big difference between the min rate (the slowest
> individual parse) and the max rate (the fastest individual parse). This
> is because of the just-in-time compilation and optimization that Java
> does during the first several parses. To get an accurate measure of
> how fast daffodil can parse (hardware dependent, of course), I usually
> bump up the -N option until max rate stops increasing. This allows Java
> to finish all the compilation/optimization.
> 
> You can also add "--unparse" to the performance command to test
> unparsing (the data needs to be an XML file). And you can also use the
> "--threads" option to increase the number of threads if you're
> interested in how threading improves things. Also, the "-v" option
> mentioned above will show all the individual times if you're
> interested in that.
> 
> - Steve
> 
> [1] https://daffodil.apache.org/cli/#performance-subcommand
> 
> On 8/19/21 5:58 PM, Roger L Costello wrote:
>> Hi Folks,
>>
>> I want to measure the time it takes Daffodil to read in the input file, 
>> parse it, and write the XML to a file. How do I do that from the command 
>> line?
>>
>> Something like this, I imagine:
>>
>>              daffodil.bat -performance ???
>>
>> /Roger
>>
>
