How does it compare to the C++?

On Wed, Dec 11, 2019 at 4:32 PM Greg Keogh <gfke...@gmail.com> wrote:

> Folks, I just went through a performance comparison exercise and I
> thought a summary of the results might be of interest here. A colleague is
> converting some C++ code to C# to see if it's possible to maintain the
> legacy high performance while enjoying the benefits of the managed world.
> The core code reads from 1 to 15 text files line-by-line and parses the
> contents of the lines which may look like these samples:
>
> 83;61;58;18;42;96;24;15;42;39
> a1b1*0.333333333333333a2b1*0.333333333333333a3b1
> a3b1*826;2*93;3*101a19b1*526;2*557;3*518
>
> The input files often contain up to 1 million lines. Each parsed number is
> used to update a cell in a large matrix that is typically hundreds wide or
> high, but might be tens of thousands wide. So you can see that this is
> mainly a CPU and memory intensive task. We know that most of the time is
> taken in the tight loop parsing of millions of numbers out of the input
> lines. I wrote a test harness that simulated the processing in C# and
> discovered the following:
>
>    - Release or Debug build made little difference.
>    - Using compiled Regex slows by a factor of 5.
>    - Using string Split slows by a factor of about 3.
>    - Using Parallel.ForEach slows things slightly.
>    - Using an unmanaged buffer with unsafe unchecked pointers slows
>    things slightly.
>    - The fastest way to parse the lines is with an index loop over the
>    chars in the line string.
>
> In a normal business app you would of course use Regex or string methods
> for parsing because it's clear and maintainable, but in this case where
> every millisecond counts I found that any FCL usage would blow out the time
> and only a for-loop was viable.
>
> Parallelism is probably useless in this case because the processing on
> each worker thread is just a blink, meaning the threading burden was
> heavier than the processing it carried.
>
> So it turns out that an old-fashioned C-style for-loop to manually parse
> the lines is the fastest by a long shot. It's fragile of course, but my
> colleague has translated the old well-tested C++ code directly over to C#
> (it's rather ugly). This whole scenario is rather unusual and not very
> applicable to LOB apps, but I thought it was worth posting anyway.
>
> Cheers,
> *Greg Keogh*
>
> [image: image.png]
>
> Labels from the image:
>
>    - Regex.Match(es)
>    - Regex.Match(es) with Parallel Processing (PPL)
>    - String Split
>    - String Split with PPL
>    - For-loop
>    - For-loop with PPL
>    - Plain file reads with no parsing (lowest baseline)
>
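
For anyone reading along, the index loop described in the quoted message might look roughly like the sketch below. It is written in C++ only because that is the language of the original code; it is not the legacy code or the C# port. The function name parse_semicolon_ints is made up for illustration. It assumes the simplest of the three sample formats (non-negative integers separated by ';') and ignores overflow and the a/b/* syntax of the other samples.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code base): extracts
// non-negative integers from a line using one index loop over the
// characters, with no regex, no split, and no per-token substring.
std::vector<long> parse_semicolon_ints(const std::string& line) {
    std::vector<long> values;
    long current = 0;        // number being accumulated
    bool in_number = false;  // true once at least one digit has been seen

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c >= '0' && c <= '9') {
            // Accumulate the digit; overflow is not checked in this sketch.
            current = current * 10 + (c - '0');
            in_number = true;
        } else if (in_number) {
            // Any non-digit (assumed here to be ';') ends the number.
            values.push_back(current);
            current = 0;
            in_number = false;
        }
    }
    // Flush the last number if the line did not end with a separator.
    if (in_number) {
        values.push_back(current);
    }
    return values;
}
```

Called on the first sample line, 83;61;58;18;42;96;24;15;42;39, this yields those ten values in order. In the real code each value would presumably update a matrix cell directly rather than being collected into a vector.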
