How does it compare to the C++?

On Wed, Dec 11, 2019 at 4:32 PM Greg Keogh <gfke...@gmail.com> wrote:
> Folks, I just went through a performance comparison exercise and I
> thought a summary of the results might be of interest here. A colleague is
> converting some C++ code to C# to see if it's possible to maintain the
> legacy high performance while enjoying the benefits of the managed world.
> The core code reads from 1 to 15 text files line-by-line and parses the
> contents of the lines, which may look like these samples:
>
> 83;61;58;18;42;96;24;15;42;39
> a1b1*0.333333333333333a2b1*0.333333333333333a3b1
> a3b1*826;2*93;3*101a19b1*526;2*557;3*518
>
> The input files often contain up to 1 million lines. Each parsed number is
> used to update a cell in a large matrix that is typically hundreds wide or
> high, but might be tens of thousands wide. So you can see that this is
> mainly a CPU- and memory-intensive task. We know that most of the time is
> taken in the tight loop parsing millions of numbers out of the input
> lines. I wrote a test harness that simulated the processing in C# and
> discovered the following:
>
> - Release or Debug build made little difference.
> - Using compiled Regex slows things by a factor of 5.
> - Using string Split slows things by a factor of about 3.
> - Using Parallel.ForEach slows things slightly.
> - Using an unmanaged buffer with unsafe unchecked pointers slows
>   things slightly.
> - The fastest way to parse the lines is with an index loop over the
>   chars in the line string.
>
> In a normal business app you would of course use Regex or string methods
> for parsing because they are clear and maintainable, but in this case,
> where every millisecond counts, I found that any FCL usage would blow out
> the time and only a for-loop was viable.
>
> Parallelism is probably useless in this case because the processing on
> each worker thread is just a blink, meaning the threading burden was
> heavier than the processing it carried.
>
> So it turns out that an old-fashioned C-style for-loop to manually parse
> the lines is the fastest by a long shot. It's fragile of course, but my
> colleague has translated the old well-tested C++ code directly over to C#
> (it's rather ugly). This whole scenario is rather unusual and not very
> applicable to LOB apps, but I thought it was worth posting anyway.
>
> Cheers,
> *Greg Keogh*
>
> [image: image.png]
>
> Regex.Match(es)
> Regex.Match(es) with Parallel Processing (PPL)
> String Split
> String Split with PPL
> For-loop
> For-loop with PPL
> Plain file reads with no parsing (lowest baseline)
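
For anyone trying to picture the index loop described in the quoted message, here is a minimal sketch in C++ (the language of the original code). It is only an illustration under stated assumptions: it handles just the simplest sample format, semicolon-separated non-negative integers such as 83;61;58;..., it does not check for overflow, and the function name parse_semicolon_ints is hypothetical rather than taken from the real code base. The other two sample formats are not covered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code base): parses
// non-negative integers separated by ';' in a single index loop,
// without creating substrings or token objects.
// Returns false if any character other than a digit or ';' is seen.
bool parse_semicolon_ints(const std::string& line, std::vector<long>& out) {
    long value = 0;          // number being accumulated
    bool in_number = false;  // true once at least one digit was read

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c >= '0' && c <= '9') {
            // Accumulate the digit; overflow is not checked in this sketch.
            value = value * 10 + (c - '0');
            in_number = true;
        } else if (c == ';') {
            // Separator: emit the finished number, if any, and reset.
            if (in_number) out.push_back(value);
            value = 0;
            in_number = false;
        } else {
            return false;  // format not handled by this sketch
        }
    }
    // Emit the trailing number (a line need not end with ';').
    if (in_number) out.push_back(value);
    return true;
}
```

Called on the first sample line, this appends the ten values 83 through 39 to the output vector, and it returns false for a line such as a1b1*826 that uses one of the other formats. Each character is visited once and nothing is allocated per token, which is presumably the cost that the Split and Regex variants pay.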