Folks, another heads-up ... I had some old hobby code that would scan
folders for image files, extract the details of the file and image and
update a SQLite DB to create an "index" of all my images. I haven't run the
utility for a few years, but now I have about 4000 images and running it
afresh was taking at least 30 minutes to read all the files. It's slow
because I have to read all the file bytes to load the Image and take an MD5
hash of newly added files (it's much faster on subsequent scans when it
knows most files haven't changed).

 

Coincidentally, the latest MSDN magazine has an article titled "The Past,
Present and Future of Parallelizing .NET Applications"
<http://msdn.microsoft.com/en-us/magazine/hh335070.aspx> which reminded me
of the System.Threading.Tasks.Parallel class, which has many For and ForEach
overloads.

 

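To use the Parallel class properly you have to discipline yourself to do two
things: (1) Make sure the "work" is exposed as an IEnumerable<T> sequence of
items, with the per-item work in a method or lambda that Parallel.ForEach
calls for each one (2) Be thread safe by locking whatever shared state the
work method updates (obviously).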

 

Thing 1 is the important one because you must adjust your coding style so
that heavyweight work is broken into an IEnumerable of independent items.
Once you do that you can just call Parallel.ForEach over the sequence and
bingo, it just works and runs parallelised across the available cores.
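
A minimal sketch of the two "things" together, not the actual utility: the
class name ImageScanSketch, the HashFile helper and the in-memory Dictionary
(standing in for the SQLite index) are all made up for illustration. It
assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

static class ImageScanSketch
{
    // Hypothetical helper: MD5-hash one file's bytes and return the hex digest.
    static string HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    // Hashes every file under 'root' in parallel and returns path -> MD5 hex.
    public static Dictionary<string, string> Scan(string root)
    {
        // Thing 1: the work items are an IEnumerable<string> of file paths.
        IEnumerable<string> files =
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories);

        // Stand-in for the real index (assumption: the original writes to SQLite).
        var index = new Dictionary<string, string>();
        var gate = new object();

        Parallel.ForEach(files, path =>
        {
            // The heavy part (reading and hashing) runs outside the lock.
            string hash = HashFile(path);

            // Thing 2: Dictionary is not thread safe, so guard the shared update.
            lock (gate)
            {
                index[path] = hash;
            }
        });

        return index;
    }
}
```

Keeping the lock around only the shared update matters: if the hashing were
inside the lock, the loop would effectively run one file at a time again.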

 

My image scan now takes about 10 minutes and in Task Manager I can see all 6
CPUs pumping electrons.

 

There are apparently simple techniques for cancelling parallelised work, but
I haven't tried that yet.
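
For reference, one documented route is to pass a CancellationToken through
ParallelOptions. This is an untested-in-anger sketch under that assumption;
the class name CancelSketch and the Run method are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class CancelSketch
{
    // Runs 'work' over 'items' in parallel until 'token' is cancelled.
    // Returns the number of items whose work completed.
    public static int Run(IEnumerable<int> items, CancellationToken token, Action<int> work)
    {
        int done = 0;
        var options = new ParallelOptions { CancellationToken = token };

        try
        {
            Parallel.ForEach(items, options, item =>
            {
                work(item);
                Interlocked.Increment(ref done); // thread-safe counter update
            });
        }
        catch (OperationCanceledException)
        {
            // Parallel.ForEach throws this once it observes the cancelled token.
            // No new iterations are started after that point.
        }

        return done;
    }
}
```

A caller would create a CancellationTokenSource, hand its Token to Run, and
call Cancel() on the source (say, from a UI button) to stop the scan early.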

 

Cheers

Greg

 

 
