2) Is it better to do the concurrency at the highest level possible or
at the lowest level possible? I.e., should I be processing pages
concurrently or should I go to a much lower level and only be
processing letters concurrently? Does it matter?

3)  How does hyperthreading affect the number of places or futures I
can run concurrently? For example if I have an i7 with 4 cores and
hyperthreading, will that run 4 or 8 places concurrently?


Actually, a lot of it depends on two things: a) whether your computations are I/O bound or CPU bound (in the case of OCR, I assume they are CPU bound) and b) whether you benefit at all from a "short" computation finishing earlier than it would have if everything were processed in "linear" order.

If all your threads are CPU bound and your task doesn't count as done until its very last piece is done, all the concurrency you can get out of it is the parallelism across several processors (and of course that depends on both the operating system and the language using those processors efficiently). In the "worst case," you only have one processor and nothing useful can happen before the entire document is processed. If that's the case, parallelizing your computations is actually your *worst* option, because the total number of CPU cycles needed to do the whole job doesn't change regardless of how you split up the computations, but the parallel version adds context switching overhead! In that scenario, you might as well (or even better) launch one batch file that does all the pieces in turn, go to sleep, and come back the next morning to see the task done.

If, however, a particular task has a lot of "gaps" - meaning that an I/O system or a coprocessor periodically relieves the CPU(s) - concurrent programming can help you keep the CPU(s) at 100% utilization by filling the gaps with other computations, thus "squeezing together" the computations.
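To make the "gaps" idea concrete, here is a minimal sketch in Python (not Racket, since the point is not language-specific). The `fetch_page` function and its 0.2 second sleep are hypothetical stand-ins for a task that waits on I/O; nothing here comes from a real OCR pipeline.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_id, wait_seconds=0.2):
    """Hypothetical I/O-bound step: the sleep stands in for a disk or network wait."""
    time.sleep(wait_seconds)  # the CPU is idle during this "gap"
    return page_id * 2

def run_sequential(page_ids):
    """Handle pages one at a time; the waits add up."""
    start = time.perf_counter()
    results = [fetch_page(p) for p in page_ids]
    return results, time.perf_counter() - start

def run_overlapped(page_ids):
    """Handle pages on threads; one task's wait overlaps with the others'."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(page_ids)) as pool:
        results = list(pool.map(fetch_page, page_ids))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    pages = [1, 2, 3, 4]
    seq_results, seq_time = run_sequential(pages)
    par_results, par_time = run_overlapped(pages)
    print(f"sequential: {seq_time:.2f}s, overlapped: {par_time:.2f}s")
```

Both versions return the same results; only the waiting is squeezed together. If the sleep were replaced by pure computation, this gain would disappear on a single processor.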

Yet another story is when you can benefit from individual computations being completed as fast as possible. Assume, for example, that some GUI frontend can offer to display an individual page as soon as that page has been processed completely, so the user can scrutinize it. In that case, needless to say, you don't want a 100M document to clog up the CPU while a dozen very small documents could be ready in no time flat, letting the person in front of the GUI start working on them right away. Here parallelizing would make sense even if every single task used 100% of the CPU (thus paying context switch overhead), because the small computations are perceived to be completed faster. So the choice isn't black and white but instead depends very strongly on the task set.
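The trade-off between total time and perceived responsiveness can be illustrated with a toy single-CPU simulation, sketched here in Python. The job names, work units, time slice and switch cost are all made-up numbers for illustration, and the round-robin scheduler is a deliberate simplification, not how any particular operating system or Racket itself schedules work.

```python
def sequential_finish_times(jobs):
    """Run jobs one after another in the given order; return finish time per job."""
    clock = 0
    finish = {}
    for name, work in jobs:
        clock += work
        finish[name] = clock
    return finish

def round_robin_finish_times(jobs, time_slice=1, switch_cost=0):
    """Time-slice jobs on one CPU; each switch between different jobs costs switch_cost."""
    remaining = dict(jobs)
    queue = [name for name, _ in jobs]
    clock = 0
    finish = {}
    last = None
    while queue:
        name = queue.pop(0)
        if last is not None and last != name:
            clock += switch_cost  # context switch overhead
        run = min(time_slice, remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = clock
        else:
            queue.append(name)  # not done yet, back of the line
        last = name
    return finish

if __name__ == "__main__":
    jobs = [("big", 100), ("small1", 2), ("small2", 2)]  # hypothetical work units
    print(sequential_finish_times(jobs))
    # {'big': 100, 'small1': 102, 'small2': 104}
    print(round_robin_finish_times(jobs, time_slice=1, switch_cost=1))
    # {'small1': 9, 'small2': 11, 'big': 110}
```

With a switch cost of 1, the whole batch finishes at 110 instead of 104, so the total job got slower, but the two small documents are ready at 9 and 11 instead of 102 and 104. Whether that is a good deal depends on whether anyone is waiting for the small ones.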

I hope I didn't state the obvious; all of the above (aside, of course, from not being specific to Racket at all) describes fairly elementary concurrent application design. Apologies if I have pitched this at too elementary a level.

4) Are there any "gotchas" I need to look out for?


I don't know if anyone has measured the costs of concurrency in Racket yet - I assume that wherever possible (e.g. on Windows, where fibers are available) the concurrency layers have been tailored to suit whatever the underlying hardware and software offer, but be aware that concurrency never comes free. So do run some elementary tests on the cost of concurrency - e.g., (time) the same computations both single threaded and multithreaded and see what the overhead is before you dive into concurrent programming. Also (and this, too, is rather elementary, and again, please don't take it as an insult if I state what may sound trivial) keep in mind that concurrent software design is still one of the outer space limits of computation - one of the things that won't work if you don't do it well enough but likewise won't work if you do it too well. Synchronization problems are among the worst to debug and trace because they are rarely reproducible and tend to manifest themselves differently each time they occur.
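As a rough idea of what such an elementary test could look like, here is a small timing harness sketched in Python rather than Racket (in Racket, the `time` form would play the role of the `timed` helper). The `busy_work` function, the chunk sizes and the worker count are arbitrary placeholders; the numbers you get depend entirely on your machine and runtime, so none are shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy_work(n):
    """Stand-in CPU-bound task (hypothetical): sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def run_single(chunks):
    """Process every chunk in turn on the calling thread."""
    return [busy_work(n) for n in chunks]

def run_threaded(chunks, workers=4):
    """Process the same chunks on a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(busy_work, chunks))

if __name__ == "__main__":
    chunks = [200_000] * 8
    single_result, single_time = timed(run_single, chunks)
    threaded_result, threaded_time = timed(run_threaded, chunks)
    assert single_result == threaded_result  # same answers either way
    print(f"single: {single_time:.3f}s, threaded: {threaded_time:.3f}s")
```

The useful part is the comparison, not the absolute numbers: if the concurrent run is not clearly faster on your real workload, the extra complexity is probably not paying for itself.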


Thanks,
Harry
____________________
 Racket Users list:
 http://lists.racket-lang.org/users


____________________
 Racket Users list:
 http://lists.racket-lang.org/users

Reply via email to