I have a PDF rendering program for sheet music running on a Raspberry Pi 3,
using Poppler 0.51.0 built from source, with Qt 5.8 through Poppler's Qt5 API.

I am seeing some odd multi-threaded performance behavior.  I am calling
page->renderToImage from separate threads, several of them at once.

I am not getting any errors, and the results are correct. 

For example, rendering the same two (and only two) pages in a single thread
takes 5.7 and 5.6 seconds, a total of 11.3 seconds.  Rendered in parallel, the
first completes after 8.6 seconds and the second about 50 ms later, i.e.
basically 8.6 seconds total.

There is no IO that I can see going on at the time, there is no swap file (so 
no swap usage), plenty of memory, and nothing else running except the desktop 
services to display the images.  

That parallel result is faster, but not nearly as much faster as I
anticipated.
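
To put a number on that, here is a minimal sketch of the arithmetic.  It
assumes Amdahl's law is a fair model for this workload, which is an assumption
and not something I have measured, and the function names are made up for
illustration.  With the two-page figures (11.3 s in one thread, 8.6 s on two
threads) it gives a speedup of about 1.31x, which under that model would mean
only about 48% of the work actually runs in parallel.

```cpp
#include <cassert>

// Observed speedup: single-thread wall time divided by parallel wall time.
double speedup(double serialSeconds, double parallelSeconds) {
    return serialSeconds / parallelSeconds;
}

// Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n),
// solved for p:  p = (1 - 1/S) / (1 - 1/n).
// ASSUMPTION: the workload follows Amdahl's model; this is illustrative only.
double impliedParallelFraction(double observedSpeedup, int threads) {
    assert(threads > 1);  // the formula is undefined for a single thread
    return (1.0 - 1.0 / observedSpeedup) / (1.0 - 1.0 / threads);
}
```

For example, speedup(11.3, 8.6) is roughly 1.31, and feeding that into
impliedParallelFraction(..., 2) gives roughly 0.48.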

Three at a time gives about 8 seconds for the first, about 1.5 seconds for the 
second, and 0.6 for the third (I say "about" as my 3 page render was different 
content). 

Even though no IO occurs, increasing to 4 threads still does not keep the
processor busy (e.g. as seen in "top"), which seems to imply some constraint
other than the number of cores.

Here is what is stranger.  If I submit 3 pages in a row in order 1, 2, 3 to
three separate threads (the Pi3 has 4 cores), they always finish in order 3,
2, 1.  I've instrumented this in as many ways as I can to confirm the sequence
(and yes, that they really are running in separate threads).  That's not a big
deal as far as program logic goes, but it is an odd symptom.  That aspect is
reproducible on a fast Hyper-V box I use for testing (it processes them fast
enough that the rendering speed is not terribly meaningful there): it is
always in reverse order.  And the finish times are not all that close together
(i.e. it's not a stream IO issue with the debug output).
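
For reference, the kind of instrumentation I mean can be sketched roughly as
follows.  This is not my actual code: renderPlaceholder is a hypothetical
stand-in for the page->renderToImage call (it just sleeps), and Completion and
renderAndRecord are names invented for the example.  The point is that each
thread takes its own finish timestamp from a monotonic clock before touching
any shared state, so the recorded order does not depend on debug stream
buffering.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct Completion {
    int page;          // page number as submitted
    double elapsedMs;  // time from the common start until this page finished
};

// HYPOTHETICAL stand-in for the real render call; it only sleeps.
void renderPlaceholder(int page) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20 * page));
}

// Starts one thread per page and returns the completions in finish order.
std::vector<Completion> renderAndRecord(const std::vector<int>& pages) {
    std::vector<Completion> finished;
    std::mutex finishedMutex;
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (int page : pages) {
        workers.emplace_back([&, page] {
            renderPlaceholder(page);
            // Timestamp first, then lock, so waiting on the mutex is not
            // counted as render time.
            const auto end = std::chrono::steady_clock::now();
            const double ms =
                std::chrono::duration<double, std::milli>(end - start).count();
            std::lock_guard<std::mutex> lock(finishedMutex);
            finished.push_back({page, ms});
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return finished;
}
```

Calling renderAndRecord({1, 2, 3}) with the real render substituted in and
comparing the elapsedMs values shows both the finish order and how far apart
the finishes are.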

Makes me wonder if something is blocking/serialized, forcing the LIFO behavior 
and so perhaps keeping me from getting the most performance.

Are there any special considerations for using Poppler with multi-threaded 
rendering?   Different cmake options for example?   Different calling 
sequences? 

I realize that the Pi3 architecture might be causing this, e.g. memory speed
making multi-threading less efficient.  I really did not think much about it
until I realized the renders (started within a millisecond of each other)
always finish in reverse order of initiation.

Incidentally, I have tried building both poppler and my application with -O2
and with -O3, with negligible difference in performance.

Any suggestions or insights would be welcomed.

Linwood Ferguson
