Hi Mickael,

I think a job per recursive step is too fine-grained here. I would suggest 
something like this:

- A single thread-safe queue instance of directories to process
- Each work pulls a directory from the queue, processes that directory, 
and inserts children into the queue.
- Workers continue until queue is empty.
- Workers would need to hold queue mutex until children are added to avoid 
premature worker death (lock queue, pop dir, push children, unlock queue, 
process dir)
- Have a fixed size pool of workers processing the queue, tuned based on 
performance measurements to see what is optimal size (probably near to 
number of cores on the machine)

John




From:   Mickael Istria <[email protected]>
To:     [email protected]
Date:   04/20/2015 11:17 AM
Subject:        [platform-dev] How to best use JobGroups in a recursive 
task?
Sent by:        [email protected]



Hi all,

In the context of https://wiki.eclipse.org/E4/UI/Smart_Import , the main 
operation that crawls the filesystem to identify patterns and then 
configure Eclipse projects.
I believe that, when on a directory, crawling and configuring direct 
children can be treated in parallel. I was thinking of using JobGroups to 
enable parallelism for this task, and more specifically, I was thinking 
that I could simply make this loop parallel: 
http://git.eclipse.org/c/e4/org.eclipse.e4.ui.git/tree/bundles/org.eclipse.e4.ui.importer/src/org/eclipse/ui/internal/wizards/datatransfer/OpenFolderCommand.java#n158
 
. It's the one that iterates on children.
However, since it's in a recursive operation, I don't believe I can assign 
1 job for each iteration, or it will assign all thread to "parent" tasks, 
without leaving some for children. Example:

Imagine this filesystem structure:
* P
** P1
*** P1a
*** P1b
** P2
*** P2a
*** P2b
** P3
*** P3a
....
And that I use 2 threads to throttle my jobs. Then those 2 threads will be 
assigned to the job responsible of crawling P1 and P2, and when trying to 
work on P3, P1a, P1b, P2a or P2b, the new job for this branch will wait 
for a new thread forever.
I'm not a parallelism expert, so that may be a dumb question, but how can 
I take benefit from throttling in a recursive algorithm?
A naive implementation would be to only parallelize the execution for the 
1st-level branches, so that P1 // P2 // P3, and keep the subdirectories on 
the same thread, but this is obviously sub-optimal in case P1 and P2 
finish quite fast, sub-jobs of P3 should be allowed to use the additional 
thread.

Cheers,
-- 
Mickael Istria
Eclipse developer at JBoss, by Red Hat
My blog - My Tweets_______________________________________________
platform-dev mailing list
[email protected]
To change your delivery options, retrieve your password, or unsubscribe 
from this list, visit
https://dev.eclipse.org/mailman/listinfo/platform-dev
_______________________________________________
platform-dev mailing list
[email protected]
To change your delivery options, retrieve your password, or unsubscribe from 
this list, visit
https://dev.eclipse.org/mailman/listinfo/platform-dev

Reply via email to