On Thu, Aug 3, 2023 at 5:11 PM nadim khemir <nadim.khe...@gmail.com> wrote:
> When entering a directory a background process is started to generate
> previews for some extensions, say E1 E2 E3 .
:
> new code is:
>
>   find ... | grep -P "E1|E2|E3" | parallel generator
>
> The replacement works as expected.
>
> The generation takes 5 seconds to complete from an empty cache and 2
> seconds if the previews already existed. Quite often directories are
> visited multiple times, within a short time, if the directory
> structure is d1/d2/d3, and each of those directories takes 5 seconds
> to process, going from d1 to d3 starts generation for 15 seconds, if
> the stay in d3 is short and the user goes back to d1, multiple preview
> generation would be running for concurrently directories d1 and d2.
>
> I moved to parallel because I believe it can handle that, and I also
> wanted to learn the tool for future usage. I thought that:
>
>   find ...... | parallel --semaphore -id hash(directory) generator

I really cannot blame you for thinking that would work. But as you
discovered it does not.

parallel --semaphore (or short: sem) puts GNU Parallel in semaphore
mode which is somewhat different from the normal mode. It might be
easier for you to think of 'sem' as a completely separate command. It
has been designed to run a single command. That single command could
be GNU Parallel in normal mode.

If your command reads stdin, you need to tell sem to forward stdin by
using --pipe, so this should work for you:

  find my/dir | sem --id my/dir --pipe parallel generator

> would make parallel run the generators for a specific directory
> sequential, IE: generators(d1), generator(d2), generator(d3) would run
> in parallel since they have different ids, and generator(d2)
> generator(d1) (triggered when going up the paths) would not run before
> the previous generator(d2) and generator(d1) are done. I understand
> that parallel is itself running in different processes but it's my
> understanding that the semaphores are kept on disk and probably
> shared.

They are indeed shared (currently in ~/.parallel - but may be moved to
SHM in the future; not that you should care, as the interface will not
be changed).

I understand you want:

  process my/dir

to block:

  process my/dir/sub

from starting, because "process my/dir" would also process my/dir/sub,
but it should not block "process my/other/dir".

You can do that by:

  find .. | sem --id my/dir/sub sem --id my/dir --pipe parallel generator

Notice this will run sem inside sem and thus block 2 ids. You would
need to generate these ids yourself, so my/sub/sub/sub/dir starts 5
sems.

Currently sem does not in itself support multiple "--id"s but it could
do that in the future. Maybe using a syntax like:

  find .. | sem --id my/dir/sub --id my/dir --pipe parallel generator # This does not work yet
  find .. | sem --id "my/dir/sub my/dir" --pipe parallel generator # This does not work yet
  find .. | sem --id "my/dir/sub,my/dir" --pipe parallel generator # This does not work yet

Maybe we should even support hierarchical ids, which would fit your
purpose exactly:

  find .. | sem --idhier "my/dir/sub" --pipe parallel generator # This does not work yet
  find .. | sem --idhier "my/dir" --pipe parallel generator # This does not work yet
  find .. | sem --idhier "my/other/sub" --pipe parallel generator # This does not work yet

  my/dir would block my/dir/sub
  my/dir/sub would block my/dir
  my/dir/sub would not block my/dir/other
  my/dir would not block my/other/sub

Or generally:

  id = --id
  for existing in existing_IDs:
    if id starts with existing or existing starts with id:
      block

/Ole
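For illustration, the "Or generally" rule above can be sketched in
Python. This is only a sketch of the proposed behaviour, not something
sem implements today. The function names (split_id, conflicts,
must_block) are hypothetical, and comparing whole path components
instead of raw string prefixes is an assumption added here so that
my/dir does not accidentally match my/dirty.

```python
from typing import Iterable, List


def split_id(sem_id: str) -> List[str]:
    """Split a hierarchical id such as 'my/dir/sub' into its components."""
    return [part for part in sem_id.split("/") if part]


def conflicts(new_id: str, existing_id: str) -> bool:
    """Return True if one id is an ancestor of (or equal to) the other.

    Assumption: ids are compared component by component, which is
    slightly stricter than the plain 'starts with' test in the pseudocode.
    """
    a, b = split_id(new_id), split_id(existing_id)
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


def must_block(new_id: str, existing_ids: Iterable[str]) -> bool:
    """Return True if new_id has to wait for any id that is already held."""
    return any(conflicts(new_id, existing) for existing in existing_ids)


if __name__ == "__main__":
    held = ["my/dir"]
    for candidate in ["my/dir/sub", "my/other/sub", "my/dirty"]:
        print(candidate, "blocks" if must_block(candidate, held) else "runs")
```

With my/dir already held, my/dir/sub has to wait, while my/other/sub
and my/dirty are free to run, which matches the four example cases
listed above.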