Hi,

I have a Bash application in which I can interactively change directories.

On entering a directory, a background process is started to generate
previews for files with certain extensions, say E1, E2 and E3.

The (pseudo)code for the background process used to be:

for ext in E1 E2 E3; do
    fd "\.$ext$" ... --exec generator
done

The new code is:

find ... | grep -E '\.(E1|E2|E3)$' | parallel generator

The replacement works as expected.

Generation takes 5 seconds to complete from an empty cache and 2
seconds if the previews already exist. Directories are often visited
several times within a short period. If the directory structure is
d1/d2/d3 and each of those directories takes 5 seconds to process,
going from d1 to d3 starts 15 seconds' worth of generation. If the
stay in d3 is short and the user goes back up to d1, several preview
generations for directories d1 and d2 would be running concurrently.

I moved to parallel because I believe it can handle that, and I also
wanted to learn the tool for future usage. I thought that:

find ...... | parallel --semaphore -id hash(directory) generator

would make parallel run the generators for a specific directory
sequentially, i.e. generator(d1), generator(d2) and generator(d3)
would run in parallel since they have different ids, while the second
generator(d2) and generator(d1) (triggered when going back up the
path) would not start before the previous generator(d2) and
generator(d1) are done. I understand that parallel itself runs in
separate processes, but my understanding is that the semaphores are
kept on disk and are probably shared between them.
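
To make the intent concrete, here is a minimal sketch of the behaviour
I am after. dir_id and run_serialized are hypothetical helper names,
not part of my application or of GNU parallel. The sketch assumes that
sem (the alias for parallel --semaphore) is handed one complete command
per call and that jobs sharing an --id run one at a time. It only
illustrates the idea and is not a confirmed fix for the problem
described below.

```shell
# Hypothetical helper: derive a stable semaphore id from a directory path.
# cksum prints "CRC SIZE" for stdin, so the first field is the checksum.
dir_id() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Hypothetical wrapper: queue a command under the semaphore that belongs
# to one directory. Commands queued with the same id should run one after
# another; commands with different ids may run at the same time.
# Usage: run_serialized DIRECTORY COMMAND [ARGS...]
run_serialized() {
    local dir
    dir=$1
    shift
    sem --id "preview-$(dir_id "$dir")" "$@"
}
```

With this, a call such as run_serialized "$PWD" some-command would
queue some-command behind any earlier job for the same directory.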

Instead, nothing runs at all: nothing shows up in the log and no CPU is used.

What am I thinking or doing wrong?

Ideally no directory would be processed twice within a given time
period. I will add that check to my own code, as I don't think it's
possible to put a timeout on a mutex in GNU parallel that would remain
valid after all its arguments have been processed.
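
A minimal sketch of the kind of check I have in mind follows.
recently_processed, STAMP_DIR and MIN_AGE are hypothetical names, and
the 30-second window is an arbitrary value chosen for illustration. The
check is not atomic, so two shells entering the same directory at the
same instant could both pass it.

```shell
# Assumed location and window, for illustration only.
STAMP_DIR="${STAMP_DIR:-${TMPDIR:-/tmp}/preview-stamps}"
MIN_AGE="${MIN_AGE:-30}"    # seconds

# Return 0 if DIRECTORY was processed less than MIN_AGE seconds ago.
# Otherwise record the current time for it and return 1.
recently_processed() {
    local stamp now last
    mkdir -p "$STAMP_DIR" || return 1
    # One stamp file per directory, named after a checksum of its path.
    stamp="$STAMP_DIR/$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
    now=$(date +%s)
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
        [ $((now - last)) -lt "$MIN_AGE" ] && return 0
    fi
    printf '%s\n' "$now" > "$stamp"
    return 1
}
```

It would be used along the lines of

    recently_processed "$PWD" || start_generation "$PWD"

where start_generation stands for whatever launches the background
process.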

I'm using GNU parallel 20230722.
