On Thu, Aug 3, 2023 at 5:11 PM nadim khemir <nadim.khe...@gmail.com> wrote:

> When entering a directory a background process is started to generate
> previews for some extensions, say E1 E2 E3 .
:
> new code is:
>
> find ... | grep -P "E1|E2|E3" | parallel generator
>
> The replacement works as expected.
>
> The generation takes 5 seconds to complete from an empty cache and 2
> seconds if the previews already existed. Quite often directories are
> visited multiple times, within a short time, if the directory
> structure is d1/d2/d3, and each of those directories takes 5 seconds
> to process, going from d1 to d3 starts 15 seconds of generation. If
> the stay in d3 is short and the user goes back to d1, multiple preview
> generations would be running concurrently for directories d1 and d2.
>
> I moved to parallel because I believe it can handle that, and I also
> wanted to learn the tool for future usage. I thought that:
>
> find ...... | parallel --semaphore -id hash(directory) generator

I really cannot blame you for thinking that would work. But as you
discovered, it does not.

parallel --semaphore (or sem for short) puts GNU Parallel in semaphore
mode which is somewhat different from the normal mode. It might be
easier for you to think of 'sem' as a completely separate command. It
has been designed to run a single command.

That single command could be GNU Parallel in normal mode.

If your command reads stdin, you need to tell sem to forward stdin by
using --pipe, so this should work for you:

  find my/dir | sem --id my/dir --pipe parallel generator

> would make parallel run the generators for a specific directory
> sequentially, i.e. generator(d1), generator(d2), generator(d3) would run
> in parallel since they have different ids, and generator(d2)
> generator(d1) (triggered when going up the paths) would not run before
> the previous generator(d2) and generator(d1) are done. I understand
> that parallel is itself running in different processes but it's my
> understanding that the semaphores are kept on disk and probably
> shared.

They are indeed shared (currently in ~/.parallel - but may be moved to
SHM in the future; not that you should care, as the interface will not
be changed).

I understand you want:

   process my/dir

to block:

   process my/dir/sub

from starting, because "process my/dir" would also process my/dir/sub,
but it should not block "process my/other/dir".

You can do that by:

  find .. | sem --id my/dir/sub sem --id my/dir --pipe parallel generator

Note that this runs sem inside sem and thus blocks 2 ids.

You would need to generate these ids yourself, so my/sub/sub/sub/dir
starts 5 sems.
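
Generating those ids could look roughly like the sketch below. It
assumes relative paths without empty components. ancestor_ids and
nested_sem_command are made-up helper names, not part of GNU Parallel.
The second helper only prints the nested command (ids are not quoted,
so paths containing spaces would need extra care) instead of running
it:

```shell
# Hypothetical helper: print every ancestor prefix of a relative path,
# shallowest first, one per line. Runs in a subshell so that IFS and
# 'set -f' do not leak into the caller.
ancestor_ids() (
  IFS=/
  set -f
  prefix=
  for part in $1; do
    [ -n "$part" ] || continue        # skip empty components
    prefix=${prefix:+$prefix/}$part
    printf '%s\n' "$prefix"
  done
)

# Hypothetical helper: print the nested sem command for a directory,
# deepest id outermost, mirroring the hand-written example above.
nested_sem_command() {
  ancestor_ids "$1" | {
    cmd='--pipe parallel generator'
    while IFS= read -r id; do
      cmd="sem --id $id $cmd"
    done
    printf '%s\n' "$cmd"
  }
}

nested_sem_command my/dir/sub
```

Every directory takes its ids in the same order (deepest first), so
two visits to related directories cannot wait on each other in a
cycle.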

Currently sem does not in itself support multiple "--id"s but it could
do that in the future. Maybe using a syntax like:

  find .. | sem --id my/dir/sub --id my/dir --pipe parallel generator # This does not work yet
  find .. | sem --id "my/dir/sub my/dir" --pipe parallel generator # This does not work yet
  find .. | sem --id "my/dir/sub,my/dir" --pipe parallel generator # This does not work yet

Maybe we should even support hierarchical ids, which would fit your
purpose exactly:

  find .. | sem --idhier "my/dir/sub" --pipe parallel generator # This does not work yet
  find .. | sem --idhier "my/dir" --pipe parallel generator # This does not work yet
  find .. | sem --idhier "my/other/sub" --pipe parallel generator # This does not work yet

my/dir would block my/dir/sub
my/dir/sub would block my/dir
my/dir/sub would not block my/dir/other
my/dir would not block my/other/sub

Or generally:

id = --id
for existing in existing_IDs:
  if id starts with existing or existing starts with id: block

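As a sketch only (nothing like this exists in sem today), the rule
could be tested in shell as below. ids_conflict is a made-up name, and
the sketch assumes the comparison should respect path components, so
that my/dir does not block my/dirty. The plain "starts with" rule
above would block in that case:

```shell
# Hypothetical check: succeed (exit status 0) if one id is equal to,
# or an ancestor of, the other. A "/" is appended to both ids so that
# only whole path components match.
ids_conflict() {
  case "$1/" in "$2/"*) return 0 ;; esac
  case "$2/" in "$1/"*) return 0 ;; esac
  return 1
}

ids_conflict my/dir     my/dir/sub   && echo "my/dir blocks my/dir/sub"
ids_conflict my/dir/sub my/dir       && echo "my/dir/sub blocks my/dir"
ids_conflict my/dir/sub my/dir/other || echo "my/dir/sub does not block my/dir/other"
ids_conflict my/dir     my/other/sub || echo "my/dir does not block my/other/sub"
```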

/Ole
