Dear Parallel Developers,
Excuse me, I’m just wondering whether it would be a good idea to expose 
lower-level APIs for the Parallel software, for example, the number of threads 
to use, the highest temperature of the CPU, etc.
Many thanks!



Jack BAI
Junior, ECE ZJUI, CompE Major, UIUC
Mobile
CHN +86 198-8327-0881
USA +1 217-974-4233
Email
INTL haob...@intl.zju.edu.cn<mailto:haob...@intl.zju.edu.cn>
ILLINI ha...@illinois.edu<mailto:ha...@illinois.edu>
HEPTA bai...@hepta.asia<mailto:bai...@hepta.asia>



From: Parallel <parallel-bounces+haob2=illinois....@gnu.org> on behalf of 
parallel-requ...@gnu.org <parallel-requ...@gnu.org>
Date: Wednesday, March 9, 2022 at 11:03 AM
To: parallel@gnu.org <parallel@gnu.org>
Subject: Parallel Digest, Vol 143, Issue 4
Send Parallel mailing list submissions to
        parallel@gnu.org

To subscribe or unsubscribe via the World Wide Web, visit
        
https://lists.gnu.org/mailman/listinfo/parallel
or, via email, send a message with subject or body 'help' to
        parallel-requ...@gnu.org

You can reach the person managing the list at
        parallel-ow...@gnu.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Parallel digest..."


Today's Topics:

   1. Re: processing csv files (Ole Tange)
   2. I/O spin loop (Anderson, Stuart B.)
   3. Re: I/O spin loop (Ole Tange)
   4. Re: bug report (Ole Tange)


----------------------------------------------------------------------

Message: 1
Date: Tue, 8 Mar 2022 18:22:28 +0100
From: Ole Tange <o...@tange.dk>
To: Saint Michael <vene...@gmail.com>
Cc: parallel <parallel@gnu.org>
Subject: Re: processing csv files
Message-ID:
        <ca+4vn7wzib0r0dtdh94qufo+t6lhhms7c1bqk2o-ahddqin...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Mon, Mar 7, 2022 at 4:22 AM Saint Michael <vene...@gmail.com> wrote:
>
> So how would I submit the contents of many files to parallel, without 
> concatenating them?

Why do you see this as a problem? If you are going to start a process
for each line of input, cat will not slow things down.

You _can_ avoid the cat, but it seems a bit silly:

< file1.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file2.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file3.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file4.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file5.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file6.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"
< file7.csv parallel --colsep ',' function  "{1} {2} {3} {4} {5} {6} {7}"

And I think you will find the total run time is longer.
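For comparison, here is a minimal sketch of the single-input-source approach, where cat feeds all files to one GNU Parallel instance. The demo files, their two columns, and the use of echo in place of the real function are assumptions made for illustration; the demo only runs if GNU Parallel is installed.

```shell
# Hypothetical demo: two small CSV files are concatenated and fed to a
# single GNU Parallel instance. "echo" stands in for the real function.
demo_concat() {
    dir=$(mktemp -d) || return 1
    printf 'a,1\nb,2\n' > "$dir/file1.csv"
    printf 'c,3\n' > "$dir/file2.csv"
    # --colsep splits each line into columns {1}, {2}, ...
    # -k keeps the output in the same order as the input.
    cat "$dir"/file*.csv | parallel -k --colsep ',' echo "{2} {1}"
    rm -rf "$dir"
}

# Only run the demo when the installed "parallel" is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    demo_concat
fi
```

Only one parallel process is started here, instead of one per input file.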

> The function needs to process each file line by line.
> I am sure there must be a better way.

> Why concatenate them at all?

Because you want to feed them into GNU Parallel as a single input source.

cat is way faster than GNU Parallel will ever be, so please explain
why you see cat as a problem.

seq 10000 > file
time cat file >/dev/null
< file time parallel echo >/dev/null

> There is no relationship between a line and the next line.

If you can change function to read from stdin (standard input), then
we can do something way more efficient:

myfunc() { wc; }
export -f myfunc
parallel --pipepart --block -1 myfunc :::: *.csv

--pipepart has some limitations, but it is insanely fast (almost as
fast as a parallelized cat).

> Maybe a new feature?

If the previous does not answer your question then it is unclear to me
what you really want to do.

If you read 
https://stackoverflow.com/help/minimal-reproducible-example
you will see how to make it easier to help you.


/Ole



------------------------------

Message: 2
Date: Tue, 8 Mar 2022 22:02:02 +0000
From: "Anderson, Stuart B." <s...@caltech.edu>
To: "parallel@gnu.org" <parallel@gnu.org>
Subject: I/O spin loop
Message-ID: <2a89e7a0-10ce-42da-9eb9-6f93247b6...@caltech.edu>
Content-Type: text/plain; charset="us-ascii"

parallel version 20190922 from EPEL 8, running on a Rocky Linux 8 system, 
occasionally gets into an I/O spin loop writing 8193 bytes of "x" to a deleted 
TMPDIR file and then immediately truncating it, e.g.,

# cat /etc/redhat-release
Rocky Linux release 8.5 (Green Obsidian)

# yum list parallel
Installed Packages
parallel.noarch                     20190922-1.el8                      @epel

# strace -p 836133
...
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1)                       = 1
ftruncate(10, 0)                        = 0
lseek(10, 0, SEEK_SET)                  = 0
lseek(10, 0, SEEK_CUR)                  = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x7fe81f000dc0, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=SIG_DFL, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, 
sa_restorer=0x7fe81ed29c20}, {sa_handler=0x7fe81f000dc0, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1)                       = 1
ftruncate(10, 0)                        = 0
lseek(10, 0, SEEK_SET)                  = 0
lseek(10, 0, SEEK_CUR)                  = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x7fe81f000dc0, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=SIG_DFL, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, 
sa_restorer=0x7fe81ed29c20}, {sa_handler=0x7fe81f000dc0, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1)                       = 1
ftruncate(10, 0)                        = 0

From lsof, fd=10 is:

COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
parallel 836133 root   10u   REG   0,21        0  8450409 /dev/shm/parTQp67.df (deleted)

And /dev/shm is due to my setting TMPDIR=/dev/shm to reduce the I/O load on the 
/tmp filesystem.

Any thoughts on how to avoid this seemingly useless spin loop with high CPU and 
I/O resource consumption?

Thanks.


--
Stuart Anderson
s...@caltech.edu






------------------------------

Message: 3
Date: Wed, 9 Mar 2022 00:35:39 +0100
From: Ole Tange <o...@tange.dk>
To: "Anderson, Stuart B." <s...@caltech.edu>
Cc: "parallel@gnu.org" <parallel@gnu.org>
Subject: Re: I/O spin loop
Message-ID:
        <CA+4vN7w+gqxUZW0ADwA5tJ=ypys2vyyzm3cmy-mubsumfqg...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Tue, Mar 8, 2022 at 11:22 PM Anderson, Stuart B. <s...@caltech.edu> wrote:
>
> parallel version 20190922 from EPEL 8, running on a Rocky Linux 8 system, 
> occasionally gets into an I/O spin loop writing 8193 bytes of "x" to a 
> deleted TMPDIR file and then immediately truncating it, e.g.,

This is by design:

https://www.gnu.org/software/parallel/parallel_design.html#disk-full

GNU Parallel does an exponential back off:

https://www.gnu.org/software/parallel/parallel_design.html#exponentially-back-off

So for long running jobs, this test is run once per second. For short
jobs (e.g. parallel true ::: {1..1000}) it will be run much more
often.
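To illustrate the general idea of an exponential back off that settles at one check per second, here is a small sketch. The 1 ms starting delay, the doubling factor, and the function names are assumptions chosen for illustration; they are not taken from GNU Parallel's source.

```shell
# Hypothetical sketch of an exponential back off capped at 1000 ms.
# The start value (1 ms) and the factor (2) are illustrative assumptions.
next_delay_ms() {
    # $1 = current delay in ms; prints the next delay, capped at 1000 ms.
    d=$(( $1 * 2 ))
    if [ "$d" -gt 1000 ]; then
        d=1000
    fi
    echo "$d"
}

backoff_schedule() {
    # $1 = number of idle rounds; prints the delay used in each round.
    rounds=$1
    delay=1
    out=""
    i=0
    while [ "$i" -lt "$rounds" ]; do
        out="${out:+$out }$delay"
        delay=$(next_delay_ms "$delay")
        i=$(( i + 1 ))
    done
    echo "$out"
}

backoff_schedule 12
```

With these assumed constants the delay reaches the 1 second cap after ten idle rounds. A loop that keeps finding work (many short jobs) would never back off far, which matches the remark that the test runs much more often for short jobs.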

I find it unlikely that the CPU usage is solely caused by the checking
of disk full. You can verify this by changing:

    sub exit_if_disk_full() {
        return;  # added line: return at once and skip the disk-full check
        ...
    }

This will, however, remove the protection that checks if $TMPDIR runs
full during a run (and in this case output may be wrong).
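The pattern visible in the strace output (write 8193 bytes, then truncate) can be imitated in a few lines of shell. This is only a sketch of the idea: the function name tmpdir_has_room and the use of a fresh probe file per call are assumptions, whereas the trace shows GNU Parallel reusing one already-open, deleted file.

```shell
# Hypothetical probe: succeed only if 8193 bytes can be written to a
# file in the given directory (default: $TMPDIR, else /tmp).
tmpdir_has_room() {
    dir=${1:-${TMPDIR:-/tmp}}
    probe=$(mktemp "$dir/probe.XXXXXX" 2>/dev/null) || return 1
    # Write 8193 bytes of "x", the size seen in the strace output.
    if head -c 8193 /dev/zero | tr '\0' 'x' > "$probe" 2>/dev/null &&
       [ "$(wc -c < "$probe")" -eq 8193 ]; then
        status=0
    else
        status=1
    fi
    : > "$probe"     # truncate to 0 bytes, comparable to ftruncate(fd, 0)
    rm -f "$probe"
    return "$status"
}

if tmpdir_has_room; then
    echo "temporary directory is writable"
else
    echo "temporary directory is full or not writable"
fi
```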

I ran strace -tt:

[pid 1000762] 23:54:08.486547 wait4(-1, [{WIFEXITED(s) &&
WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 1000775
[pid 1000762] 23:54:08.486782 write(7,
"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 8192) = 8192
[pid 1000762] 23:54:08.486875 write(7, "b", 1) = 1
[pid 1000762] 23:54:08.486956 ftruncate(7, 0) = 0

so the write+truncate+seek takes about 0.4 milliseconds (409 microseconds
between the wait4 and ftruncate timestamps above). When run on
my system `vmstat 1` shows that the write never reaches the disk,
which is the goal.

> Any thoughts on how to avoid this seemingly useless spin loop with high CPU 
> and I/O resource consumption?

Can you show that there really is an I/O consumption? Do the writes
actually reach your disk?
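One quick way to approach that question is to check which kind of filesystem backs $TMPDIR, since a RAM-backed filesystem such as tmpfs does not write to a disk in normal operation. The sketch below assumes GNU coreutils stat (as found on most Linux systems); the function name tmpdir_fstype is made up for illustration.

```shell
# Hypothetical helper: print the filesystem type behind a directory
# (default: $TMPDIR, else /tmp). Assumes GNU coreutils "stat".
tmpdir_fstype() {
    dir=${1:-${TMPDIR:-/tmp}}
    stat -f -c %T "$dir" 2>/dev/null || echo unknown
}

echo "filesystem type: $(tmpdir_fstype)"
```

On a typical Linux system, running tmpdir_fstype /dev/shm is expected to report tmpfs. Watching `vmstat 1` while parallel is running, as mentioned above, then shows whether any blocks are actually written out.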


/Ole



------------------------------

Message: 4
Date: Wed, 9 Mar 2022 01:07:13 +0100
From: Ole Tange <o...@tange.dk>
To: Julien Gamba <jul...@jgamba.eu>
Cc: parallel <parallel@gnu.org>
Subject: Re: bug report
Message-ID:
        <ca+4vn7x+dewa8+8tkw2el+2rqjxp4kt3+8hek2qnha+u7j-...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Mon, Mar 7, 2022 at 3:10 PM Julien Gamba <jul...@jgamba.eu> wrote:

> Here is the message I get:
> * The version number: 20161222
> * The bugid: swap_activity_file-r
:
> I hope there is enough information here!

Unfortunately this is not enough to see what is going on.

> Happy to give more details if needed.

See: 
https://www.gnu.org/software/parallel/parallel.html#reporting-bugs

> Thanks, and thanks for your work on this awesome tool!

You are welcome.


/Ole



------------------------------

Subject: Digest Footer

_______________________________________________
Parallel mailing list
Parallel@gnu.org
https://lists.gnu.org/mailman/listinfo/parallel


------------------------------

End of Parallel Digest, Vol 143, Issue 4
****************************************
