Dear Parallel Developers,

Excuse me, I'm wondering whether it would be a good idea to expose the lower-level APIs of the Parallel software, for example, the number of threads to use, the highest temperature of the CPU, etc.

Many thanks!
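For the "number of threads" part of the question, GNU Parallel's `-j` (`--jobs`) option already caps how many jobs run at once, and `--load` can hold back new jobs while the machine is busy. A minimal sketch, assuming a policy of leaving one core free (the variable names `cores` and `njobs` are made up for this example, and the temperature part of the question is not covered here):

```shell
#!/usr/bin/env bash
# Hypothetical policy: use all online cores but one, and never fewer than one.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
njobs=$(( cores > 1 ? cores - 1 : 1 ))
echo "using $njobs job slots"

# Only run the demo if the installed `parallel` really is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -j caps the number of simultaneously running jobs.
  parallel -j "$njobs" echo "job {}" ::: 1 2 3
fi
```

The job count is computed in the calling shell and passed in, so the same script works on machines with different core counts.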
Jack BAI
Junior, ECE ZJUI, CompE Major, UIUC
Mobile  CHN +86 198-8327-0881
        USA +1 217-974-4233
Email   INTL haob...@intl.zju.edu.cn
        ILLINI ha...@illinois.edu
        HEPTA bai...@hepta.asia

From: Parallel <parallel-bounces+haob2=illinois....@gnu.org> on behalf of parallel-requ...@gnu.org <parallel-requ...@gnu.org>
Date: Wednesday, March 9, 2022 at 11:03 AM
To: parallel@gnu.org <parallel@gnu.org>
Subject: Parallel Digest, Vol 143, Issue 4

Send Parallel mailing list submissions to
        parallel@gnu.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.gnu.org/mailman/listinfo/parallel
or, via email, send a message with subject or body 'help' to
        parallel-requ...@gnu.org

You can reach the person managing the list at
        parallel-ow...@gnu.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Parallel digest..."

Today's Topics:

   1. Re: processing csv files (Ole Tange)
   2. I/O spin loop (Anderson, Stuart B.)
   3. Re: I/O spin loop (Ole Tange)
   4. Re: bug report (Ole Tange)

----------------------------------------------------------------------

Message: 1
Date: Tue, 8 Mar 2022 18:22:28 +0100
From: Ole Tange <o...@tange.dk>
To: Saint Michael <vene...@gmail.com>
Cc: parallel <parallel@gnu.org>
Subject: Re: processing csv files
Message-ID: <ca+4vn7wzib0r0dtdh94qufo+t6lhhms7c1bqk2o-ahddqin...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Mon, Mar 7, 2022 at 4:22 AM Saint Michael <vene...@gmail.com> wrote:
>
> So how would I submit the contents of many files to parallel, without
> concatenating them?

Why do you see this as a problem?
If you are going to start a process for each line of input, cat will not slow things down.

You _can_ avoid the cat, but it seems a bit silly:

    < file1.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file2.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file3.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file4.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file5.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file6.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
    < file7.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"

And I think you will find the total run time is longer.

> The function needs to process each file line by line.
> I am sure there must be a better way.
> Why concatenate them at all?

Because you want to feed them into GNU Parallel as a single input source.

cat is way faster than GNU Parallel will ever be, so please explain why you see cat as a problem.

    seq 10000 > file
    time cat file >/dev/null
    < file time parallel echo >/dev/null

> There is no relationship between a line and the next line.

If you can change function to read from stdin (standard input), then we can do something way more efficient:

    myfunc() { wc; }
    export -f myfunc
    parallel --pipepart --block -1 myfunc :::: *.csv

--pipepart has some limitations, but it is insanely fast (almost as fast as a parallelized cat).

> Maybe a new feature?

If the previous does not answer your question then it is unclear to me what you really want to do. If you read https://stackoverflow.com/help/minimal-reproducible-example you will see how to make it easier to help you.
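To make the "single input source" point concrete, here is a small self-contained sketch. The two CSV files and the `echo` command standing in for the per-line function are invented for the example; only `--colsep`, `-k` and the `{1} {2}` placeholders are GNU Parallel features. The serial `while read` branch is there only so the script still runs where GNU Parallel is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two small CSV files in a scratch directory.
workdir=$(mktemp -d)
printf 'a,1\nb,2\n' > "$workdir/file1.csv"
printf 'c,3\n'      > "$workdir/file2.csv"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # cat merges the files into one stream; --colsep splits each line into
  # {1}, {2}, ...; -k keeps the output in input order.
  out=$(cat "$workdir"/*.csv | parallel -k --colsep ',' echo {1}-{2})
else
  # Fallback without GNU Parallel: the same rows, processed one at a time.
  out=$(cat "$workdir"/*.csv | while IFS=, read -r a b; do echo "$a-$b"; done)
fi

printf '%s\n' "$out"
rm -rf "$workdir"
```

Each input line becomes one job regardless of which file it came from, which is why concatenating first costs essentially nothing compared with starting the jobs.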
/Ole

------------------------------

Message: 2
Date: Tue, 8 Mar 2022 22:02:02 +0000
From: "Anderson, Stuart B." <s...@caltech.edu>
To: "parallel@gnu.org" <parallel@gnu.org>
Subject: I/O spin loop
Message-ID: <2a89e7a0-10ce-42da-9eb9-6f93247b6...@caltech.edu>
Content-Type: text/plain; charset="us-ascii"

parallel version 20190922 from EPEL 8 running a Rocky Linux 8 system occasionally gets into an I/O spin loop writing 8193 bytes of "x" to a deleted TMPDIR file and then immediately truncating it, e.g.,

# cat /etc/redhat-release
Rocky Linux release 8.5 (Green Obsidian)

# yum list parallel
Installed Packages
parallel.noarch    20190922-1.el8    @epel

# strace -p 836133
...
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1) = 1
ftruncate(10, 0) = 0
lseek(10, 0, SEEK_SET) = 0
lseek(10, 0, SEEK_CUR) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x7fe81f000dc0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=0x7fe81f000dc0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1) = 1
ftruncate(10, 0) = 0
lseek(10, 0, SEEK_SET) = 0
lseek(10, 0, SEEK_CUR) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=0x7fe81f000dc0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, {sa_handler=0x7fe81f000dc0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fe81ed29c20}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, 0x7ffc539e8a14, WNOHANG, NULL) = 0
write(10, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 8192) = 8192
write(10, "x", 1) = 1
ftruncate(10, 0) = 0

From lsof, fd=10 is:

COMMAND   PID     USER  FD   TYPE  DEVICE  SIZE/OFF  NODE     NAME
parallel  836133  root  10u  REG   0,21    0         8450409  /dev/shm/parTQp67.df (deleted)

And /dev/shm is due to my setting TMPDIR=/dev/shm to reduce the I/O load on the /tmp filesystem.

Any thoughts on how to avoid this seemingly useless spin loop with high CPU and I/O resource consumption?

Thanks.

--
Stuart Anderson
s...@caltech.edu

------------------------------

Message: 3
Date: Wed, 9 Mar 2022 00:35:39 +0100
From: Ole Tange <o...@tange.dk>
To: "Anderson, Stuart B." <s...@caltech.edu>
Cc: "parallel@gnu.org" <parallel@gnu.org>
Subject: Re: I/O spin loop
Message-ID: <CA+4vN7w+gqxUZW0ADwA5tJ=ypys2vyyzm3cmy-mubsumfqg...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Tue, Mar 8, 2022 at 11:22 PM Anderson, Stuart B.
<s...@caltech.edu> wrote:
>
> parallel version 20190922 from EPEL 8 running a Rocky Linux 8 system
> occasionally gets into an I/O spin loop writing 8193 bytes of "x" to a
> deleted TMPDIR file and then immediately truncating it, e.g.,

This is by design:

https://www.gnu.org/software/parallel/parallel_design.html#disk-full

GNU Parallel does an exponential back off:

https://www.gnu.org/software/parallel/parallel_design.html#exponentially-back-off

So for long running jobs, this test is run once per second. For short jobs (e.g. parallel true ::: {1..1000}) it will be run much more often.

I find it unlikely that the CPU usage is solely caused by the checking of disk full. You can verify this by changing:

    sub exit_if_disk_full() {
        return;
        ...
    }

This will, however, remove the protection that checks if $TMPDIR runs full during a run (and in this case output may be wrong).

I ran strace -tt:

[pid 1000762] 23:54:08.486547 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 1000775
[pid 1000762] 23:54:08.486782 write(7, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 8192) = 8192
[pid 1000762] 23:54:08.486875 write(7, "b", 1) = 1
[pid 1000762] 23:54:08.486956 ftruncate(7, 0) = 0

so the write+truncate+seek takes at most 406 microseconds.

When run on my system `vmstat 1` shows that the write never reaches the disk, which is the goal.
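For readers who want to see the probe pattern outside of strace, the following is a rough shell analogue of the write-then-truncate sequence shown above. It is not GNU Parallel's own code (that is the Perl function `exit_if_disk_full`), and the file name `probe.XXXXXX` and the variable names are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Scratch file in $TMPDIR (falls back to /tmp); the name is hypothetical.
probe=$(mktemp "${TMPDIR:-/tmp}/probe.XXXXXX")

# Write 8192 + 1 = 8193 bytes of "x", mirroring the two write() calls.
if head -c 8193 /dev/zero | tr '\0' 'x' > "$probe"; then
  written=$(wc -c < "$probe" | tr -d ' ')
else
  written=0
fi

# Truncate back to zero length, mirroring ftruncate(fd, 0).
: > "$probe"
after=$(wc -c < "$probe" | tr -d ' ')

if [ "$written" -eq 8193 ]; then status="ok"; else status="disk full?"; fi
echo "wrote $written bytes, $after bytes after truncate: $status"
rm -f "$probe"
```

Because the file is truncated immediately, a short-lived probe like this will normally be absorbed by the page cache, which matches the observation that the writes never reach the disk.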
> Any thoughts on how to avoid this seemingly useless spin loop with high CPU
> and I/O resource consumption?

Can you show that there really is an I/O consumption? Do the writes actually reach your disk?

/Ole

------------------------------

Message: 4
Date: Wed, 9 Mar 2022 01:07:13 +0100
From: Ole Tange <o...@tange.dk>
To: Julien Gamba <jul...@jgamba.eu>
Cc: parallel <parallel@gnu.org>
Subject: Re: bug report
Message-ID: <ca+4vn7x+dewa8+8tkw2el+2rqjxp4kt3+8hek2qnha+u7j-...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Mon, Mar 7, 2022 at 3:10 PM Julien Gamba <jul...@jgamba.eu> wrote:

> Here is the message I get:
> * The version number: 20161222
> * The bugid: swap_activity_file-r
:
> I hope there is enough information here!

Unfortunately this is not enough to see what is going on.

> Happy to give more details if needed.

See: https://www.gnu.org/software/parallel/parallel.html#reporting-bugs

> Thanks, and thanks for your work on this awesome tool!

You are welcome.

/Ole

------------------------------

Subject: Digest Footer

_______________________________________________
Parallel mailing list
Parallel@gnu.org
https://lists.gnu.org/mailman/listinfo/parallel

------------------------------

End of Parallel Digest, Vol 143, Issue 4
****************************************