Re: rcs configure hang
On 11/5/20 2:28 PM, Kelly Wang (kellythw) wrote:
> When strace hang, I do 'ps -elf | grep strace' from other terminal and do kill -9
> kill -s INT $(ps -o pid= -C a.out) looks like not working from my server.

Assuming you're using the Linux kernel signal numbers, you should be able to get a process ID (say, 4729) and use this:

  kill -2 4729

instead of the fancier 'kill' command I suggested.

Also, try this in another session:

  kill -14 4729

which sends the ALRM signal instead of the INT signal. Either way, see what 'tr' says.
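For anyone who wants to repeat the numeric-signal test without typing the PID by hand, here is a rough sketch. It assumes a POSIX sh and the Linux signal numbers mentioned above (2 = INT, 14 = ALRM). The function name send_and_report is made up for illustration, and it takes the PID from $! rather than from ps.

```shell
# send_and_report SIGNUM COMMAND [ARG...]
# Start COMMAND in the background, send it signal number SIGNUM after one
# second, then print the status that the shell's 'wait' reports for it.
# (Hypothetical helper, not part of the original recipe.)
send_and_report() {
    signum=$1
    shift
    "$@" &
    pid=$!
    sleep 1
    kill "-$signum" "$pid"
    status=0
    wait "$pid" || status=$?
    echo "wait status: $status"
}

# Note: when sh runs this from a script (job control off), background
# commands start with SIGINT ignored, so signal 2 may have no effect there.
# Signal 14 (ALRM) is not affected by that rule.
# Example, from an interactive prompt:
#   send_and_report 14 strace -o tr ./a.out
```

A status above 128 from 'wait' means the command was terminated by a signal; a hung, uninterruptible system call would instead leave 'wait' blocked.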
Re: rcs configure hang
Hi Paul,

When strace hang, I do 'ps -elf | grep strace' from other terminal and do kill -9
kill -s INT $(ps -o pid= -C a.out) looks like not working from my server.

% kill -s INT $(ps -o pid= -C a.out)
Illegal variable name.

Thanks,
Kelly

If you need support for DevX Tools: http://devxsupport.cisco.com/
Specifically, for NXOS, see - https://wiki.cisco.com/display/NEXUSPMO/ContactingNexusOpsAndTools

On 11/5/20, 1:36 PM, "Paul Eggert" wrote:
> On 11/5/20 1:18 PM, Kelly Wang (kellythw) wrote:
> > With the conftest.c you provided, strace still hang.
> > Check for how many calls for chdir("confdir3"), it only has 110 times, then hang after mkdir("confdir3", 0700 ...
> > Is there any directory limitation that can make on a server?
>
> Wow, that's a more-serious kernel (or filesystem) bug than I thought: the mkdir system call is hanging and does not appear to be interruptible via SIGALRM.
>
> When the program hangs, how do you terminate it? Do you use Control-C from a terminal? If so, what happens if you instead use 'kill'? Something like this:
>
>   rm -fr conftest3
>   gcc conftest.c
>   strace -o tr ./a.out &
>   sleep 1
>   kill -s INT $(ps -o pid= -C a.out)
>
> That last line should send the SIGINT signal to the a.out command; does this cause a.out to exit? (You can look at 'tr' to see.) If it exits, perhaps we can modify conftest3 to do the same thing to itself when it is running on a buggy kernel.
>
> Also, what happens if you do the same recipe as above, but use 'ALRM' rather than 'INT'? Again, look at the end of 'tr'.
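A note on the error above: "Illegal variable name." is what csh and tcsh print when they meet $( ... ), and the '%' prompt points the same way, though the login shell here is only a guess. Backquotes perform the same command substitution and are accepted by both shell families. A minimal sketch in sh, with a made-up helper name signal_by_command:

```shell
# signal_by_command SIGNAME COMMAND
# Send signal SIGNAME (INT, ALRM, ...) to every process whose command name
# is COMMAND. Uses backquotes, which sh, bash, csh and tcsh all accept.
# 'ps -C' is the procps (Linux) option used in the original recipe.
# (Hypothetical helper, for illustration only.)
signal_by_command() {
    sig=$1
    name=$2
    pids=`ps -o pid= -C "$name" 2>/dev/null` || pids=
    if [ -z "$pids" ]; then
        echo "no process named $name"
        return 1
    fi
    # $pids is left unquoted on purpose: it may hold several PIDs.
    kill -s "$sig" $pids
}
```

At a csh or tcsh prompt the one-line equivalent would be the original command with backquotes in place of $( ... ), that is, kill -s INT followed by `ps -o pid= -C a.out`.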
Re: rcs configure hang
On 11/5/20 1:18 PM, Kelly Wang (kellythw) wrote:
> With the conftest.c you provided, strace still hang.
> Check for how many calls for chdir("confdir3"), it only has 110 times, then hang after mkdir("confdir3", 0700 ...
> Is there any directory limitation that can make on a server?

Wow, that's a more-serious kernel (or filesystem) bug than I thought: the mkdir system call is hanging and does not appear to be interruptible via SIGALRM.

When the program hangs, how do you terminate it? Do you use Control-C from a terminal? If so, what happens if you instead use 'kill'? Something like this:

  rm -fr conftest3
  gcc conftest.c
  strace -o tr ./a.out &
  sleep 1
  kill -s INT $(ps -o pid= -C a.out)

That last line should send the SIGINT signal to the a.out command; does this cause a.out to exit? (You can look at 'tr' to see.) If it exits, perhaps we can modify conftest3 to do the same thing to itself when it is running on a buggy kernel.

Also, what happens if you do the same recipe as above, but use 'ALRM' rather than 'INT'? Again, look at the end of 'tr'.
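One way to read the end of 'tr' mechanically is sketched below. It relies on strace's usual markers in a single-process trace: a line starting with "+++ killed by" when a signal terminated the program, and "+++ exited with" when it returned normally. The function name trace_outcome is an assumption made for this example.

```shell
# trace_outcome TRACE
# Report how the traced program ended, judging by strace's '+++' markers.
# (Hypothetical helper; assumes a trace written by 'strace -o TRACE' without -f.)
trace_outcome() {
    if grep -q '^+++ killed by ' "$1"; then
        echo "killed by signal"
    elif grep -q '^+++ exited with ' "$1"; then
        echo "exited"
    else
        echo "no exit recorded"
    fi
}

# Example: trace_outcome tr
```

If the result is "no exit recorded" after the kill, the signal did not end a.out, which would fit a system call that cannot be interrupted.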
Re: rcs configure hang
Hi Paul,

With the conftest.c you provided, strace still hang.
Check for how many calls for chdir("confdir3"), it only has 110 times, then hang after mkdir("confdir3", 0700 ...
Is there any directory limitation that can make on a server?

sjc-ads-7913:/ws/kellythw-sjc/rcs_try/getcwd-test% tail tr
chdir("confdir3") = 0
mkdir("confdir3", 0700) = 0
chdir("confdir3") = 0
mkdir("confdir3", 0700) = 0
chdir("confdir3") = 0
mkdir("confdir3", 0700) = 0
chdir("confdir3") = 0
mkdir("confdir3", 0700) = 0
chdir("confdir3") = 0
mkdir("confdir3", 0700

% grep 'chdir("confdir3")' tr | wc -l
110

Thanks,
Kelly

On 11/5/20, 9:57 AM, "Paul Eggert" wrote:
> On 10/27/20 8:36 AM, Kelly Wang (kellythw) wrote:
> > You are right, after remove confdir3, rerun strace hang.
> > Checked tr output, it stopped at bunch of mkdir and chdir and no further steps after that.
> > mkdir("confdir3", 0700) = 0
> > chdir("confdir3") = 0
>
> How many chdir("confdir3") calls were there, exactly? On my platform there were 1367. My guess is that the getcwd system call hung on your platform, which suggests a kernel or filesystem bug somewhere.
>
> What happens if you run the attached conftest.c instead? It's the same as before, except with an 'alarm (10)' call. As before, run it like this in your development directory:
>
>   rm -fr conftest3
>   gcc conftest.c
>   strace -o tr ./a.out
>
> and see how 'tr' ends if it hangs (which I hope it doesn't).
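The two checks done by hand above (counting the chdir calls and seeing that the last mkdir never returned) could be scripted roughly as follows. This is only a sketch: the function names are invented for illustration, and it assumes the one-call-per-line layout that 'strace -o tr' produced here.

```shell
# count_chdirs TRACE: number of chdir("confdir3") calls logged in TRACE.
count_chdirs() {
    grep -c 'chdir("confdir3")' "$1"
}

# last_call_unfinished TRACE: succeed if the final line of TRACE is a system
# call that strace never saw return (no " = result" after it).
last_call_unfinished() {
    last=$(tail -n 1 "$1")
    case $last in
        '+++ '*|'--- '*|*' = '*) return 1 ;;  # exit or signal marker, or a completed call
        *) return 0 ;;
    esac
}

# Example:
#   count_chdirs tr
#   last_call_unfinished tr && echo "last system call did not return"
```

grep -c gives the same number as piping grep into wc -l, and the second function flags a trace that ends the way this one does, in the middle of a mkdir line.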
Re: rcs configure hang
On 10/27/20 8:36 AM, Kelly Wang (kellythw) wrote:
> You are right, after remove confdir3, rerun strace hang.
> Checked tr output, it stopped at bunch of mkdir and chdir and no further steps after that.
> mkdir("confdir3", 0700) = 0
> chdir("confdir3") = 0

How many chdir("confdir3") calls were there, exactly? On my platform there were 1367. My guess is that the getcwd system call hung on your platform, which suggests a kernel or filesystem bug somewhere.

What happens if you run the attached conftest.c instead? It's the same as before, except with an 'alarm (10)' call. As before, run it like this in your development directory:

  rm -fr conftest3
  gcc conftest.c
  strace -o tr ./a.out

and see how 'tr' ends if it hangs (which I hope it doesn't).

/* confdefs.h */
#define PACKAGE_NAME "dummy"
#define PACKAGE_TARNAME "dummy"
#define PACKAGE_VERSION "0"
#define PACKAGE_STRING "dummy 0"
#define PACKAGE_BUGREPORT ""
#define PACKAGE_URL ""
#define PACKAGE "dummy"
#define VERSION "0"
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define __EXTENSIONS__ 1
#define _ALL_SOURCE 1
#define _DARWIN_C_SOURCE 1
#define _GNU_SOURCE 1
#define _NETBSD_SOURCE 1
#define _OPENBSD_SOURCE 1
#define _POSIX_PTHREAD_SEMANTICS 1
#define __STDC_WANT_IEC_60559_ATTRIBS_EXT__ 1
#define __STDC_WANT_IEC_60559_BFP_EXT__ 1
#define __STDC_WANT_IEC_60559_DFP_EXT__ 1
#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1
#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1
#define __STDC_WANT_LIB_EXT2__ 1
#define __STDC_WANT_MATH_SPEC_FUNCS__ 1
#define _TANDEM_SOURCE 1
#define _HPUX_ALT_XOPEN_SOCKET_API 1
#define HAVE_SYS_SOCKET_H 1
#define HAVE_ARPA_INET_H 1
#define HAVE_FEATURES_H 1
#define HAVE_UNISTD_H 1
#define HAVE_SYS_PARAM_H 1
#define HAVE_DIRENT_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_SYS_TIME_H 1
#define HAVE_NETDB_H 1
#define HAVE_NETINET_IN_H 1
#define HAVE_LIMITS_H 1
#define HAVE_WCHAR_H 1
#define HAVE_STDINT_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_THREADS_H 1
#define HAVE_SYS_MMAN_H 1
#define HAVE_SYS_SELECT_H 1
#define HAVE_PTHREAD_H 1
#define HAVE_SYS_CDEFS_H 1
#define HAVE_SYS_IOCTL_H 1
#define HAVE_SYS_UIO_H 1
#define restrict __restrict
#define HAVE_SHUTDOWN 1
#define HAVE_STRUCT_SOCKADDR_STORAGE 1
#define HAVE_SA_FAMILY_T 1
#define HAVE_STRUCT_SOCKADDR_STORAGE_SS_FAMILY 1
#define HAVE_ALLOCA_H 1
#define HAVE_ALLOCA 1
#define HAVE_FCHDIR 1
#define HAVE_FCNTL 1
#define HAVE_SYMLINK 1
#define HAVE_FDOPENDIR 1
#define HAVE_MEMPCPY 1
#define HAVE_FSTATAT 1
#define HAVE_FTRUNCATE 1
#define HAVE_GETDTABLESIZE 1
#define HAVE_GETTIMEOFDAY 1
#define HAVE_ISBLANK 1
#define HAVE_LSTAT 1
#define HAVE_MPROTECT 1
#define HAVE_OPENAT 1
#define HAVE_STRERROR_R 1
#define HAVE___XPG_STRERROR_R 1
#define HAVE_PIPE 1
#define HAVE_SIGACTION 1
#define HAVE_SIGALTSTACK 1
#define HAVE_SIGINTERRUPT 1
#define HAVE_SLEEP 1
#define HAVE_CATGETS 1
#define HAVE_SNPRINTF 1
#define HAVE_USLEEP 1
#define HAVE_ENVIRON_DECL 1
#define HAVE_DECL_STRERROR_R 1
#define HAVE_STRERROR_R 1
#define STRERROR_R_CHAR_P 1
#define HAVE_DECL_FCHDIR 1
#define HAVE_WORKING_O_NOATIME 1
#define HAVE_WORKING_O_NOFOLLOW 1
#define LSTAT_FOLLOWS_SLASHED_SYMLINK 1
#define HAVE_DECL_GETCWD 1
#define HAVE_DECL_GETDTABLESIZE 1
#define HAVE_IPV4 1
#define HAVE_IPV6 1
#define HAVE_WINT_T 1
#define HAVE_LONG_LONG_INT 1
#define HAVE_UNSIGNED_LONG_LONG_INT 1
#define HAVE_WEAK_SYMBOLS 1
#define HAVE_PTHREAD_API 1
#define USE_POSIX_THREADS 1
#define USE_POSIX_THREADS_WEAK 1
#define MALLOC_0_IS_NONNULL 1
#define HAVE_MAP_ANONYMOUS 1
#define HAVE_DECL_MEMRCHR 1
#define HAVE_DECL_ALARM 1
#define PROMOTED_MODE_T mode_t
#define HAVE_DECL_STRERROR_R 1
#define HAVE_SIGSET_T 1
#define HAVE__BOOL 1
#define HAVE_WCHAR_T 1
#define HAVE_DECL_STRDUP 1
#define _USE_STD_STAT 1
#define HAVE_DECL_UNSETENV 1
#define GNULIB_TEST_ACCEPT 1
#define HAVE_ALLOCA 1
#define GNULIB_TEST_BIND 1
#define GNULIB_TEST_CHDIR 1
#define GNULIB_TEST_CLOEXEC 1
#define GNULIB_TEST_CLOSE 1
#define HAVE_CLOSEDIR 1
#define GNULIB_TEST_CLOSEDIR 1
#define GNULIB_TEST_CONNECT 1
#define D_INO_IN_DIRENT 1
#define HAVE_DIRFD 1
#define HAVE_DECL_DIRFD 1
#define GNULIB_TEST_DIRFD 1
#define GNULIB_TEST_DUP 1
#define GNULIB_TEST_DUP2 1
#define GNULIB_TEST_ENVIRON 1
#define GNULIB_TEST_FCHDIR 1
#define GNULIB_TEST_FCNTL 1
#define GNULIB_FD_SAFER_FLAG 1
#define GNULIB_TEST_FDOPEN 1
#define HAVE_DECL_FDOPENDIR 1
#define GNULIB_TEST_FDOPENDIR 1
#define GNULIB_FDOPENDIR 1
#define GNULIB_TEST_FSTAT 1
#define GNULIB_TEST_FSTATAT 1
#define GNULIB_TEST_FTRUNCATE 1
/* end confdefs.h. */
#include
#include
#if HAVE_UNISTD_H
# include
#else
# include
#endif
#include
#include
#include
#include
#include

/* Arrange to define PATH_MAX, like "pathmax.h" does. */
#if HAVE_UNISTD_H
# include
#endif
#include
#if defined
Re: rcs configure hang
Hi Paul or gnulib guru,

Can you share any thought for the configure hanging problem while configure rcs?

+ ./configure
checking whether fcntl handles F_DUPFD correctly... yes
checking whether fcntl understands F_DUPFD_CLOEXEC... needs runtime check
checking whether conversion from 'int' to 'long double' works... yes
checking whether getcwd handles long file names properly...

Thanks,
Kelly

On 10/27/20, 8:36 AM, "Kelly Wang (kellythw)" wrote:
> Hi Paul,
>
> You are right, after remove confdir3, rerun strace hang.
> Checked tr output, it stopped at bunch of mkdir and chdir and no further steps after that.
> mkdir("confdir3", 0700) = 0
> chdir("confdir3") = 0
>
> Thanks,
> Kelly
>
> On 10/26/20, 3:56 PM, "Paul Eggert" wrote:
> > On 10/26/20 9:13 AM, Kelly Wang (kellythw) wrote:
> > > [Kelly] strace step is not hang and I have tr generated.
> >
> > Looking at the tr file, it appears that there was already a directory confdir3 when you ran the strace step, and this directory messed up the test. Please remove that directory (or rename it) and then re-run the "strace -o tr ./a.out". As before, the strace should also hang so you may need to type control-C to exit it after a while. Look at the resulting 'tr' file and compare it to the compressed file tr.gz I sent you earlier.
> >
> > > [Kelly] The difference of tr output start at:
> > >
> > > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 ==> output from yours
> > >
> > > openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 ==> my output
> >
> > That difference is unimportant. I'm concerned more about what happens after the long string of mkdir/chdir calls, which should occur once you get confdir3 out of the way.
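To see the mkdir/chdir pattern outside of configure, and why a leftover confdir3 gets in the way, something like the following sh sketch might help. It is not the attached conftest.c, only a loose imitation of its directory loop; the function name nest_dirs and its arguments are assumptions made for this example.

```shell
# nest_dirs BASE MAXDEPTH
# Inside BASE, repeatedly run 'mkdir confdir3' and 'cd confdir3', up to
# MAXDEPTH levels, and print how many levels were created. Runs in a
# subshell so the caller's working directory is left alone.
# (Hypothetical helper; a rough stand-in for the loop in the configure test.)
nest_dirs() {
    (
        cd "$1" || exit 1
        i=0
        while [ "$i" -lt "$2" ]; do
            # mkdir fails if confdir3 already exists, which ends the loop early.
            mkdir confdir3 && cd confdir3 || break
            i=$((i + 1))
        done
        echo "$i"
    )
}

# Example, in a scratch directory on the filesystem under suspicion:
#   nest_dirs /path/to/scratch 200
```

If the printed count is 0, a confdir3 was probably already present; if the command never prints anything, the loop has stalled in the same way the trace suggests. Remove the scratch tree with rm -fr afterwards.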