Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )
Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo ...................................................................... Patch Set 3: > Patch Set 3: > > > > Patch Set 3: > > > > > > > Patch Set 2: > > > > > > > > Based on my reading of "man readdir", it's not threadsafe. I > > think the only usage here is if a user hits "http://impalad:.../" > > to see the web UI. It's possible that there's a lock somewhere > > preventing concurrent use, but given that this is already > > reasonably expensive, I'd recommend using the reentrant readdir_r > > instead. > > > > > > > > I looked around, and it looks like C++17 and boost have > > filesystem abstractions, but I think readdir() is simple enough. > > > > > > I looked at the manpage of readdir() here > (http://man7.org/linux/man-pages/man3/readdir.3.html) > > and it claims that "in modern implementations (including the glibc > > implementation), concurrent calls to readdir() > > > that specify different directory streams are thread-safe.". I > > tried this out at > https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d > > and was not able to hit a race (Note that you cannot re-use a dir > > stream). My interpretation is that you cannot use two threads to > > scan a directory in parallel by using the same dir stream (as in > > "the same pointer") because one thread's call will overwrite the > > internal buffer of the other. Creating a fresh dir stream seems > > fine to me though. > > > > Ok, that makes sense. > > I would prefer readdir, if we can assume that it is threadsafe. I grepped > around, and squeasel + kudu/util also uses readdir, though there are some > comments in Kudu that state that it is not threadsafe. > > scandir can be also a possibility ( > man7.org/linux/man-pages/man3/scandir.3.html ). Looking at the definition of __dirstream in the libc source of my dev machine shows that there are no shared datastructures: struct __dirstream { int fd; /* File descriptor. */ __libc_lock_define (, lock) /* Mutex lock for this structure. */ size_t allocation; /* Space allocated for the block. */ size_t size; /* Total valid data in the block. */ size_t offset; /* Current offset into the block. */ off_t filepos; /* Position of next entry to read. */ int errcode; /* Delayed error code. */ /* Directory block. We must make sure that this block starts at an address that is aligned adequately enough to store dirent entries. Using the alignment of "void *" is not sufficient because dirents on 32-bit platforms can require 64-bit alignment. We use "long double" here to be consistent with what malloc uses. */ char data[0] __attribute__ ((aligned (__alignof__ (long double)))); }; In my small test program, each thread also kept its own file descriptor, making me feel confident that this usage of readdir is thread safe. Dan, what's your take? -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Comment-Date: Fri, 17 Nov 2017 03:50:44 +0000 Gerrit-HasComments: No