Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 3:
>
> > > Patch Set 3:
>  > >
>  > > > Patch Set 2:
>  > > >
>  > > > Based on my reading of "man readdir", it's not threadsafe. I
>  > think the only usage here is if a user hits "http://impalad:.../";
>  > to see the web UI. It's possible that there's a lock somewhere
>  > preventing concurrent use, but given that this is already
>  > reasonably expensive, I'd recommend using the reentrant readdir_r
>  > instead.
>  > > >
>  > > > I looked around, and it looks like C++17 and boost have
>  > filesystem abstractions, but I think readdir() is simple enough.
>  > >
>  > > I looked at the manpage of readdir() here 
> (http://man7.org/linux/man-pages/man3/readdir.3.html)
>  > and it claims that "in modern implementations (including the glibc
>  > implementation), concurrent calls to readdir()
>  > > that specify different directory streams are thread-safe.". I
>  > tried this out at 
> https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d
>  > and was not able to hit a race (Note that you cannot re-use a dir
>  > stream). My interpretation is that you cannot use two threads to
>  > scan a directory in parallel by using the same dir stream (as in
>  > "the same pointer") because one thread's call will overwrite the
>  > internal buffer of the other. Creating a fresh dir stream seems
>  > fine to me though.
>  >
>  > Ok, that makes sense.
>
> I would prefer readdir, if we can assume that it is threadsafe. I grepped 
> around, and squeasel + kudu/util also uses readdir, though there are some 
> comments in Kudu that state that it is not threadsafe.
> 
> scandir can be also a possibility ( 
> man7.org/linux/man-pages/man3/scandir.3.html ).

Looking at the definition of __dirstream in the libc source of my dev machine 
shows that there are no shared datastructures:

struct __dirstream
  {
    int fd;         /* File descriptor.  */

    __libc_lock_define (, lock) /* Mutex lock for this structure.  */

    size_t allocation;      /* Space allocated for the block.  */
    size_t size;        /* Total valid data in the block.  */
    size_t offset;      /* Current offset into the block.  */

    off_t filepos;      /* Position of next entry to read.  */

    int errcode;        /* Delayed error code.  */

    /* Directory block.  We must make sure that this block starts
       at an address that is aligned adequately enough to store
       dirent entries.  Using the alignment of "void *" is not
       sufficient because dirents on 32-bit platforms can require
       64-bit alignment.  We use "long double" here to be consistent
       with what malloc uses.  */
    char data[0] __attribute__ ((aligned (__alignof__ (long double))));
  };

In my small test program, each thread also kept its own file descriptor, making 
me feel confident that this usage of readdir is thread safe.

Dan, what's your take?


--
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 03:50:44 +0000
Gerrit-HasComments: No

Reply via email to