Hi Minh,

the flock is used in non blocking mode, not sure but as I mentioned 
earlier why not add a new function

FileNotify::WaitForFileClose and change the return codes in 
ProcessEvents to differentiate the events?

/Thanks HansN


On 05/03/2017 08:43 AM, minh chau wrote:
> Hi Anders, Hans,
>
> Using flock() when reading pid file in osaftransportd, it may be 
> causing a deadlock. When open() is called in __create_pidfile(), that 
> will generate IN_CREATE towards osaftransportd. Right after that if 
> osaftransportd takes the lock before __create_pidfile(), I think it 
> will cause a deadlock?
>
> There might be two ways we can do:
>
> (1) It's a little bit change in the temporary approach, we first 
> create original pid file as normally, at the end of 
> __create_pidfile(), create a symlink temporary pid file targeting the 
> original pid file. Then, osaftransportd now watches the temporary pid 
> file, and FileNotify can continue using IN_CREATE (which is created by 
> both symlink() and open(), don't have to add IN_MOVED_TO)
>
> (2) The other simple one is the retry of reading pid as sent out in 
> V1, it does retry as similar as NID is trying to open fifo file after 
> call WaitForFileCreation(). We can extend the waiting time of retries 
> for osaftransportd as similar to NID, and the pid should be ready to 
> read by that period of waiting time.
>
> Or (3)...?
>
> thanks,
> Minh
>
> On 03/05/17 00:43, Anders Widell wrote:
>> Yes, when reading this patch I realize that the pidfile creation 
>> already uses flock() for atomicity. Maybe this is standard practice 
>> when creating pid files, and if so we probably shouldn't do it in 
>> another way because then we can have problems interacting with other 
>> programs that operate on the pid files (Ubuntu start-stop-daemon and 
>> LSB start_daemon). Maybe the solution is as simple as using flock() 
>> when reading the pid file?
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 05/02/2017 03:10 PM, Hans Nordebäck wrote:
>>> Hi Minh,
>>>
>>> some minor comments below, I discussed with Anders about using the 
>>> existing flock instead of the rename.
>>>
>>> /BR HansN
>>>
>>>
>>> On 04/28/2017 11:36 AM, Minh Chau wrote:
>>>> At startup, osaftransportd waits for osafdtmd.pid file creation
>>>> and then reads dtm pid. If osafdtmd.pid has not been completedly
>>>> created but osaftransportd still receives IN_CREATE, osaftransported
>>>> will fail to read pid of dtmd. That results in a node reboot with
>>>> a reason as "osafdtmd failed to start".
>>>>
>>>> The patch implements an approach suggested by Anders Widell, which
>>>> creates a completed temporary pid file first, then renames it to
>>>> correct pid file name. Whenever osaftransportd is notified to read
>>>> dtmd's pid, the data in pid file should be always safe to read. In
>>>> addition to this, FileNotify needs to introduce IN_MOVED_TO event.
>>>> ---
>>>>   src/base/daemon.c       | 27 ++++++++++++++++++---------
>>>>   src/base/file_notify.cc | 10 +++++++++-
>>>>   2 files changed, 27 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/src/base/daemon.c b/src/base/daemon.c
>>>> index 77a869561..2fd161ce0 100644
>>>> --- a/src/base/daemon.c
>>>> +++ b/src/base/daemon.c
>>>> @@ -88,40 +88,49 @@ static int __create_pidfile(const char *pidfile)
>>>>   {
>>>>       FILE *file = NULL;
>>>>       int fd, pid, rc = 0;
>>>> +    char pidfiletmp[256] = {0};
>>> [HansN] use NAME_MAX instead of 256
>>>> +
>>>> +    sprintf(pidfiletmp, "%s.tmp", pidfile);
>>> [HansN] use snprintf and shouldn't pid be included in temp name?
>>>>         /* open the file and associate a stream with it */
>>>> -    if (((fd = open(pidfile, O_RDWR | O_CREAT, 0644)) == -1) ||
>>>> +    if (((fd = open(pidfiletmp, O_RDWR | O_CREAT, 0644)) == -1) ||
>>>>           ((file = fdopen(fd, "r+")) == NULL)) {
>>>> -        syslog(LOG_ERR, "open failed, pidfile=%s, errno=%s", pidfile,
>>>> -               strerror(errno));
>>>> +        syslog(LOG_ERR, "open failed, pidfiletmp=%s, errno=%s",
>>>> +            pidfiletmp, strerror(errno));
>>> [HansN] close(fd) is missing
>>>>           return -1;
>>>>       }
>>>>         /* Lock the file */
>>>>       if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
>>>> -        syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s", 
>>>> pidfile,
>>>> -               strerror(errno));
>>>> +        syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>>> +            pidfiletmp, strerror(errno));
>>>>           fclose(file);
>>>>           return -1;
>>>>       }
>>>>         pid = getpid();
>>>>       if (!fprintf(file, "%d\n", pid)) {
>>>> -        syslog(LOG_ERR, "fprintf failed, pidfile=%s, errno=%s", 
>>>> pidfile,
>>>> -               strerror(errno));
>>>> +        syslog(LOG_ERR, "fprintf failed, pidfiletmp=%s, errno=%s",
>>>> +             pidfiletmp, strerror(errno));
>>>>           fclose(file);
>>>>           return -1;
>>>>       }
>>>>       fflush(file);
>>>>         if (flock(fd, LOCK_UN) == -1) {
>>>> -        syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s", 
>>>> pidfile,
>>>> -               strerror(errno));
>>>> +        syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>>> +            pidfiletmp, strerror(errno));
>>>>           fclose(file);
>>>>           return -1;
>>>>       }
>>>>       fclose(file);
>>>>   +    if (rename(pidfiletmp, pidfile) != 0) {
>>>> +        syslog(LOG_ERR, "rename failed, old=%s new=%s, error:%s",
>>>> +            pidfiletmp, pidfile, strerror(errno));
>>>> +        return -1;
>>>> +    }
>>>> +
>>>>       return rc;
>>>>   }
>>>>   diff --git a/src/base/file_notify.cc b/src/base/file_notify.cc
>>>> index e96be8b4a..80d1b7d54 100644
>>>> --- a/src/base/file_notify.cc
>>>> +++ b/src/base/file_notify.cc
>>>> @@ -50,7 +50,7 @@ FileNotify::FileNotifyErrors 
>>>> FileNotify::WaitForFileCreation(
>>>>     SplitFileName(file_name);
>>>>       if ((inotify_wd_ = inotify_add_watch(inotify_fd_, 
>>>> file_path_.c_str(),
>>>> -                                       IN_CREATE)) == -1) {
>>>> +                                       IN_CREATE | IN_MOVED_TO)) 
>>>> == -1) {
>>>>       LOG_NO("inotify_add_watch failed: %s", strerror(errno));
>>>>       return FileNotifyErrors::kError;
>>>>     }
>>>> @@ -144,6 +144,14 @@ FileNotify::FileNotifyErrors 
>>>> FileNotify::ProcessEvents(
>>>>                   return FileNotifyErrors::kOK;
>>>>                 }
>>>>               }
>>>> +            if (event->mask & IN_MOVED_TO) {
>>>> +              if (file_name_ == event->name) {
>>>> +                TRACE("file name: %s moved to %s", 
>>>> file_name_.c_str(),
>>>> +                    file_path_.c_str());
>>>> +                delete[] fds;
>>>> +                return FileNotifyErrors::kOK;
>>>> +              }
>>>> +            }
>>>>               if (event->mask & IN_IGNORED) {
>>>>                 TRACE("IN_IGNORE received, (ignored)");
>>>>               }
>>>
>>
>>
>>
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to