Hi Minh,
the flock is used in non blocking mode, not sure but as I mentioned
earlier why not add a new function
FileNotify::WaitForFileClose and change the return codes in
ProcessEvents to differentiate the events?
/Thanks HansN
On 05/03/2017 08:43 AM, minh chau wrote:
> Hi Anders, Hans,
>
> Using flock() when reading pid file in osaftransportd, it may be
> causing a deadlock. When open() is called in __create_pidfile(), that
> will generate IN_CREATE towards osaftransportd. Right after that if
> osaftransportd takes the lock before __create_pidfile(), I think it
> will cause a deadlock?
>
> There might be two ways we can do:
>
> (1) It's a little bit change in the temporary approach, we first
> create original pid file as normally, at the end of
> __create_pidfile(), create a symlink temporary pid file targeting the
> original pid file. Then, osaftransportd now watches the temporary pid
> file, and FileNotify can continue using IN_CREATE (which is created by
> both symlink() and open(), don't have to add IN_MOVED_TO)
>
> (2) The other simple one is the retry of reading pid as sent out in
> V1, it does retry as similar as NID is trying to open fifo file after
> call WaitForFileCreation(). We can extend the waiting time of retries
> for osaftransportd as similar to NID, and the pid should be ready to
> read by that period of waiting time.
>
> Or (3)...?
>
> thanks,
> Minh
>
> On 03/05/17 00:43, Anders Widell wrote:
>> Yes, when reading this patch I realize that the pidfile creation
>> already uses flock() for atomicity. Maybe this is standard practice
>> when creating pid files, and if so we probably shouldn't do it in
>> another way because then we can have problems interacting with other
>> programs that operate on the pid files (Ubuntu start-stop-daemon and
>> LSB start_daemon). Maybe the solution is as simple as using flock()
>> when reading the pid file?
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 05/02/2017 03:10 PM, Hans Nordebäck wrote:
>>> Hi Minh,
>>>
>>> some minor comments below, I discussed with Anders about using the
>>> existing flock instead of the rename.
>>>
>>> /BR HansN
>>>
>>>
>>> On 04/28/2017 11:36 AM, Minh Chau wrote:
>>>> At startup, osaftransportd waits for osafdtmd.pid file creation
>>>> and then reads dtm pid. If osafdtmd.pid has not been completedly
>>>> created but osaftransportd still receives IN_CREATE, osaftransported
>>>> will fail to read pid of dtmd. That results in a node reboot with
>>>> a reason as "osafdtmd failed to start".
>>>>
>>>> The patch implements an approach suggested by Anders Widell, which
>>>> creates a completed temporary pid file first, then renames it to
>>>> correct pid file name. Whenever osaftransportd is notified to read
>>>> dtmd's pid, the data in pid file should be always safe to read. In
>>>> addition to this, FileNotify needs to introduce IN_MOVED_TO event.
>>>> ---
>>>> src/base/daemon.c | 27 ++++++++++++++++++---------
>>>> src/base/file_notify.cc | 10 +++++++++-
>>>> 2 files changed, 27 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/src/base/daemon.c b/src/base/daemon.c
>>>> index 77a869561..2fd161ce0 100644
>>>> --- a/src/base/daemon.c
>>>> +++ b/src/base/daemon.c
>>>> @@ -88,40 +88,49 @@ static int __create_pidfile(const char *pidfile)
>>>> {
>>>> FILE *file = NULL;
>>>> int fd, pid, rc = 0;
>>>> + char pidfiletmp[256] = {0};
>>> [HansN] use NAME_MAX instead of 256
>>>> +
>>>> + sprintf(pidfiletmp, "%s.tmp", pidfile);
>>> [HansN] use snprintf and shouldn't pid be included in temp name?
>>>> /* open the file and associate a stream with it */
>>>> - if (((fd = open(pidfile, O_RDWR | O_CREAT, 0644)) == -1) ||
>>>> + if (((fd = open(pidfiletmp, O_RDWR | O_CREAT, 0644)) == -1) ||
>>>> ((file = fdopen(fd, "r+")) == NULL)) {
>>>> - syslog(LOG_ERR, "open failed, pidfile=%s, errno=%s", pidfile,
>>>> - strerror(errno));
>>>> + syslog(LOG_ERR, "open failed, pidfiletmp=%s, errno=%s",
>>>> + pidfiletmp, strerror(errno));
>>> [HansN] close(fd) is missing
>>>> return -1;
>>>> }
>>>> /* Lock the file */
>>>> if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
>>>> - syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s",
>>>> pidfile,
>>>> - strerror(errno));
>>>> + syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>>> + pidfiletmp, strerror(errno));
>>>> fclose(file);
>>>> return -1;
>>>> }
>>>> pid = getpid();
>>>> if (!fprintf(file, "%d\n", pid)) {
>>>> - syslog(LOG_ERR, "fprintf failed, pidfile=%s, errno=%s",
>>>> pidfile,
>>>> - strerror(errno));
>>>> + syslog(LOG_ERR, "fprintf failed, pidfiletmp=%s, errno=%s",
>>>> + pidfiletmp, strerror(errno));
>>>> fclose(file);
>>>> return -1;
>>>> }
>>>> fflush(file);
>>>> if (flock(fd, LOCK_UN) == -1) {
>>>> - syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s",
>>>> pidfile,
>>>> - strerror(errno));
>>>> + syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>>> + pidfiletmp, strerror(errno));
>>>> fclose(file);
>>>> return -1;
>>>> }
>>>> fclose(file);
>>>> + if (rename(pidfiletmp, pidfile) != 0) {
>>>> + syslog(LOG_ERR, "rename failed, old=%s new=%s, error:%s",
>>>> + pidfiletmp, pidfile, strerror(errno));
>>>> + return -1;
>>>> + }
>>>> +
>>>> return rc;
>>>> }
>>>> diff --git a/src/base/file_notify.cc b/src/base/file_notify.cc
>>>> index e96be8b4a..80d1b7d54 100644
>>>> --- a/src/base/file_notify.cc
>>>> +++ b/src/base/file_notify.cc
>>>> @@ -50,7 +50,7 @@ FileNotify::FileNotifyErrors
>>>> FileNotify::WaitForFileCreation(
>>>> SplitFileName(file_name);
>>>> if ((inotify_wd_ = inotify_add_watch(inotify_fd_,
>>>> file_path_.c_str(),
>>>> - IN_CREATE)) == -1) {
>>>> + IN_CREATE | IN_MOVED_TO))
>>>> == -1) {
>>>> LOG_NO("inotify_add_watch failed: %s", strerror(errno));
>>>> return FileNotifyErrors::kError;
>>>> }
>>>> @@ -144,6 +144,14 @@ FileNotify::FileNotifyErrors
>>>> FileNotify::ProcessEvents(
>>>> return FileNotifyErrors::kOK;
>>>> }
>>>> }
>>>> + if (event->mask & IN_MOVED_TO) {
>>>> + if (file_name_ == event->name) {
>>>> + TRACE("file name: %s moved to %s",
>>>> file_name_.c_str(),
>>>> + file_path_.c_str());
>>>> + delete[] fds;
>>>> + return FileNotifyErrors::kOK;
>>>> + }
>>>> + }
>>>> if (event->mask & IN_IGNORED) {
>>>> TRACE("IN_IGNORE received, (ignored)");
>>>> }
>>>
>>
>>
>>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel