Hi Anders, Hans,
Using flock() when reading pid file in osaftransportd, it may be causing
a deadlock. When open() is called in __create_pidfile(), that will
generate IN_CREATE towards osaftransportd. Right after that if
osaftransportd takes the lock before __create_pidfile(), I think it will
cause a deadlock?
There might be two ways we can do:
(1) It's a little bit change in the temporary approach, we first create
original pid file as normally, at the end of __create_pidfile(), create
a symlink temporary pid file targeting the original pid file. Then,
osaftransportd now watches the temporary pid file, and FileNotify can
continue using IN_CREATE (which is created by both symlink() and open(),
don't have to add IN_MOVED_TO)
(2) The other simple one is the retry of reading pid as sent out in V1,
it does retry as similar as NID is trying to open fifo file after call
WaitForFileCreation(). We can extend the waiting time of retries for
osaftransportd as similar to NID, and the pid should be ready to read by
that period of waiting time.
Or (3)...?
thanks,
Minh
On 03/05/17 00:43, Anders Widell wrote:
> Yes, when reading this patch I realize that the pidfile creation
> already uses flock() for atomicity. Maybe this is standard practice
> when creating pid files, and if so we probably shouldn't do it in
> another way because then we can have problems interacting with other
> programs that operate on the pid files (Ubuntu start-stop-daemon and
> LSB start_daemon). Maybe the solution is as simple as using flock()
> when reading the pid file?
>
> regards,
>
> Anders Widell
>
>
> On 05/02/2017 03:10 PM, Hans Nordebäck wrote:
>> Hi Minh,
>>
>> some minor comments below, I discussed with Anders about using the
>> existing flock instead of the rename.
>>
>> /BR HansN
>>
>>
>> On 04/28/2017 11:36 AM, Minh Chau wrote:
>>> At startup, osaftransportd waits for osafdtmd.pid file creation
>>> and then reads dtm pid. If osafdtmd.pid has not been completedly
>>> created but osaftransportd still receives IN_CREATE, osaftransported
>>> will fail to read pid of dtmd. That results in a node reboot with
>>> a reason as "osafdtmd failed to start".
>>>
>>> The patch implements an approach suggested by Anders Widell, which
>>> creates a completed temporary pid file first, then renames it to
>>> correct pid file name. Whenever osaftransportd is notified to read
>>> dtmd's pid, the data in pid file should be always safe to read. In
>>> addition to this, FileNotify needs to introduce IN_MOVED_TO event.
>>> ---
>>> src/base/daemon.c | 27 ++++++++++++++++++---------
>>> src/base/file_notify.cc | 10 +++++++++-
>>> 2 files changed, 27 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/src/base/daemon.c b/src/base/daemon.c
>>> index 77a869561..2fd161ce0 100644
>>> --- a/src/base/daemon.c
>>> +++ b/src/base/daemon.c
>>> @@ -88,40 +88,49 @@ static int __create_pidfile(const char *pidfile)
>>> {
>>> FILE *file = NULL;
>>> int fd, pid, rc = 0;
>>> + char pidfiletmp[256] = {0};
>> [HansN] use NAME_MAX instead of 256
>>> +
>>> + sprintf(pidfiletmp, "%s.tmp", pidfile);
>> [HansN] use snprintf and shouldn't pid be included in temp name?
>>> /* open the file and associate a stream with it */
>>> - if (((fd = open(pidfile, O_RDWR | O_CREAT, 0644)) == -1) ||
>>> + if (((fd = open(pidfiletmp, O_RDWR | O_CREAT, 0644)) == -1) ||
>>> ((file = fdopen(fd, "r+")) == NULL)) {
>>> - syslog(LOG_ERR, "open failed, pidfile=%s, errno=%s", pidfile,
>>> - strerror(errno));
>>> + syslog(LOG_ERR, "open failed, pidfiletmp=%s, errno=%s",
>>> + pidfiletmp, strerror(errno));
>> [HansN] close(fd) is missing
>>> return -1;
>>> }
>>> /* Lock the file */
>>> if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
>>> - syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s", pidfile,
>>> - strerror(errno));
>>> + syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>> + pidfiletmp, strerror(errno));
>>> fclose(file);
>>> return -1;
>>> }
>>> pid = getpid();
>>> if (!fprintf(file, "%d\n", pid)) {
>>> - syslog(LOG_ERR, "fprintf failed, pidfile=%s, errno=%s",
>>> pidfile,
>>> - strerror(errno));
>>> + syslog(LOG_ERR, "fprintf failed, pidfiletmp=%s, errno=%s",
>>> + pidfiletmp, strerror(errno));
>>> fclose(file);
>>> return -1;
>>> }
>>> fflush(file);
>>> if (flock(fd, LOCK_UN) == -1) {
>>> - syslog(LOG_ERR, "flock failed, pidfile=%s, errno=%s", pidfile,
>>> - strerror(errno));
>>> + syslog(LOG_ERR, "flock failed, pidfiletmp=%s, errno=%s",
>>> + pidfiletmp, strerror(errno));
>>> fclose(file);
>>> return -1;
>>> }
>>> fclose(file);
>>> + if (rename(pidfiletmp, pidfile) != 0) {
>>> + syslog(LOG_ERR, "rename failed, old=%s new=%s, error:%s",
>>> + pidfiletmp, pidfile, strerror(errno));
>>> + return -1;
>>> + }
>>> +
>>> return rc;
>>> }
>>> diff --git a/src/base/file_notify.cc b/src/base/file_notify.cc
>>> index e96be8b4a..80d1b7d54 100644
>>> --- a/src/base/file_notify.cc
>>> +++ b/src/base/file_notify.cc
>>> @@ -50,7 +50,7 @@ FileNotify::FileNotifyErrors
>>> FileNotify::WaitForFileCreation(
>>> SplitFileName(file_name);
>>> if ((inotify_wd_ = inotify_add_watch(inotify_fd_,
>>> file_path_.c_str(),
>>> - IN_CREATE)) == -1) {
>>> + IN_CREATE | IN_MOVED_TO)) ==
>>> -1) {
>>> LOG_NO("inotify_add_watch failed: %s", strerror(errno));
>>> return FileNotifyErrors::kError;
>>> }
>>> @@ -144,6 +144,14 @@ FileNotify::FileNotifyErrors
>>> FileNotify::ProcessEvents(
>>> return FileNotifyErrors::kOK;
>>> }
>>> }
>>> + if (event->mask & IN_MOVED_TO) {
>>> + if (file_name_ == event->name) {
>>> + TRACE("file name: %s moved to %s", file_name_.c_str(),
>>> + file_path_.c_str());
>>> + delete[] fds;
>>> + return FileNotifyErrors::kOK;
>>> + }
>>> + }
>>> if (event->mask & IN_IGNORED) {
>>> TRACE("IN_IGNORE received, (ignored)");
>>> }
>>
>
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel