Woundn't it be enough to wait for IN_CLOSE_WRITE only? Maybe we can 
introduce a new method in FileNotify, instead of extending 
WaitForFileCreation()?

/ Anders Widell


On 04/21/2017 03:49 PM, minh chau wrote:
> Hi Anders,
>
> It is because WaitForFileCreation() is also called in nid, where nid 
> does not really need to know IN_CLOSE.
> To introduce IN_CLOSE in file_notify.cc, then we have to change nid to 
> only wait for IN_CREATE, and transportd waits for IN_CREATE | IN_CLOSE.
> If this sounds ok to you, I can make a V2 for the above change?
>
> thanks,
> Minh
>
> On 21/04/17 23:27, Anders Widell wrote:
>> Hi!
>>
>> Wouldn't this problem, as you already hint at, be solved if we simply 
>> change the inotify condition to wait for IN_CLOSE_WRITE? That sound 
>> like a better solution than introducing a retry loop here...
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 04/20/2017 08:14 AM, Minh Chau wrote:
>>> At startup, osaftransportd waits for osafdtmd.pid file creation
>>> and then reads dtm pid. If osafdtmd.pid has not been completedly
>>> created but osaftransportd still receives IN_CREATE, osaftransported
>>> will fail to read pid of dtmd. That results in a node reboot with
>>> a reason as "osafdtmd failed to start".
>>>
>>> To ensure that osaftransportd reads a valid pid, osaftransportd
>>> should also watch IN_CLOSE which is the event that dtmd closes
>>> osafdtmd.pid. That change will also affect to how nid is using
>>> file_notify; therefore, this patch adds a retry on reading pid
>>> of dtmd if the estabished stream to read osafdtmd.pid is not ready.
>>> ---
>>>   src/dtm/transport/transport_monitor.cc | 20 +++++++++++++++++---
>>>   1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/dtm/transport/transport_monitor.cc 
>>> b/src/dtm/transport/transport_monitor.cc
>>> index 26c0b44..635ff00 100644
>>> --- a/src/dtm/transport/transport_monitor.cc
>>> +++ b/src/dtm/transport/transport_monitor.cc
>>> @@ -45,10 +45,24 @@ pid_t TransportMonitor::WaitForDaemon(const 
>>> std::string& daemon_name,
>>>         pidfile, user_fds, seconds_to_wait * kMillisPerSec);
>>>       if (rc == base::FileNotify::FileNotifyErrors::kOK) {
>>> -    if (!(std::ifstream{pidfile} >> pid).fail()) {
>>> -      if (IsDir(proc_path_ + std::to_string(pid))) {
>>> -        return pid;
>>> +    std::ifstream::iostate istate = std::ifstream::goodbit;
>>> +    int retry_cnt = 0;
>>> +    int retry_max = 5;
>>> +    do {
>>> +      if (retry_cnt > 0) {
>>> +        osaf_nanosleep(&kHundredMilliseconds);
>>>         }
>>> +      std::ifstream is{pidfile};
>>> +      if (!is.fail()) istate = (is >> pid).rdstate();
>>> +    } while ((retry_cnt++ < retry_max) && (pid == pid_t{-1}));
>>> +
>>> +    if (pid == pid_t{-1}) {
>>> +      LOG_WA("Failed to read pid from file %s, rdstate:0x%x",
>>> +              pidfile.c_str(), static_cast<int>(istate));
>>> +      return pid_t{-1};
>>> +    }
>>> +    if (IsDir(proc_path_ + std::to_string(pid))) {
>>> +        return pid;
>>>       }
>>>     }
>>>     return pid_t{-1};
>>
>>
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to