Another approach could be to make sure that the pid file is created
atomically, e.g. by writing to a temporary file, closing it, and then
renaming it. Then we need to use inotify to wait for file rename operations.
regards,
Anders Widell
On 04/21/2017 04:12 PM, minh chau wrote:
> Hi Anders,
>
> A new method like WaitForFileCloseWrite() with IN_CLOSE_WRITE can
> leave nid unchanged, I can try this out.
>
> Thanks,
> Minh
>
> On 22/04/17 00:03, Anders Widell wrote:
>> Woundn't it be enough to wait for IN_CLOSE_WRITE only? Maybe we can
>> introduce a new method in FileNotify, instead of extending
>> WaitForFileCreation()?
>>
>> / Anders Widell
>>
>>
>> On 04/21/2017 03:49 PM, minh chau wrote:
>>> Hi Anders,
>>>
>>> It is because WaitForFileCreation() is also called in nid, where nid
>>> does not really need to know IN_CLOSE.
>>> To introduce IN_CLOSE in file_notify.cc, then we have to change nid
>>> to only wait for IN_CREATE, and transportd waits for IN_CREATE |
>>> IN_CLOSE.
>>> If this sounds ok to you, I can make a V2 for the above change?
>>>
>>> thanks,
>>> Minh
>>>
>>> On 21/04/17 23:27, Anders Widell wrote:
>>>> Hi!
>>>>
>>>> Wouldn't this problem, as you already hint at, be solved if we
>>>> simply change the inotify condition to wait for IN_CLOSE_WRITE?
>>>> That sound like a better solution than introducing a retry loop
>>>> here...
>>>>
>>>> regards,
>>>>
>>>> Anders Widell
>>>>
>>>>
>>>> On 04/20/2017 08:14 AM, Minh Chau wrote:
>>>>> At startup, osaftransportd waits for osafdtmd.pid file creation
>>>>> and then reads dtm pid. If osafdtmd.pid has not been completedly
>>>>> created but osaftransportd still receives IN_CREATE, osaftransported
>>>>> will fail to read pid of dtmd. That results in a node reboot with
>>>>> a reason as "osafdtmd failed to start".
>>>>>
>>>>> To ensure that osaftransportd reads a valid pid, osaftransportd
>>>>> should also watch IN_CLOSE which is the event that dtmd closes
>>>>> osafdtmd.pid. That change will also affect to how nid is using
>>>>> file_notify; therefore, this patch adds a retry on reading pid
>>>>> of dtmd if the estabished stream to read osafdtmd.pid is not ready.
>>>>> ---
>>>>> src/dtm/transport/transport_monitor.cc | 20 +++++++++++++++++---
>>>>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/src/dtm/transport/transport_monitor.cc
>>>>> b/src/dtm/transport/transport_monitor.cc
>>>>> index 26c0b44..635ff00 100644
>>>>> --- a/src/dtm/transport/transport_monitor.cc
>>>>> +++ b/src/dtm/transport/transport_monitor.cc
>>>>> @@ -45,10 +45,24 @@ pid_t TransportMonitor::WaitForDaemon(const
>>>>> std::string& daemon_name,
>>>>> pidfile, user_fds, seconds_to_wait * kMillisPerSec);
>>>>> if (rc == base::FileNotify::FileNotifyErrors::kOK) {
>>>>> - if (!(std::ifstream{pidfile} >> pid).fail()) {
>>>>> - if (IsDir(proc_path_ + std::to_string(pid))) {
>>>>> - return pid;
>>>>> + std::ifstream::iostate istate = std::ifstream::goodbit;
>>>>> + int retry_cnt = 0;
>>>>> + int retry_max = 5;
>>>>> + do {
>>>>> + if (retry_cnt > 0) {
>>>>> + osaf_nanosleep(&kHundredMilliseconds);
>>>>> }
>>>>> + std::ifstream is{pidfile};
>>>>> + if (!is.fail()) istate = (is >> pid).rdstate();
>>>>> + } while ((retry_cnt++ < retry_max) && (pid == pid_t{-1}));
>>>>> +
>>>>> + if (pid == pid_t{-1}) {
>>>>> + LOG_WA("Failed to read pid from file %s, rdstate:0x%x",
>>>>> + pidfile.c_str(), static_cast<int>(istate));
>>>>> + return pid_t{-1};
>>>>> + }
>>>>> + if (IsDir(proc_path_ + std::to_string(pid))) {
>>>>> + return pid;
>>>>> }
>>>>> }
>>>>> return pid_t{-1};
>>>>
>>>>
>>>
>>
>>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel