> On April 11, 2016, 8:10 p.m., Alejandro Fernandez wrote:
> > ambari-common/src/main/python/resource_management/libraries/functions/fcntl_based_process_lock.py,
> >  line 29
> > <https://reviews.apache.org/r/45872/diff/3/?file=1339352#file1339352line29>
> >
> >     Can we make this more robust by catching the exception and logging it.
> 
> Alejandro Fernandez wrote:
>     I believe the code should be resilient in all cases for all OS'es. Fail 
> gracefully and allow the operations to continue, even if they are in 
> parallel, so they can be retried.

Found that dpkg relates on fcntl for locking on /var/lib/dpkg/lock. So 
implicitly Ambari relates on fcntl support.

There is a ticket about dpkg and NFS: 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=149491 and it's closed as 
won't fix ("these reports are unreasonable wishes")
It's and old report, to check if this is still an issue I tried to reproduce an 
fcntl fail on locking /var/lib/dpkg/lock. I tried this with NFSv3 and stopped 
nfslock (statd and rpc.lockd) but fcntl calls do not fail.

So I downloaded dpkg source on Ubuntu 14.04 (apt-get source dpkg), changed code 
related to the fcntl call to imitate that it failed:
dpkg-1.17.5ubuntu5.5/lib/dpkg/file.c:146 (void file_lock)
```
        if (fcntl(*lockfd, lock_cmd, &fl) == -1) {
                if (errno == EACCES || errno == EAGAIN)
                        ohshit(_("%s is locked by another process"), desc);
                else
                        ohshite(_("unable to lock %s"), desc);
        }

changed this to 

        errno = ENOLCK;
        ohshite(_("!!! unable to lock %s"), desc);
```
compiled it using ```apt-get source --compile  dpkg``` and reinstalled dpkg.
Now "dpkg --install" fails with 
```
root@u1402:~# ./usr/bin/dpkg --install
dpkg: error: !!! unable to lock dpkg status database: No locks available
```

Don't think it's a good practice to ignore fcntl failures because it indicates 
that the lock is broken. But for parrallel execution with retries it may be 
acceptable.


- Andrii


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45872/#review128233
-----------------------------------------------------------


On April 11, 2016, 2:33 p.m., Andrew Onischuk wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45872/
> -----------------------------------------------------------
> 
> (Updated April 11, 2016, 2:33 p.m.)
> 
> 
> Review request for Ambari, Sumit Mohanty and Vitalyi Brodetskyi.
> 
> 
> Bugs: AMBARI-15762
>     https://issues.apache.org/jira/browse/AMBARI-15762
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> **Problem**  
> Ambari executes component installs in parallel on a host. Each install process
> in the post-install phase executes certain shared/common configuration
> directory related steps (stacks/HDP/2.0.6/hooks/after-
> INSTALL/scripts/shared_initialization.py:link_configs() -> ... -> /resource_ma
> nagement/libraries/functions/conf_select.py:convert_conf_directories_to_symlin
> ks() ) on the directories listed in conf_select.py:PACKAGE_DIRS
> 
> The common configuration directory related processing fail in they are kicked
> of in the same time for the same directory.
> 
> In the case the following code snippet is executed in the same time by
> multiple component installs  
> each will see the `backup_dir` as non existent and will try to execute the
> "cp" command. One will succeed while the others fail.
> 
>     
>     
>     
>     Execute(("cp", "-R", "-p", old_conf, backup_dir),  not_if = format("test 
> -e {backup_dir}"), sudo = True)
>     
> 
> Similar behaviour is seen for the following code snippet:
> 
>     
>     
>     
>     Execute(as_sudo(["cp", "-R", "-p", os.path.join(old_conf, "*"), 
> versioned_conf], auto_escape=False), only_if = format("ls -d {old_conf}/*"))
>     
> 
> **Possible solution**  
> Use the linux "lockfile" command or "mkdir /var/lock/mylock" (see example of
> the mkdir solution here: <http://wiki.bash-hackers.org/howto/mutex>)
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/main/python/ambari_agent/CustomServiceOrchestrator.py 
> 1bc045c 
>   ambari-agent/src/test/python/ambari_agent/TestCustomServiceOrchestrator.py 
> 884612b 
>   
> ambari-agent/src/test/python/resource_management/TestFcntlBasedProcessLock.py 
> PRE-CREATION 
>   
> ambari-common/src/main/python/resource_management/libraries/functions/fcntl_based_process_lock.py
>  PRE-CREATION 
>   
> ambari-server/src/main/resources/stacks/HDP/2.0.6/hooks/after-INSTALL/scripts/params.py
>  9f4971d 
>   
> ambari-server/src/main/resources/stacks/HDP/2.0.6/hooks/after-INSTALL/scripts/shared_initialization.py
>  c772ddb 
> 
> Diff: https://reviews.apache.org/r/45872/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Andrew Onischuk
> 
>

Reply via email to