onitake opened a new issue #3180: savepassword.sh crashes router when starting 
many VMs at once
URL: https://github.com/apache/cloudstack/issues/3180
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   VR
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master 
branch.
   -->
   
   ~~~
   All versions since 488625b1937eeb38f9a29706b6e7333043ae3e6e
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   N/A
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   N/A
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   We discovered a serious bug in one of the System VM scripts: 
/systemvm/debian/opt/cloud/bin/savepassword.sh
   
   The way I understand this script, it is supposed to clear out stored 
passwords for the password reset API by sending "ack" requests 
(="saved_password") to the password reset server running locally.
   
   But instead of just sending a single request per VM, it tries to execute 
curl for *all* IP addresses assigned to the VR, and on top of that, this 
happens *every* time a VM is launched.
   
   If only a few VMs are launched concurrently, or only a handful of NAT IPs 
are configured on the router, the number of requests stays small. Otherwise, 
many curl processes end up waiting for a reply. These may even lock up the 
router due to process/memory contention (or some sort of deadlock). It's not 
entirely clear to me why curl would linger instead of failing fast.
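   
   To give a feel for the fan-out described above, here is a rough 
back-of-the-envelope sketch. The helper name `estimate_curl_calls` and the 
numbers are purely illustrative; it assumes one curl invocation per router IP 
for every VM start, which is how this report reads the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of CloudStack): estimates how many curl
# processes are spawned if every VM start triggers one request per IP
# address assigned to the virtual router.
estimate_curl_calls() {
    vm_starts=$1    # VMs started at roughly the same time
    router_ips=$2   # IP addresses assigned to the VR
    echo $((vm_starts * router_ips))
}

estimate_curl_calls 100 100   # prints 10000
estimate_curl_calls 100 1     # prints 100
```

   With a single target address the count grows linearly with the number of 
VMs instead of with the product of VMs and IPs.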
   
   It seems this was introduced in CLOUDSTACK-8331. I don't understand why 
the request can't simply be sent to http://127.0.0.1:8080 *once*.
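   
   A minimal sketch of what a single, time-limited request could look like. 
This is not the current content of savepassword.sh: the function name, the 
variable names, the timeout values and the `CURL` override are assumptions made 
for illustration, and it presumes the password server is reachable on the 
loopback address, which has not been verified here. `--connect-timeout` and 
`--max-time` are standard curl options.

```shell
#!/bin/sh
# Sketch only: send ONE request to the local password server, with hard
# time limits so that a stuck curl process cannot pile up.
# Assumed names: PASSWD_SERVER_URL, CURL, notify_password_server.
PASSWD_SERVER_URL="${PASSWD_SERVER_URL:-http://127.0.0.1:8080/}"
CURL="${CURL:-curl}"

notify_password_server() {
    # "$@" carries whatever headers and form fields the real script needs.
    # --connect-timeout bounds the TCP connect, --max-time the whole call.
    $CURL --silent --connect-timeout 2 --max-time 5 "$@" "$PASSWD_SERVER_URL"
}

# Dry run in a subshell: substitute echo for curl to inspect the command
# line without touching the network. The form field is a made-up example.
(CURL=echo; notify_password_server -F "ip=10.1.1.10")
```

   Even if the per-IP loop has to stay, the two timeout options alone would 
keep each curl process from lingering indefinitely.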
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   1. Set up an isolated network with NAT
   2. Allocate many NAT addresses on this network (e.g. more than 100)
   3. Install many VMs and attach them to this network (e.g. more than 100)
   4. Configure static NAT for each VM
   5. Stop and start all VMs at the same time
   <!-- Paste example playbooks or commands between quotes below -->
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   The VMs start and are available on the network.
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   The virtual router locks up/crashes/reboots.
   In older CloudStack VR versions, this even led to a complete network lockup 
(due to the router starting without configuration). In the Debian 9 version, it 
will probably just reboot, causing a temporary outage and possibly other 
problems.
   <!-- Paste verbatim command output between quotes below -->
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
