[jira] [Commented] (CLOUDSTACK-9583) VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666210#comment-15666210
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9583:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/1757
  
Trillian test result (tid-337)
Environment: xenserver-65sp1 (x2), Advanced Networking with Mgmt server 6
Total time taken: 30772 seconds
Marvin logs: 
https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1757-t337-xenserver-65sp1.zip
Test completed. 39 look ok, 4 have error(s)


Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_05_rvpc_multi_tiers | `Failure` | 420.58 | test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics | `Failure` | 1372.61 | test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | `Failure` | 479.80 | test_vpc_redundant.py
test_04_rvpc_privategw_static_routes | `Failure` | 619.78 | test_privategw_acl.py
ContextSuite context=TestRVPCSite2SiteVpn>:setup | `Error` | 0.00 | test_vpc_vpn.py
test_06_download_detached_volume | `Error` | 30.43 | test_volumes.py
test_01_vpc_site2site_vpn | Success | 326.97 | test_vpc_vpn.py
test_01_vpc_remote_access_vpn | Success | 121.74 | test_vpc_vpn.py
test_02_VPC_default_routes | Success | 271.09 | test_vpc_router_nics.py
test_01_VPC_nics_after_destroy | Success | 638.24 | test_vpc_router_nics.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Success | 777.50 | test_vpc_redundant.py
test_02_redundant_VPC_default_routes | Success | 984.30 | test_vpc_redundant.py
test_09_delete_detached_volume | Success | 15.75 | test_volumes.py
test_08_resize_volume | Success | 85.99 | test_volumes.py
test_07_resize_fail | Success | 91.06 | test_volumes.py
test_05_detach_volume | Success | 100.28 | test_volumes.py
test_04_delete_attached_volume | Success | 10.27 | test_volumes.py
test_03_download_attached_volume | Success | 15.38 | test_volumes.py
test_02_attach_volume | Success | 10.71 | test_volumes.py
test_01_create_volume | Success | 427.98 | test_volumes.py
test_03_delete_vm_snapshots | Success | 280.78 | test_vm_snapshots.py
test_02_revert_vm_snapshots | Success | 186.62 | test_vm_snapshots.py
test_01_create_vm_snapshots | Success | 133.90 | test_vm_snapshots.py
test_deploy_vm_multiple | Success | 294.33 | test_vm_life_cycle.py
test_deploy_vm | Success | 0.03 | test_vm_life_cycle.py
test_advZoneVirtualRouter | Success | 0.02 | test_vm_life_cycle.py
test_10_attachAndDetach_iso | Success | 31.87 | test_vm_life_cycle.py
test_09_expunge_vm | Success | 125.17 | test_vm_life_cycle.py
test_08_migrate_vm | Success | 66.24 | test_vm_life_cycle.py
test_07_restore_vm | Success | 0.14 | test_vm_life_cycle.py
test_06_destroy_vm | Success | 10.18 | test_vm_life_cycle.py
test_03_reboot_vm | Success | 10.20 | test_vm_life_cycle.py
test_02_start_vm | Success | 15.22 | test_vm_life_cycle.py
test_01_stop_vm | Success | 30.30 | test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName | Success | 126.15 | test_templates.py
test_08_list_system_templates | Success | 0.03 | test_templates.py
test_07_list_public_templates | Success | 0.04 | test_templates.py
test_05_template_permissions | Success | 0.06 | test_templates.py
test_04_extract_template | Success | 5.17 | test_templates.py
test_03_delete_template | Success | 5.12 | test_templates.py
test_02_edit_template | Success | 90.13 | test_templates.py
test_01_create_template | Success | 80.84 | test_templates.py
test_10_destroy_cpvm | Success | 221.74 | test_ssvm.py
test_09_destroy_ssvm | Success | 204.12 | test_ssvm.py
test_08_reboot_cpvm | Success | 141.60 | test_ssvm.py
test_07_reboot_ssvm | Success | 153.97 | test_ssvm.py
test_06_stop_cpvm | Success | 136.74 | test_ssvm.py
test_05_stop_ssvm | Success | 174.01 | test_ssvm.py
test_04_cpvm_internals | Success | 1.10 | test_ssvm.py
test_03_ssvm_internals | Success | 3.54 | test_ssvm.py
test_02_list_cpvm_vm | Success | 0.12 | test_ssvm.py
test_01_list_sec_storage_vm | Success | 0.13 | test_ssvm.py
test_01_snapshot_root_disk | Success | 16.69 | test_snapshots.py
test_04_change_offering_small | Success | 58.90 | test_service_offerings.py
test_03_delete_service_offering | Success | 0.05 | test_service_offerings.py
test_02_edit_service_offering | Success | 0.12 | test_service_offerings.py
test_01_create_service_offering | Success | 0.10 | test_service_offerings.py
test_02_sys_template_ready | Success | 0.13 | test_secondary_storage.py
test_01_sys_vm_start | Success | 0.20 | test_secondary_storage.py
test_01_scale_vm | Success | 5.24 | test_scale_vm.py
test_09_reboot_router | 

[jira] [Commented] (CLOUDSTACK-9583) VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666156#comment-15666156
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9583:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/1757
  
Trillian test result (tid-339)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 29505 seconds
Marvin logs: 
https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1757-t339-vmware-55u3.zip
Test completed. 42 look ok, 1 have error(s)


Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_01_vpc_site2site_vpn | `Error` | 426.66 | test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn | `Error` | 669.50 | test_vpc_vpn.py
test_01_vpc_remote_access_vpn | Success | 142.18 | test_vpc_vpn.py
test_02_VPC_default_routes | Success | 299.75 | test_vpc_router_nics.py
test_01_VPC_nics_after_destroy | Success | 661.29 | test_vpc_router_nics.py
test_05_rvpc_multi_tiers | Success | 494.72 | test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics | Success | 1513.28 | test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Success | 571.88 | test_vpc_redundant.py
test_02_redundant_VPC_default_routes | Success | 511.48 | test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Success | 1135.81 | test_vpc_redundant.py
test_09_delete_detached_volume | Success | 31.73 | test_volumes.py
test_06_download_detached_volume | Success | 70.64 | test_volumes.py
test_05_detach_volume | Success | 110.39 | test_volumes.py
test_04_delete_attached_volume | Success | 15.24 | test_volumes.py
test_03_download_attached_volume | Success | 20.38 | test_volumes.py
test_02_attach_volume | Success | 58.72 | test_volumes.py
test_01_create_volume | Success | 455.56 | test_volumes.py
test_03_delete_vm_snapshots | Success | 275.26 | test_vm_snapshots.py
test_02_revert_vm_snapshots | Success | 194.06 | test_vm_snapshots.py
test_01_test_vm_volume_snapshot | Success | 156.69 | test_vm_snapshots.py
test_01_create_vm_snapshots | Success | 129.77 | test_vm_snapshots.py
test_deploy_vm_multiple | Success | 238.87 | test_vm_life_cycle.py
test_deploy_vm | Success | 0.03 | test_vm_life_cycle.py
test_advZoneVirtualRouter | Success | 0.02 | test_vm_life_cycle.py
test_10_attachAndDetach_iso | Success | 26.90 | test_vm_life_cycle.py
test_09_expunge_vm | Success | 125.28 | test_vm_life_cycle.py
test_08_migrate_vm | Success | 66.47 | test_vm_life_cycle.py
test_07_restore_vm | Success | 0.09 | test_vm_life_cycle.py
test_06_destroy_vm | Success | 10.20 | test_vm_life_cycle.py
test_03_reboot_vm | Success | 5.23 | test_vm_life_cycle.py
test_02_start_vm | Success | 20.38 | test_vm_life_cycle.py
test_01_stop_vm | Success | 10.30 | test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName | Success | 257.55 | test_templates.py
test_08_list_system_templates | Success | 0.03 | test_templates.py
test_07_list_public_templates | Success | 0.04 | test_templates.py
test_05_template_permissions | Success | 0.07 | test_templates.py
test_04_extract_template | Success | 15.24 | test_templates.py
test_03_delete_template | Success | 5.10 | test_templates.py
test_02_edit_template | Success | 90.14 | test_templates.py
test_01_create_template | Success | 111.24 | test_templates.py
test_10_destroy_cpvm | Success | 266.90 | test_ssvm.py
test_09_destroy_ssvm | Success | 238.71 | test_ssvm.py
test_08_reboot_cpvm | Success | 126.62 | test_ssvm.py
test_07_reboot_ssvm | Success | 128.49 | test_ssvm.py
test_06_stop_cpvm | Success | 207.77 | test_ssvm.py
test_05_stop_ssvm | Success | 174.68 | test_ssvm.py
test_04_cpvm_internals | Success | 1.22 | test_ssvm.py
test_03_ssvm_internals | Success | 4.42 | test_ssvm.py
test_02_list_cpvm_vm | Success | 0.12 | test_ssvm.py
test_01_list_sec_storage_vm | Success | 0.18 | test_ssvm.py
test_01_snapshot_root_disk | Success | 26.71 | test_snapshots.py
test_04_change_offering_small | Success | 92.16 | test_service_offerings.py
test_03_delete_service_offering | Success | 0.03 | test_service_offerings.py
test_02_edit_service_offering | Success | 0.08 | test_service_offerings.py
test_01_create_service_offering | Success | 0.10 | test_service_offerings.py
test_02_sys_template_ready | Success | 0.11 | test_secondary_storage.py
test_01_sys_vm_start | Success | 0.16 | test_secondary_storage.py
test_09_reboot_router | Success | 105.96 | test_routers.py
test_08_start_router | Success | 100.77 | test_routers.py
test_07_stop_router | Success | 20.20 | test_routers.py
test_06_router_advanced | Success | 0.04 | 

[jira] [Commented] (CLOUDSTACK-9588) Add Load Balancer functionality in Network page is redundant.

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666113#comment-15666113
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9588:


Github user nitin-maharana commented on the issue:

https://github.com/apache/cloudstack/pull/1758
  
The Add Load Balancer tab was removed.


![image](https://cloud.githubusercontent.com/assets/12583725/20293745/f1a66b9a-ab1e-11e6-9707-40af38637447.png)

The same functionality is provided by the Load Balancing tab.

![image](https://cloud.githubusercontent.com/assets/12583725/20293902/26cd3f1e-ab20-11e6-9b59-05ac6ec8194b.png)



> Add Load Balancer functionality in Network page is redundant.
> -
>
> Key: CLOUDSTACK-9588
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9588
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: Nitin Kumar Maharana
>
> Steps to Reproduce:
> Network -> Select any network -> Observe Add Load Balancer tab
> The "Add Load Balancer" functionality is redundant.
> The above is used to create an LB rule without any public IP.
> Resolution:
> Similar functionality exists in Network -> Any Network -> Details Tab -> 
> View IP Addresses -> Any public IP -> Configuration Tab -> Observe Load 
> Balancing.
> The above is used to create an LB rule with a public IP. This is a more 
> convenient way of creating an LB rule as the IP is involved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CLOUDSTACK-9583) VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665988#comment-15665988
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9583:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/1757
  
Trillian test result (tid-338)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 24043 seconds
Marvin logs: 
https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1757-t338-kvm-centos7.zip
Test completed. 42 look ok, 1 have error(s)


Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | `Failure` | 396.62 | test_vpc_redundant.py
test_01_vpc_site2site_vpn | Success | 150.37 | test_vpc_vpn.py
test_01_vpc_remote_access_vpn | Success | 56.14 | test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn | Success | 257.22 | test_vpc_vpn.py
test_02_VPC_default_routes | Success | 291.44 | test_vpc_router_nics.py
test_01_VPC_nics_after_destroy | Success | 502.74 | test_vpc_router_nics.py
test_05_rvpc_multi_tiers | Success | 530.07 | test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics | Success | 1443.95 | test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Success | 576.62 | test_vpc_redundant.py
test_02_redundant_VPC_default_routes | Success | 758.56 | test_vpc_redundant.py
test_09_delete_detached_volume | Success | 15.46 | test_volumes.py
test_08_resize_volume | Success | 15.38 | test_volumes.py
test_07_resize_fail | Success | 20.55 | test_volumes.py
test_06_download_detached_volume | Success | 15.30 | test_volumes.py
test_05_detach_volume | Success | 100.36 | test_volumes.py
test_04_delete_attached_volume | Success | 10.21 | test_volumes.py
test_03_download_attached_volume | Success | 15.45 | test_volumes.py
test_02_attach_volume | Success | 43.71 | test_volumes.py
test_01_create_volume | Success | 712.26 | test_volumes.py
test_deploy_vm_multiple | Success | 254.22 | test_vm_life_cycle.py
test_deploy_vm | Success | 0.03 | test_vm_life_cycle.py
test_advZoneVirtualRouter | Success | 0.02 | test_vm_life_cycle.py
test_10_attachAndDetach_iso | Success | 26.66 | test_vm_life_cycle.py
test_09_expunge_vm | Success | 125.29 | test_vm_life_cycle.py
test_08_migrate_vm | Success | 41.43 | test_vm_life_cycle.py
test_07_restore_vm | Success | 0.13 | test_vm_life_cycle.py
test_06_destroy_vm | Success | 126.01 | test_vm_life_cycle.py
test_03_reboot_vm | Success | 126.01 | test_vm_life_cycle.py
test_02_start_vm | Success | 10.23 | test_vm_life_cycle.py
test_01_stop_vm | Success | 40.42 | test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName | Success | 65.72 | test_templates.py
test_08_list_system_templates | Success | 0.03 | test_templates.py
test_07_list_public_templates | Success | 0.03 | test_templates.py
test_05_template_permissions | Success | 0.05 | test_templates.py
test_04_extract_template | Success | 5.28 | test_templates.py
test_03_delete_template | Success | 5.51 | test_templates.py
test_02_edit_template | Success | 90.16 | test_templates.py
test_01_create_template | Success | 65.67 | test_templates.py
test_10_destroy_cpvm | Success | 131.43 | test_ssvm.py
test_09_destroy_ssvm | Success | 163.33 | test_ssvm.py
test_08_reboot_cpvm | Success | 101.52 | test_ssvm.py
test_07_reboot_ssvm | Success | 103.22 | test_ssvm.py
test_06_stop_cpvm | Success | 101.46 | test_ssvm.py
test_05_stop_ssvm | Success | 133.18 | test_ssvm.py
test_04_cpvm_internals | Success | 1.02 | test_ssvm.py
test_03_ssvm_internals | Success | 3.99 | test_ssvm.py
test_02_list_cpvm_vm | Success | 0.11 | test_ssvm.py
test_01_list_sec_storage_vm | Success | 0.12 | test_ssvm.py
test_01_snapshot_root_disk | Success | 12.48 | test_snapshots.py
test_04_change_offering_small | Success | 209.76 | test_service_offerings.py
test_03_delete_service_offering | Success | 0.03 | test_service_offerings.py
test_02_edit_service_offering | Success | 0.05 | test_service_offerings.py
test_01_create_service_offering | Success | 0.10 | test_service_offerings.py
test_02_sys_template_ready | Success | 0.16 | test_secondary_storage.py
test_01_sys_vm_start | Success | 0.24 | test_secondary_storage.py
test_09_reboot_router | Success | 35.33 | test_routers.py
test_08_start_router | Success | 30.32 | test_routers.py
test_07_stop_router | Success | 10.28 | test_routers.py
test_06_router_advanced | Success | 0.06 | test_routers.py
test_05_router_basic | Success | 0.05 | test_routers.py
test_04_restart_network_wo_cleanup | Success | 5.65 | test_routers.py
test_03_restart_network_cleanup 

[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665577#comment-15665577
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@serg38 with custom plugins, there is no way to reliably perform such 
tracing.  I can think of batch cleanup operations in the storage layer that 
follow the pattern I described.  Even if there were, we would have planted a 
landmine for future changes to the system.  Deadlocks are significant technical 
debt that is clearly causing significant operational issues.  Unfortunately, 
there is no way to address them generically.


> Transactions are not getting retried in case of database deadlock errors
> 
>
> Key: CLOUDSTACK-9595
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.8.0
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Customer is seeing occasional error 'Deadlock found when trying to get lock; 
> try restarting transaction' messages in their management server logs.  It 
> happens regularly at least once a day.  The following is the error seen 
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] 
> (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception 
> executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
> com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM 
> instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>   at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>   at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>   at 
> com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>   at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>   at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>   at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>   at 
> com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>   at 
> com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>   at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>   at 
> com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>   at 
> com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)





[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665556#comment-15665556
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user serg38 commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@jburwell I concur, but if @yvsubhash verified that those methods don't 
participate in complex DML transactions, this might still be a good start. If so, 
this approach might be expanded later to multi-DML transactions so that each 
piece can be retried individually. I myself traced a few deadlocks in ACS using 
native MySQL deadlock logging, and it doesn't seem there would be a viable 
alternative to retries due to the well-known complexity of ACS DB operations.




[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665499#comment-15665499
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@serg38 there remains a risk when those methods are executed in the context 
of an open transaction where DMLs have already been executed and subsequent 
DMLs will be executed.  In this scenario, the first set of changes would be 
lost due to the rollback triggered by the query deadlock, while the second set 
would proceed successfully.
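
The scenario described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyTransaction`, `executeWithBlindRetry`, and the statement strings are hypothetical and are not CloudStack code, and the model simply assumes, as the comment says, that a deadlock rolls back everything the open transaction had already executed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no real database or CloudStack classes are involved.
class LostWriteDemo {

    // Hypothetical stand-in for an open transaction.
    static class ToyTransaction {
        final List<String> pending = new ArrayList<>();   // executed, not yet committed
        final List<String> committed = new ArrayList<>();
        boolean deadlockOnNextStatement;

        void execute(String dml) {
            if (deadlockOnNextStatement) {
                deadlockOnNextStatement = false;
                // Assumption: the deadlock victim's whole transaction is rolled back.
                pending.clear();
                throw new IllegalStateException("Deadlock found when trying to get lock");
            }
            pending.add(dml);
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Retries one statement once, unaware of the statements that preceded it.
    static void executeWithBlindRetry(ToyTransaction txn, String dml) {
        try {
            txn.execute(dml);
        } catch (IllegalStateException deadlock) {
            txn.execute(dml);
        }
    }

    // Runs the scenario and returns what was actually committed.
    static List<String> run() {
        ToyTransaction txn = new ToyTransaction();
        txn.execute("INSERT first change");                  // first set of DMLs
        txn.deadlockOnNextStatement = true;
        executeWithBlindRetry(txn, "DELETE second change");  // deadlocks, then is retried
        txn.execute("UPDATE third change");                  // subsequent DMLs
        txn.commit();
        return txn.committed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the commit succeeds, yet the list returned by `run()` no longer contains the first statement, which is the silent partial loss the comment warns about.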




[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665463#comment-15665463
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user serg38 commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@jburwell @yvsubhash I might be wrong, but this PR will retry on deadlock 
for only 2 DAO methods: searchIncludingRemoved and 
customSearchIncludingRemoved. No update methods are set up with this retry 
mechanism. If that's the case, there is no risk of corrupting the DB.
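
A retry limited to read-only searches, as described above, might look roughly like the following. This is a minimal sketch, not the code in the PR: `DeadlockRetry`, `SqlQuery`, and `retryReadOnly` are hypothetical names, and the deadlock check assumes the MySQL convention of error code 1213 with SQLSTATE 40001.

```java
import java.sql.SQLException;

// Hypothetical helper for illustration; not the PR's actual implementation.
class DeadlockRetry {

    // A read-only query that may fail with a SQLException.
    interface SqlQuery<T> {
        T run() throws SQLException;
    }

    // Assumption: MySQL reports a deadlock as error 1213 / SQLSTATE 40001.
    static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
    }

    // Runs the query, retrying only when the failure is a deadlock.
    // Any other failure, or a deadlock on the last attempt, is rethrown unchecked.
    static <T> T retryReadOnly(SqlQuery<T> query, int maxAttempts) {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.run();
            } catch (SQLException e) {
                if (!isDeadlock(e)) {
                    throw new IllegalStateException("DB exception", e);
                }
                last = e;
            }
        }
        throw new IllegalStateException(
                "Deadlock persisted after " + maxAttempts + " attempts", last);
    }
}
```

Because the wrapped operation only reads, re-running it cannot duplicate or drop writes of its own; whether the surrounding transaction is still intact after the deadlock is a separate question that this helper does not address.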




[jira] [Commented] (CLOUDSTACK-9560) Root volume of deleted VM left unremoved

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665362#comment-15665362
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9560:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1726#discussion_r87917132
  
--- Diff: server/src/com/cloud/storage/StorageManagerImpl.java ---
@@ -2199,15 +2199,20 @@ public void cleanupDownloadUrls(){
 if(downloadUrlCurrentAgeInSecs < _downloadUrlExpirationInterval){  // URL hasnt expired yet
 continue;
 }
-
-s_logger.debug("Removing download url " + volumeOnImageStore.getExtractUrl() + " for volume id " + volumeOnImageStore.getVolumeId());
+long volumeId = volumeOnImageStore.getVolumeId();
+s_logger.debug("Removing download url " + volumeOnImageStore.getExtractUrl() + " for volume id " + volumeId);
 
 // Remove it from image store
 ImageStoreEntity secStore = (ImageStoreEntity) _dataStoreMgr.getDataStore(volumeOnImageStore.getDataStoreId(), DataStoreRole.Image);
 secStore.deleteExtractUrl(volumeOnImageStore.getInstallPath(), volumeOnImageStore.getExtractUrl(), Upload.Type.VOLUME);
 
 // Now expunge it from DB since this entry was created only for download purpose
 _volumeStoreDao.expunge(volumeOnImageStore.getId());
+Volume volume = _volumeDao.findById(volumeId);
+if (volume != null && volume.getState() == Volume.State.Expunged)
+{
+_volumeDao.remove(volumeId);
+}
--- End diff --

@yvsubhash have you had a chance to review @ustcweizhou's feedback?


> Root volume of deleted VM left unremoved
> 
>
> Key: CLOUDSTACK-9560
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9560
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Volumes
>Affects Versions: 4.8.0
> Environment: XenServer
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> In the following scenario the root volume is left unremoved.
> Steps to reproduce the issue:
> 1. Create a VM.
> 2. Stop this VM.
> 3. On the page of the volume of the VM, click the 'Download Volume' icon.
> 4. Wait for the popup screen to display and cancel out, with or without 
> clicking the download link.
> 5. Destroy the VM.
> Even after the corresponding VM is deleted and expunged, the root volume is 
> left in the 'Expunging' state, unremoved.





[jira] [Commented] (CLOUDSTACK-9570) Bug in listSnapshots for snapshots with deleted data stores

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665336#comment-15665336
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9570:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1735#discussion_r87915067
  
--- Diff: server/src/com/cloud/api/ApiResponseHelper.java ---
@@ -526,16 +529,18 @@ public static DataStoreRole getDataStoreRole(Snapshot 
snapshot, SnapshotDataStor
 }
 
 long storagePoolId = snapshotStore.getDataStoreId();
-DataStore dataStore = dataStoreMgr.getDataStore(storagePoolId, 
DataStoreRole.Primary);
+if (snapshotStore.getState() != null && ! 
snapshotStore.getState().equals(ObjectInDataStoreStateMachine.State.Destroyed)) 
{
+DataStore dataStore = dataStoreMgr.getDataStore(storagePoolId, 
DataStoreRole.Primary);
 
-Map<String, String> mapCapabilities = 
dataStore.getDriver().getCapabilities();
+Map<String, String> mapCapabilities = 
dataStore.getDriver().getCapabilities();
 
-if (mapCapabilities != null) {
-String value = 
mapCapabilities.get(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString());
-Boolean supportsStorageSystemSnapshots = new Boolean(value);
+if (mapCapabilities != null) {
+String value = 
mapCapabilities.get(DataStoreCapabilities.STORAGE_SYSTEM_SNAPSHOT.toString());
+Boolean supportsStorageSystemSnapshots = new 
Boolean(value);
--- End diff --

`new Boolean` skips the constant pool -- putting unnecessary pressure on 
the heap and creating a potential memory leak.  Please use `Boolean.valueOf` to 
parse the value to avoid this issue.
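
A minimal sketch of the suggestion above, assuming the capability value arrives as a possibly-null `String`. `CapabilityParsing` and its method are hypothetical names used only for illustration; they are not CloudStack code. `Boolean.valueOf(String)` returns one of the shared `Boolean.TRUE` / `Boolean.FALSE` instances, whereas `new Boolean(String)` allocates a new object on every call.

```java
// Illustrative only: CapabilityParsing is a hypothetical class, not part of CloudStack.
final class CapabilityParsing {

    // Returns true only when value equals "true" (case-insensitive); null yields false.
    static boolean supportsStorageSystemSnapshots(String value) {
        // Boolean.valueOf(String) reuses the cached Boolean.TRUE / Boolean.FALSE instances.
        return Boolean.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(supportsStorageSystemSnapshots("true"));
        System.out.println(supportsStorageSystemSnapshots(null));
        // valueOf hands back the shared constant rather than a fresh object.
        System.out.println(Boolean.valueOf("TRUE") == Boolean.TRUE);
    }
}
```

If only a primitive is needed, `Boolean.parseBoolean(value)` gives the same result without boxing at all.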


> Bug in listSnapshots for snapshots with deleted data stores
> ---
>
> Key: CLOUDSTACK-9570
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9570
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: API
>Reporter: Nicolas Vazquez
>Assignee: Nicolas Vazquez
>
> h3. Actual behaviour
> If there is a snapshot on a data store that has been removed, {{listSnapshots}} 
> still tries to enumerate it and returns an error (in this example data store 2 
> has been removed):
> {code:xml|title=/client/api?command=listSnapshots=true=true|borderStyle=solid}
> 
>530
>4250
>Unable to locate datastore with id 2
> 
> {code}
> h3. Reproduce error
> These steps can be followed to reproduce the issue:
> * Take a snapshot of a volume (this creates references for primary storage 
> and secondary storage in the snapshot_store_ref table)
> * Simulate retiring the primary data storage where the snapshot is cached (in 
> this example X is a fake data store and Y is the snapshot id):
> {{UPDATE `cloud`.`snapshot_store_ref` SET `store_id`='X', `state`="Destroyed" 
> WHERE `id`='Y';}}
> * List snapshots
> {{/client/api?command=listSnapshots=true=true}}





[jira] [Commented] (CLOUDSTACK-9561) After domain/account deletion, snapshot taken by the domain/account remains undeleted

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665328#comment-15665328
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9561:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1737
  
@SudharmaJain this fix seems like it would be good for LTS users as well.  
Could you please change the base branch to 4.9?


> After domain/account deletion, snapshot taken by the domain/account remains 
> undeleted
> -
>
> Key: CLOUDSTACK-9561
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9561
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: sudharma jain
>
> While deleting a user account, cleanup of the removed VMs/volumes does not 
> happen. For the removed VMs, snapshots do not get cleaned up. Cleanup happens 
> only for volumes in the Ready state.





[jira] [Commented] (CLOUDSTACK-9572) Snapshot on primary storage not cleaned up after Storage migration

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665316#comment-15665316
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9572:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1740#discussion_r87913074
  
--- Diff: server/src/com/cloud/storage/snapshot/SnapshotManagerImpl.java ---
@@ -,6 +,20 @@ public boolean canOperateOnVolume(Volume volume) {
 }
 
 @Override
+public void cleanupSnapshotsByVolume(Long volumeId) {
+List<SnapshotVO> volSnapShots = 
_snapshotDao.listByVolumeId(volumeId);
+for(SnapshotVO snapshot: volSnapShots) {
+SnapshotInfo info = 
snapshotFactory.getSnapshot(snapshot.getId(), DataStoreRole.Primary);
+try {
+snapshotSrv.deleteSnapshot(info);
+} catch(CloudRuntimeException e) {
+String msg = "Cleanup of Snapshot with uuid " + 
snapshot.getUuid() + " in primary storage is failed. Ignoring";
--- End diff --

This local variable is only used once.  Please consider collapsing it into 
line 1122.  Also, please add the message from the exception to the log message 
to provide greater detail for debugging efforts.
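
A minimal sketch of what this request could look like, assuming a plain `java.util.logging` logger instead of CloudStack's own `s_logger`. `SnapshotCleanupLogging` and `cleanupFailure` are hypothetical names introduced only for illustration.

```java
import java.util.logging.Logger;

// Illustrative only: SnapshotCleanupLogging is a hypothetical stand-in, not CloudStack code.
final class SnapshotCleanupLogging {
    private static final Logger LOGGER = Logger.getLogger(SnapshotCleanupLogging.class.getName());

    // Builds the warning text, appending the exception's own message for debugging.
    static String cleanupFailure(String snapshotUuid, RuntimeException e) {
        return "Cleanup of snapshot with uuid " + snapshotUuid
                + " in primary storage failed; ignoring: " + e.getMessage();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("storage pool unreachable");
        } catch (RuntimeException e) {
            // The text goes straight into the logging call; no single-use local variable.
            LOGGER.warning(cleanupFailure("abc-123", e));
        }
    }
}
```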


> Snapshot on primary storage not cleaned up after Storage migration
> --
>
> Key: CLOUDSTACK-9572
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9572
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Storage Controller
>Affects Versions: 4.8.0
> Environment: Xen Server
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Issue Description
> ===
> 1. Create an instance on the local storage on any host.
> 2. Create a scheduled snapshot of the volume.
> 3. Wait until ACS has created the snapshot. ACS creates a snapshot on local 
> storage and transfers it to secondary storage, but the latest snapshot on 
> local storage stays there. This is as expected.
> 4. Migrate the instance to another XenServer host with the ACS UI and Storage 
> Live Migration.
> 5. The snapshot on the old host's local storage is not cleaned up and stays 
> there, so local storage fills up with unneeded snapshots.





[jira] [Commented] (CLOUDSTACK-9572) Snapshot on primary storage not cleaned up after Storage migration

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665315#comment-15665315
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9572:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1740#discussion_r87913578
  
--- Diff: server/src/com/cloud/storage/snapshot/SnapshotManagerImpl.java ---
@@ -,6 +,20 @@ public boolean canOperateOnVolume(Volume volume) {
 }
 
 @Override
+public void cleanupSnapshotsByVolume(Long volumeId) {
+List<SnapshotVO> volSnapShots = 
_snapshotDao.listByVolumeId(volumeId);
+for(SnapshotVO snapshot: volSnapShots) {
+SnapshotInfo info = 
snapshotFactory.getSnapshot(snapshot.getId(), DataStoreRole.Primary);
--- End diff --

This appears to be an application-side join.  Please consider creating a 
new query to retrieve all snapshot info instances associated with `volumeId` to 
reduce load on the database and simplify this method.




[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665209#comment-15665209
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@serg38 that is not a safe assumption.  Transactions often span multiple 
statements and methods across DAOs.  `TransactionLegacy` has a transaction 
stacking/nesting model that further obscures when a transaction actually 
completes.

Deadlocks are a severe problem that needs to be fixed.  Unfortunately, this 
patch would do more harm than good as it would eventually corrupt the database.  
In and of themselves, retries are also a very expensive solution to the 
problem, both in terms of the engineering effort required to do it properly and 
the extra stress placed on the database to perform additional work that will 
likely fail.  Furthermore, a generic **and** correct retry mechanism is a very 
difficult thing to write.  Given the way transaction boundaries are managed in 
ACS, I think such an effort would be nearly impossible.

In a properly written application, deadlocks should very rarely, if ever, 
occur.  Their presence is a symptom of improper transaction handling and/or 
poor lock management.  Therefore, my suggestion is that we change 
this patch to log details about the context in which deadlocks occur.  We can 
then use this information to identify the areas in ACS where these contention 
problems are located and fix the root cause.
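
A minimal sketch of the logging approach suggested above, assuming a MySQL backend, where a deadlock is reported with vendor error code 1213 and SQLSTATE `40001`. `DeadlockDiagnostics` and its methods are hypothetical names; nothing here is taken from `TransactionLegacy` or the patch under review.

```java
import java.sql.SQLException;

// Illustrative only: DeadlockDiagnostics is a hypothetical helper, not CloudStack code.
final class DeadlockDiagnostics {
    // Assumption: MySQL, which signals deadlocks with error 1213 / SQLSTATE 40001.
    static final int MYSQL_ER_LOCK_DEADLOCK = 1213;
    static final String SQLSTATE_SERIALIZATION_FAILURE = "40001";

    // Walks the chain of linked SQLExceptions looking for a deadlock marker.
    static boolean isDeadlock(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (cur.getErrorCode() == MYSQL_ER_LOCK_DEADLOCK
                    || SQLSTATE_SERIALIZATION_FAILURE.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }

    // Collects context for the log instead of retrying the statement.
    static String describe(SQLException e, String statement) {
        return "Deadlock detected: thread=" + Thread.currentThread().getName()
                + ", statement=" + statement
                + ", sqlState=" + e.getSQLState()
                + ", errorCode=" + e.getErrorCode();
    }

    public static void main(String[] args) {
        SQLException e = new SQLException(
                "Deadlock found when trying to get lock; try restarting transaction", "40001", 1213);
        if (isDeadlock(e)) {
            System.out.println(describe(e,
                    "DELETE FROM instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374"));
        }
    }
}
```

In a real change the description would presumably also carry the transaction name or call stack, which is the information needed to find where lock ordering goes wrong.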


> Transactions are not getting retried in case of database deadlock errors
> 
>
> Key: CLOUDSTACK-9595
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.8.0
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Customer is seeing occasional error 'Deadlock found when trying to get lock; 
> try restarting transaction' messages in their management server logs.  It 
> happens regularly at least once a day.  The following is the error seen 
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] 
> (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception 
> executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
> com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM 
> instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>   at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>   at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>   at 
> com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>   at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>   at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>   at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>   at 
> com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>   at 
> com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>   at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>   at 
> com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>   at 
> com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)





[jira] [Commented] (CLOUDSTACK-9592) Empty responses from site to site connection status are not handled propertly

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665165#comment-15665165
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9592:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1761#discussion_r87900141
  
--- Diff: 
server/src/com/cloud/network/router/VirtualNetworkApplianceManagerImpl.java ---
@@ -962,18 +962,22 @@ protected void 
updateSite2SiteVpnConnectionState(final List rout
 }
 final Site2SiteVpnConnection.State oldState = 
conn.getState();
 final Site2SiteCustomerGateway gw = 
_s2sCustomerGatewayDao.findById(conn.getCustomerGatewayId());
-if (answer.isConnected(gw.getGatewayIp())) {
-
conn.setState(Site2SiteVpnConnection.State.Connected);
-} else {
-
conn.setState(Site2SiteVpnConnection.State.Disconnected);
-}
-_s2sVpnConnectionDao.persist(conn);
-if (oldState != conn.getState()) {
-final String title = "Site-to-site Vpn 
Connection to " + gw.getName() + " just switch from " + oldState + " to " + 
conn.getState();
-final String context = "Site-to-site Vpn 
Connection to " + gw.getName() + " on router " + router.getHostName() + "(id: " 
+ router.getId() + ") "
-+ " just switch from " + oldState + " 
to " + conn.getState();
-s_logger.info(context);
-
_alertMgr.sendAlert(AlertManager.AlertType.ALERT_TYPE_DOMAIN_ROUTER, 
router.getDataCenterId(), router.getPodIdToDeployIn(), title, context);
+
+if (answer.isIPPresent(gw.getGatewayIp())) {
+if (answer.isConnected(gw.getGatewayIp())) {
+
conn.setState(Site2SiteVpnConnection.State.Connected);
+} else {
+
conn.setState(Site2SiteVpnConnection.State.Disconnected);
+}
+_s2sVpnConnectionDao.persist(conn);
+if (oldState != conn.getState()) {
+final String title = "Site-to-site Vpn 
Connection to " + gw.getName() + " just switch from " + oldState + " to " + 
conn.getState();
--- End diff --

Minor nit: could you please fix the grammatical error in this error 
message?  It should read "~~just~~ switch**ed** from".


> Empty responses from site to site connection status are not handled propertly
> -
>
> Key: CLOUDSTACK-9592
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9592
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Network Controller
>Affects Versions: 4.8.0
> Environment: Any Hypervisor
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> The VPN connection status check sometimes returns responses like the one below:
> Processing: { Ans: , MgmtId: 7203499016310, via: 1(10.147.28.37), Ver: v1, 
> Flags: 110, 
> [{"com.cloud.agent.api.CheckS2SVpnConnectionsAnswer":{"ipToConnected":{},"ipToDetail":{},"details":"","result":true,"wait":0}}]
>  }
> 2016-09-27 08:52:19,211 DEBUG [c.c.a.t.Request] 
> (RouterStatusMonitor-1:ctx-c20f391d) (logid:c217239d) Seq 
> 1-2315413158421863581: Received: { Ans: , MgmtId: 7203499016310, via: 
> 1(10.147.28.37), Ver: v1, Flags: 110,
> { CheckS2SVpnConnectionsAnswer }
> In the above scenario, a bug in the processing of this response assumes the 
> connection is disconnected even though it is not, and there are two 
> consecutive alerts in the logs as well as in emails even though there was no 
> actual disconnection and reconnection:
> Site-to-site Vpn Connection XYZ-VPN on router r-197-VM(id: 197) just switch 
> from Disconnected to Connected
> Site-to-site Vpn Connection to D1 site to site VPN on router r-372-VM(id: 
> 372) just switch from Connected to Disconnected





[jira] [Commented] (CLOUDSTACK-9592) Empty responses from site to site connection status are not handled propertly

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665164#comment-15665164
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9592:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1761#discussion_r87897999
  
--- Diff: core/src/com/cloud/agent/api/CheckS2SVpnConnectionsAnswer.java ---
@@ -76,4 +76,14 @@ public String getDetail(String ip) {
 }
 return null;
 }
+
+public boolean isIPPresent(String ip) {
+if (this.getResult()) {
+Boolean status = ipToConnected.get(ip);
+if (status != null) {
--- End diff --

Is the IP present if `status` is equal to `false`?
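
One way to make the distinction behind this question explicit, as a minimal sketch: treat "present" as "the router reported this IP at all", independent of whether the reported status is `true` or `false`. `VpnStatusAnswer` is a hypothetical, simplified stand-in for `CheckS2SVpnConnectionsAnswer`; only the `ipToConnected` map name is taken from the diff, and the behaviour shown is an assumption, not the patch's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified, hypothetical stand-in for CheckS2SVpnConnectionsAnswer.
final class VpnStatusAnswer {
    private final boolean result;
    private final Map<String, Boolean> ipToConnected = new HashMap<>();

    VpnStatusAnswer(boolean result) {
        this.result = result;
    }

    void addStatus(String ip, boolean connected) {
        ipToConnected.put(ip, connected);
    }

    // Present means the answer carries an entry for the IP, whatever its value.
    boolean isIPPresent(String ip) {
        return result && ipToConnected.containsKey(ip);
    }

    // Connected means the entry exists and is true; a missing entry is not connected.
    boolean isConnected(String ip) {
        return result && Boolean.TRUE.equals(ipToConnected.get(ip));
    }

    public static void main(String[] args) {
        VpnStatusAnswer empty = new VpnStatusAnswer(true);
        System.out.println(empty.isIPPresent("10.0.0.1"));

        VpnStatusAnswer down = new VpnStatusAnswer(true);
        down.addStatus("10.0.0.1", false);
        System.out.println(down.isIPPresent("10.0.0.1") + " " + down.isConnected("10.0.0.1"));
    }
}
```

With this split, an empty `ipToConnected` map (as in the reported log) leaves the stored connection state untouched, while an explicit `false` still counts as a real disconnect.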




[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665125#comment-15665125
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user serg38 commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@jburwell I thought that most, if not all, ACS interactions through the DAO 
layer are atomic transactions. Do we have cases of multiple DML statements as 
part of the same transaction? We have been seeing quite a few deadlocks in 
high-transaction-volume environments where multiple management servers are 
employed. This causes quite a pain for users due to the randomness and the lack 
of a good recourse/explanation. I would argue that a proper retry is a better 
choice, provided we cover all the cases, including those with complex 
transactions. We have been successful leveraging this approach in systems built 
on top of ACS.




[jira] [Commented] (CLOUDSTACK-9589) vmName entries from host_details table for the VM's whose state is Expunging should be deleted during upgrade from older versions

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665086#comment-15665086
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9589:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1759
  
This change has been added to the `schema-480to481.sql` script.  Since 
4.8.1 has already shipped, this script will not be applied for those users.  
Therefore, this change needs to be placed in the `schema-481to4820.sql` script.

Also, the base branch for this PR is master.  However, the database change 
is targeted at 4.8.  Therefore, the base branch should be changed to 4.8.


> vmName entries from host_details table for the VM's whose state is Expunging 
> should be deleted during upgrade from older versions
> -
>
> Key: CLOUDSTACK-9589
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9589
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Baremetal
>Affects Versions: 4.4.4
> Environment: Baremetal zone
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Having vmName entries for VMs in the 'Expunging' state causes deployment of 
> VMs with matching host tags to fail, so they are removed during upgrade.





[jira] [Commented] (CLOUDSTACK-9589) vmName entries from host_details table for the VM's whose state is Expunging should be deleted during upgrade from older versions

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665082#comment-15665082
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9589:


Github user jburwell commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1759#discussion_r87896137
  
--- Diff: setup/db/db/schema-480to481-cleanup.sql ---
@@ -18,3 +18,6 @@
 --;
 -- Schema cleanup from 4.8.0 to 4.8.1;
 --;
+
+DELETE FROM `cloud`.`host_details` where name = 'vmName' and  value in 
(select name from `cloud`.`vm_instance`  where state = 'Expunging' and 
hypervisor_type ='BareMetal');
--- End diff --

Why is this change scoped only to the baremetal hypervisor?  It would seem 
that it should apply to all hypervisors.




[jira] [Commented] (CLOUDSTACK-9593) User data check is inconsistent with python

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665071#comment-15665071
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9593:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1760
  
@marcaurele this change looks like a good check to add to LTS as well.  Could 
you please change the base branch to 4.9?  Once you do, I will kick off 
regression tests across all hypervisors in order to merge the fix.


> User data check is inconsistent with python
> ---
>
> Key: CLOUDSTACK-9593
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9593
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.4.2, 4.4.3, 4.3.2, 4.5.1, 4.4.4, 4.5.2, 4.6.0, 4.6.1, 
> 4.6.2, 4.7.0, 4.7.1, 4.8.0, 4.9.0
>Reporter: Marc-Aurèle Brothier
>Assignee: Marc-Aurèle Brothier
>
> The user data is validated through the Apache commons codec library, but this 
> library does not check that the length is a multiple of 4 characters. The RFC 
> does not require it either. But the python script in the virtual router that 
> loads the user data does check for the possible padding presence, requiring 
> the string to be a multiple of 4 characters.
> {code:python}
> >>> import base64
> >>> base64.b64decode('foo')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py",
>  line 78, in b64decode
> raise TypeError(msg)
> TypeError: Incorrect padding
> >>> base64.b64decode('foo=')
> '~\x8a'
> {code}
> Currently, since the Java check is less restrictive, the user data gets saved 
> into the database but the VR script crashes when it receives this VM user 
> data. On a single VM it is not really a problem. The critical issue is when a 
> VR is restarted: the base64 string that Python rejects makes the vmdata.py 
> script crash, resulting in a VR not starting at all.
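
The mismatch described above can be sketched as a stricter server-side check that rejects any string whose length is not a multiple of 4, mirroring the padding requirement seen in the Python session. This is only a simplified illustration: `UserDataCheck` and `isPaddedBase64` are hypothetical names, the check covers the standard base64 alphabet only, and it is not the validation actually added by the pull request.

```java
import java.util.regex.Pattern;

// Illustrative only: UserDataCheck is a hypothetical helper, not CloudStack code.
final class UserDataCheck {
    // Standard base64 alphabet followed by at most two '=' padding characters.
    private static final Pattern BASE64_CHARS = Pattern.compile("[A-Za-z0-9+/]*={0,2}");

    // Accepts only strings whose length is a multiple of 4, i.e. fully padded.
    static boolean isPaddedBase64(String userData) {
        return userData != null
                && userData.length() % 4 == 0
                && BASE64_CHARS.matcher(userData).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPaddedBase64("foo"));   // 3 characters, padding missing
        System.out.println(isPaddedBase64("foo="));  // 4 characters, padded
    }
}
```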





[jira] [Commented] (CLOUDSTACK-9588) Add Load Balancer functionality in Network page is redundant.

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665062#comment-15665062
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9588:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1758
  
@nitin-maharana could you please provide a screenshot of this change?


> Add Load Balancer functionality in Network page is redundant.
> -
>
> Key: CLOUDSTACK-9588
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9588
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: Nitin Kumar Maharana
>
> Steps to Reproduce:
> Network -> Select any network -> Observe the Add Load Balancer tab
> The "Add Load Balancer" functionality is redundant.
> It is used to create an LB rule without any public IP.
> Resolution:
> Similar functionality exists in Network -> Any Network -> Details Tab -> 
> View IP Addresses -> Any public IP -> Configuration Tab -> Observe Load 
> Balancing.
> That path is used to create an LB rule with a public IP. This is a more 
> convenient way of creating an LB rule as the IP is involved.





[jira] [Commented] (CLOUDSTACK-9583) VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665061#comment-15665061
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9583:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/1757
  
@jburwell a Trillian-Jenkins matrix job (centos6 mgmt + xs65sp1, centos7 
mgmt + vmware55u3, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests


> VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1
> -
>
> Key: CLOUDSTACK-9583
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9583
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: Murali Reddy
> Fix For: 4.9.1.0
>
>
> It was observed that 'ip route flush' timed out after 20 seconds with an 
> error that the name of the vrouter could not be resolved. Since this is done 
> for each rule, on a router with a lot of rules, adding the entry to the hosts 
> file fixes it and router provisioning is observed to be faster. 
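
The fix described above amounts to making both names resolve locally. As a rough illustration only (this is not the code from CsDhcp.py), the entries could be built as below. The function name `preseed_hosts_entries` and the router name `r-42-VM` are hypothetical.

```python
def preseed_hosts_entries(hostname):
    """Build /etc/hosts-style lines that resolve both names to loopback."""
    loopback = "127.0.0.1"
    # Both 'localhost' and the router's own hostname point at 127.0.0.1,
    # so lookups of the router name never wait on an external resolver.
    names = ["localhost", hostname]
    return ["%s\t%s" % (loopback, name) for name in names]


for line in preseed_hosts_entries("r-42-VM"):
    print(line)
```

With these entries in the hosts file, commands that look up the router's own name can be answered locally instead of waiting for a resolver timeout.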





[jira] [Commented] (CLOUDSTACK-9583) VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665059#comment-15665059
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9583:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1757
  
@blueorangutan test matrix


> VR: In CsDhcp.py preseed both hostaname and localhost to resolve to 127.0.0.1
> -
>
> Key: CLOUDSTACK-9583
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9583
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: Murali Reddy
> Fix For: 4.9.1.0
>
>
> It was observed that 'ip route flush' timed out after 20 seconds with an 
> error that the name of the vrouter could not be resolved. Since this is done 
> for each rule, on a router with a lot of rules, adding the entry to the hosts 
> file fixes it and router provisioning is observed to be faster. 





[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665052#comment-15665052
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@serg38 my reading of the code is that only the most recently attempted DML 
will be re-executed.  Furthermore, retrying without refreshing the base data 
can also lead to data corruption.  The best thing to do in the case of a 
deadlock is to fail and roll back, given the risk of data corruption.


> Transactions are not getting retried in case of database deadlock errors
> 
>
> Key: CLOUDSTACK-9595
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.8.0
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Customer is seeing occasional error 'Deadlock found when trying to get lock; 
> try restarting transaction' messages in their management server logs.  It 
> happens regularly at least once a day.  The following is the error seen 
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] 
> (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception 
> executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
> com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM 
> instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>   at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>   at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>   at 
> com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>   at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>   at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>   at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>   at 
> com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>   at 
> com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>   at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>   at 
> com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>   at 
> com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)





[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664656#comment-15664656
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user serg38 commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@jburwell @yvsubhash My understanding is that all rolled-back statements will 
receive MYSQL_DEADLOCK_ERROR_CODE and will be retried as part of this patch.


> Transactions are not getting retried in case of database deadlock errors
> 
>
> Key: CLOUDSTACK-9595
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.8.0
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Customer is seeing occasional error 'Deadlock found when trying to get lock; 
> try restarting transaction' messages in their management server logs.  It 
> happens regularly at least once a day.  The following is the error seen 
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] 
> (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception 
> executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
> com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM 
> instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>   at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>   at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>   at 
> com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>   at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>   at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>   at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>   at 
> com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>   at 
> com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>   at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>   at 
> com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>   at 
> com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)





[jira] [Commented] (CLOUDSTACK-9590) KVM + CentOS 7.2 + Agent in Alert State for long time

2016-11-14 Thread Sven Vogel (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664298#comment-15664298
 ] 

Sven Vogel commented on CLOUDSTACK-9590:


In the management server log I always see entries like

{code}
2016-11-14 16:44:56,034 WARN  [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Monitor 
NetworkOrchestrator says there is an error in the connect process for 70 due to 
Unable to get an answer to the CheckNetworkCommand from agent: 70
2016-11-14 16:44:56,034 INFO  [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Host 70 is 
disconnecting with event AgentDisconnected
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) The next status of 
agent 70is Alert, current status is Connecting
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Deregistering link for 
70 with state Alert
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Remove Agent : 70
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.ConnectedAgentAttache] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Processing Disconnect.
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.storage.secondary.SecondaryStorageListener
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.storage.listener.StoragePoolMonitor
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.security.SecurityGroupListener
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.SshKeysDistriMonitor
2016-11-14 16:44:56,040 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.storage.LocalStoragePoolListener
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.capacity.StorageCapacityListener
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.capacity.ComputeCapacityListener
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.SshKeysDistriMonitor
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.storage.upload.UploadListener
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Sending Disconnect to 
listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2016-11-14 16:44:56,041 DEBUG [c.c.n.NetworkUsageManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) Disconnected called on 
70 with status Alert
2016-11-14 16:44:56,041 DEBUG [c.c.a.m.AgentManagerImpl] 
(AgentConnectTaskPool-15:ctx-f99e936f) (logid:6cdf713e) 

[jira] [Commented] (CLOUDSTACK-9595) Transactions are not getting retried in case of database deadlock errors

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664293#comment-15664293
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9595:


Github user jburwell commented on the issue:

https://github.com/apache/cloudstack/pull/1762
  
@yvsubhash according to the [MySQL deadlock 
documentation](http://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html), a 
`MYSQL_DEADLOCK_ERROR_CODE` error indicates the enclosing transaction has been 
rolled back.  The proper handling for this error is to re-execute all 
statements executed in the aborted transaction.  From a best practices 
perspective, all base data should be re-retrieved and changed to ensure logical 
consistency with changes made by the transaction that won deadlock resolution.

As I understand this patch, only the most recently executed DML is retried. 
 Therefore, any previously executed changes will be discarded and the DML will 
be re-executed either in a new transaction or in auto-commit (I didn't look up 
how the client handles the transaction context in this scenario).  If my 
understanding is correct, this patch could lead to issues ranging from 
unexpected foreign key integrity errors to data corruption.

Rather than attempting to implement a generic retry, I think the best approach 
to addressing deadlocks is to treat them as bugs.  This patch could be modified 
to log detailed information about the conditions under which a deadlock 
occurs, providing the information necessary to refactor the system to 
avoid lock contention.
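
To make the distinction above concrete, here is a small sketch of what re-executing the whole aborted transaction means, as opposed to retrying only the last statement. It is not the patch under review and is not CloudStack code. `DeadlockError`, `run_transaction`, and `add_instance_to_group` are hypothetical names, and the deadlock is simulated rather than produced by a database.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for the driver error raised on a deadlock."""


def run_transaction(unit_of_work, max_attempts=3):
    """Re-run the entire unit of work (reads and writes) after a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            # The callable is expected to re-read its base data on every
            # attempt, since the rolled-back transaction's reads are stale.
            return unit_of_work(attempt)
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller


calls = []


def add_instance_to_group(attempt):
    """Simulated unit of work that deadlocks on its first attempt only."""
    calls.append(attempt)
    if attempt == 1:
        raise DeadlockError("simulated deadlock")
    return "committed"


result = run_transaction(add_instance_to_group)
print(result, calls)
```

Even in this form, the retry only hides the contention. The logging-first approach suggested in the comment would record each deadlock so that the underlying lock ordering can be fixed.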


> Transactions are not getting retried in case of database deadlock errors
> 
>
> Key: CLOUDSTACK-9595
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9595
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Affects Versions: 4.8.0
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Customer is seeing occasional error 'Deadlock found when trying to get lock; 
> try restarting transaction' messages in their management server logs.  It 
> happens regularly at least once a day.  The following is the error seen 
> 2015-12-09 19:23:19,450 ERROR [cloud.api.ApiServer] 
> (catalina-exec-3:ctx-f05c58fc ctx-39c17156 ctx-7becdf6e) unhandled exception 
> executing api command: [Ljava.lang.String;@230a6e7f
> com.cloud.utils.exception.CloudRuntimeException: DB Exception on: 
> com.mysql.jdbc.JDBC4PreparedStatement@74f134e3: DELETE FROM 
> instance_group_vm_map WHERE instance_group_vm_map.instance_id = 941374
>   at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1209)
>   at sun.reflect.GeneratedMethodAccessor360.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>   at 
> com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>   at 
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>   at 
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>   at com.sun.proxy.$Proxy237.expunge(Unknown Source)
>   at 
> com.cloud.vm.UserVmManagerImpl$2.doInTransactionWithoutResult(UserVmManagerImpl.java:2593)
>   at 
> com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
>   at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:57)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:45)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:54)
>   at 
> com.cloud.vm.UserVmManagerImpl.addInstanceToGroup(UserVmManagerImpl.java:2575)
>   at 
> com.cloud.vm.UserVmManagerImpl.updateVirtualMachine(UserVmManagerImpl.java:2332)





[jira] [Comment Edited] (CLOUDSTACK-9590) KVM + CentOS 7.2 + Agent in Alert State for long time

2016-11-14 Thread Sven Vogel (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664147#comment-15664147
 ] 

Sven Vogel edited comment on CLOUDSTACK-9590 at 11/14/16 3:11 PM:
--

1. First of all, I add the host from CS management
2. Management Host {code} cloudstack-setup-agent  -m 192.168.85.25 -z 3 -p 3 -c 
9 -g 6e6cff15-3183-3cca-9389-ed1a78f6236a -a --pubNic=cloudbr2 
--prvNic=cloudbr0 --guestNic=cloudbr1 --hypervisor=kvm {code}
3. The agent is dead after adding the host
4. Restart the agent
5. The agent reconnects to the server and waits in the alert state


was (Author: sven.vogel):
1. first of all i add the host from cs management
2. Management Host 
--  {code} cloudstack-setup-agent  -m 192.168.85.25 -z 3 -p 3 -c 9 -g 
6e6cff15-3183-3cca-9389-ed1a78f6236a -a --pubNic=cloudbr2 --prvNic=cloudbr0 
--guestNic=cloudbr1 --hypervisor=kvm {code}
3. agent will be dead after add the host
4. restart agent
6. agent reconnect to server and wait with alert

> KVM + CentOS 7.2 + Agent in Alert State for long time
> -
>
> Key: CLOUDSTACK-9590
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9590
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: cloudstack-agent
>Affects Versions: 4.9.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> cloudstack-agent-4.9.0-1.el7.centos.x86_64
>Reporter: Sven Vogel
> Attachments: agent.log, cloudstack-startup.log, management-server.zip
>
>
> Hi,
> When I add a new host to the CloudStack management server, it takes some time 
> to get the host out of the alert state.
> 1. I add the host, and adding the host is not possible
> 2. values are correctly set in agent.properties; restart the cloudstack agent
> 3. agent says connected to server
> 4. management server says "alert"
> management-server.log
> 2016-11-10 13:23:06,783 DEBUG [c.c.h.Status] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Transition:[Resource 
> state = Enabled, Agent event = AgentDisconnected, Host
> id = 51, name = kvm02.oscloud.local]
> 2016-11-10 13:23:06,798 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Notifying other nodes 
> of to disconnect
> 2016-11-10 13:23:06,806 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Failed to handle host 
> connection: com.cloud.exception.ConnectionException: Unable to get an answer 
> to the CheckNetworkCommand from agent: 51
> is there any way to speed up the alert state? is it normal that it take so 
> long?
> thanks
> Sven





[jira] [Commented] (CLOUDSTACK-9590) KVM + CentOS 7.2 + Agent in Alert State for long time

2016-11-14 Thread Sven Vogel (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664147#comment-15664147
 ] 

Sven Vogel commented on CLOUDSTACK-9590:


1. First of all, I add the host from CS management
2. Management Host 
--  {code} cloudstack-setup-agent  -m 192.168.85.25 -z 3 -p 3 -c 9 -g 
6e6cff15-3183-3cca-9389-ed1a78f6236a -a --pubNic=cloudbr2 --prvNic=cloudbr0 
--guestNic=cloudbr1 --hypervisor=kvm {code}
3. The agent is dead after adding the host
4. Restart the agent
5. The agent reconnects to the server and waits in the alert state

> KVM + CentOS 7.2 + Agent in Alert State for long time
> -
>
> Key: CLOUDSTACK-9590
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9590
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: cloudstack-agent
>Affects Versions: 4.9.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> cloudstack-agent-4.9.0-1.el7.centos.x86_64
>Reporter: Sven Vogel
> Attachments: agent.log, cloudstack-startup.log, management-server.zip
>
>
> Hi,
> When I add a new host to the CloudStack management server, it takes some time 
> to get the host out of the alert state.
> 1. I add the host, and adding the host is not possible
> 2. values are correctly set in agent.properties; restart the cloudstack agent
> 3. agent says connected to server
> 4. management server says "alert"
> management-server.log
> 2016-11-10 13:23:06,783 DEBUG [c.c.h.Status] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Transition:[Resource 
> state = Enabled, Agent event = AgentDisconnected, Host
> id = 51, name = kvm02.oscloud.local]
> 2016-11-10 13:23:06,798 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Notifying other nodes 
> of to disconnect
> 2016-11-10 13:23:06,806 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Failed to handle host 
> connection: com.cloud.exception.ConnectionException: Unable to get an answer 
> to the CheckNetworkCommand from agent: 51
> is there any way to speed up the alert state? is it normal that it take so 
> long?
> thanks
> Sven





[jira] [Commented] (CLOUDSTACK-9590) KVM + CentOS 7.2 + Agent in Alert State for long time

2016-11-14 Thread Sven Vogel (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15664111#comment-15664111
 ] 

Sven Vogel commented on CLOUDSTACK-9590:


No maintenance mode. The hosts are freshly installed and added to CS. After that 
they stay in alert mode for a long time.

> KVM + CentOS 7.2 + Agent in Alert State for long time
> -
>
> Key: CLOUDSTACK-9590
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9590
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: cloudstack-agent
>Affects Versions: 4.9.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> cloudstack-agent-4.9.0-1.el7.centos.x86_64
>Reporter: Sven Vogel
> Attachments: agent.log, cloudstack-startup.log, management-server.zip
>
>
> Hi,
> When I add a new host to the CloudStack management server, it takes some time 
> to get the host out of the alert state.
> 1. I add the host, and adding the host is not possible
> 2. values are correctly set in agent.properties; restart the cloudstack agent
> 3. agent says connected to server
> 4. management server says "alert"
> management-server.log
> 2016-11-10 13:23:06,783 DEBUG [c.c.h.Status] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Transition:[Resource 
> state = Enabled, Agent event = AgentDisconnected, Host
> id = 51, name = kvm02.oscloud.local]
> 2016-11-10 13:23:06,798 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Notifying other nodes 
> of to disconnect
> 2016-11-10 13:23:06,806 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Failed to handle host 
> connection: com.cloud.exception.ConnectionException: Unable to get an answer 
> to the CheckNetworkCommand from agent: 51
> is there any way to speed up the alert state? is it normal that it take so 
> long?
> thanks
> Sven





[jira] [Commented] (CLOUDSTACK-9557) Deploy from VMsnapshot fails with exception if source template is removed or made private

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663798#comment-15663798
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9557:


Github user yvsubhash commented on the issue:

https://github.com/apache/cloudstack/pull/1721
  
@rhtyd  I will merge this to #1664 once the conflicts are resolved in the 
other one


> Deploy from VMsnapshot fails with exception if source template is removed or 
> made private
> -
>
> Key: CLOUDSTACK-9557
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9557
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Template
>Affects Versions: 4.8.0
> Environment: Any Hypervisor
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Steps to reproduce the issue
> i) Upload a template as admin user and make sure "public" is selected when 
> uploading it.
> ii) Now login as a user to CloudStack and deploy a VM with the template 
> created in step i).
> iii) Create a VM snapshot as the user for the VM in step ii). Once created 
> deploy a VM from the snapshot ( this will work as expected)
> iv) Now login as admin again, edit the template created in step i) and 
> uncheck "public". This makes the template private (or else delete the 
> template from the UI)
> v) Login as same user as in step ii) and try to create a VM from the same 
> snapshot ( created in step iii)). This will fail now.





[jira] [Commented] (CLOUDSTACK-9572) Snapshot on primary storage not cleaned up after Storage migration

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663510#comment-15663510
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9572:


GitHub user yvsubhash reopened a pull request:

https://github.com/apache/cloudstack/pull/1740

CLOUDSTACK-9572 Snapshot on primary storage not cleaned up after Stor…

Snapshot on primary storage not cleaned up after Storage migration. This 
happens in the following scenario
## Steps To Reproduce
1. Create an instance on the local storage on any host
2. Create a scheduled snapshot of the volume:
3. Wait until ACS created the snapshot. ACS is creating a snapshot on local 
storage and is transferring this snapshot to secondary storage. But the latest 
snapshot on local storage will stay there. This is as expected.
4. Migrate the instance to another XenServer host with ACS UI and Storage 
Live Migration
5. The Snapshot on the old host on local storage will not be cleaned up and 
is staying on local storage. So local storage will fill up with unneeded 
snapshots.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yvsubhash/cloudstack CLOUDSTACK-9572

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/1740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1740


commit 13820fdae5a22573db1c964f02e37d232228b3d8
Author: subhash yedugundla 
Date:   2016-09-12T13:29:53Z

CLOUDSTACK-9572 Snapshot on primary storage not cleaned up after Storage 
migration




> Snapshot on primary storage not cleaned up after Storage migration
> --
>
> Key: CLOUDSTACK-9572
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9572
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Storage Controller
>Affects Versions: 4.8.0
> Environment: Xen Server
>Reporter: subhash yedugundla
> Fix For: 4.8.1
>
>
> Issue Description
> ===
> 1. Create an instance on the local storage on any host
> 2. Create a scheduled snapshot of the volume:
> 3. Wait until ACS created the snapshot. ACS is creating a snapshot on local 
> storage and is transferring this snapshot to secondary storage. But the 
> latest snapshot on local storage will stay there. This is as expected.
> 4. Migrate the instance to another XenServer host with ACS UI and Storage 
> Live Migration
> 5. The Snapshot on the old host on local storage will not be cleaned up and 
> is staying on local storage. So local storage will fill up with unneeded 
> snapshots.





[jira] [Commented] (CLOUDSTACK-9572) Snapshot on primary storage not cleaned up after Storage migration

2016-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663505#comment-15663505
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9572:


Github user yvsubhash closed the pull request at:

https://github.com/apache/cloudstack/pull/1740







[jira] [Commented] (CLOUDSTACK-9590) KVM + CentOS 7.2 + Agent in Alert State for long time

2016-11-14 Thread Wei Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663308#comment-15663308
 ] 

Wei Zhou commented on CLOUDSTACK-9590:
--

Is the host in Maintenance?

> KVM + CentOS 7.2 + Agent in Alert State for long time
> -
>
> Key: CLOUDSTACK-9590
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9590
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: cloudstack-agent
>Affects Versions: 4.9.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> cloudstack-agent-4.9.0-1.el7.centos.x86_64
>Reporter: Sven Vogel
> Attachments: agent.log, cloudstack-startup.log, management-server.zip
>
>
> Hi,
> When I add a new host to the CloudStack management server, it takes some 
> time for the host to leave the alert state.
> 1. I add the host, and adding it fails.
> 2. The values are set correctly in agent.properties; I restart the cloudstack agent.
> 3. The agent says it is connected to the server.
> 4. The management server says "alert".
> management-server.log
> 2016-11-10 13:23:06,783 DEBUG [c.c.h.Status] (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 51, name = kvm02.oscloud.local]
> 2016-11-10 13:23:06,798 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Notifying other nodes of to disconnect
> 2016-11-10 13:23:06,806 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-49:ctx-c3b72839) (logid:5a86e1fb) Failed to handle host connection: com.cloud.exception.ConnectionException: Unable to get an answer to the CheckNetworkCommand from agent: 51
> Is there any way to get the host out of the alert state faster? Is it 
> normal that it takes so long?
> Thanks,
> Sven





[jira] [Commented] (CLOUDSTACK-9370) Failed to create VPC: Unable to start VPC VR (VM DomainRouter) due to error in finalizeStart, not retrying

2016-11-14 Thread yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663303#comment-15663303
 ] 

yang commented on CLOUDSTACK-9370:
--

Can you tell us how to fix this bug, or do we need to wait for the new release?

> Failed to create VPC: Unable to start VPC VR (VM DomainRouter) due to error 
> in finalizeStart, not retrying
> ---
>
> Key: CLOUDSTACK-9370
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9370
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Virtual Router
>Affects Versions: 4.9.0
> Environment: Centos El7
> KVM
> OpenvSwitch (VLAN 0)
> NFS (primary/secondary) 
>Reporter: Mani Prashanth Varma Manthena
>Priority: Critical
> Fix For: 4.9.1.0
>
>
> I am unable to create VPCs on latest cloudstack master due to the following 
> error:
> {noformat:title=Root Cause Error in Agent log}
> 2016-04-27 02:31:03,134 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) (logid:6b2d4faa) [INFO] update_config.py :: Processing incoming file => ip_associations.json
> [INFO] Processing JSON file ip_associations.json
> Traceback (most recent call last):
>   File "/opt/cloud/bin/update_config.py", line 140, in <module>
>     process_file()
>   File "/opt/cloud/bin/update_config.py", line 52, in process_file
>     qf.load(None)
>   File "/opt/cloud/bin/merge.py", line 258, in load
>     proc = updateDataBag(self)
>   File "/opt/cloud/bin/merge.py", line 91, in __init__
>     self.process()
>   File "/opt/cloud/bin/merge.py", line 103, in process
>     dbag = self.processIP(self.db.getDataBag())
>   File "/opt/cloud/bin/merge.py", line 190, in processIP
>     dbag = cs_ip.merge(dbag, ip)
>   File "/opt/cloud/bin/cs_ip.py", line 32, in merge
>     ip['device'] = 'eth' + str(ip['nic_dev_id'])
> KeyError: 'nic_dev_id'
> 2016-04-27 02:31:03,135 DEBUG [resource.virtualnetwork.VirtualRoutingResource] (agentRequest-Handler-2:null) (logid:6b2d4faa) Processing ScriptConfigItem, executing update_config.py ip_associations.json took 911ms
> {noformat}
> {noformat:title=Root Cause Error in Management Server log}
> 2016-04-27 02:30:19,975 DEBUG [c.c.a.m.ClusteredAgentAttache] 
> (Work-Job-Executor-10:ctx-1279b068 job-1159/job-1160 ctx-c31efe73) 
> (logid:6b2d4faa) Seq 9-332421947495286159: Forwarding Seq 
> 9-332421947495286159:  { Cmd , MgmtId: 275619427298304, via: 
> 9(ovs-2.mvdcvtb16.us.alcatel-lucent.com), Ver: v1, Flags: 100111, 
> [{"com.cloud.agent.api.StartCommand":{"vm":{"id":252,"name":"r-252-VM","type":"DomainRouter","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":268435456,"maxRam":268435456,"arch":"x86_64","os":"Debian
>  GNU/Linux 5.0 (64-bit)","platformEmulator":"Debian GNU/Linux 5","bootArgs":" 
> vpccidr=10.1.1.1/16 domain=cs2cloud.internal dns1=128.251.10.29 template=domP 
> name=r-252-VM eth0ip=169.254.1.123 eth0mask=255.255.0.0 type=vpcrouter 
> disable_rp_filter=true 
> baremetalnotificationsecuritykey=0oLpL4swbL6Yu_xsuRdyjwmmyPHAU1V-iMpmMNKO00vNIP5bxronvhQZ_qehiEZ99Eo9avCHg9uLh1cbiz7pQA
>  
> baremetalnotificationapikey=wEax_CyEaKZHn8ZkPBQLQaibjSWZ0OYJuEQA3l2RUA41GXZxaie9P6oQPeNlzjIGl-fDpKWp9MkAEQOJYvE4vA
>  host=10.31.59.151 
>