Re: Kubernetes Dashboard Access

2022-02-09 Thread Edward St Pierre
Hi Jim,

Typically you would run the kubectl proxy from your local machine (one that
has a web browser), and then open the URL shown in the 'Access' tab in that
browser.
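
As a rough example (assuming you have saved the downloaded kube.conf
somewhere on your workstation, e.g. ~/kube.conf, and that the workstation can
reach the cluster's API endpoint):

kubectl --kubeconfig ~/kube.conf proxy

and then browse to:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

The proxy only needs kubectl and network access to the cluster, so it does
not have to run on the CloudStack management server itself.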

Ed

On Wed, 9 Feb 2022 at 09:05, James Steele  wrote:

> Hi all,  we have set up the Kubernetes service inside CloudStack as per:
>
>
> http://docs.cloudstack.apache.org/en/latest/plugins/cloudstack-kubernetes-service.html#enabling-the-kubernetes-service
>
>
> It all looks fine and set up correctly (things are Green), but how do we
> access the Kubernetes Dashboard?
>
>
>
>
> The Kubernetes 'Access' tab for the Kubernetes Cluster says:
>
> *'Kubernetes Dashboard UI'*
>
> *Run proxy locally:*
>
> *kubectl --kubeconfig /custom/path/kube.conf proxy*
>
>
> *Open URL in browser:*
>
> *
> http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
> *
>
>
> *Token for dashboard login can be retrieved using following command:*
>
> *kubectl --kubeconfig /custom/path/kube.conf describe secret $(kubectl
> --kubeconfig /custom/path/kube.conf get secrets -n kubernetes-dashboard |
> grep kubernetes-dashboard-token | awk '{print $1}') -n
> kubernetes-dashboard*
>
>
>
>
> I have downloaded the config and access token, however CS is setup on
> Ubuntu Server 20.04, with no GUI or web browser.
>
>
> Any ideas how to remotely load the web Kubernetes Dashboard from another
> machine?
>
>
> Thanks, Jim
>


Local Storage Question

2022-02-09 Thread Edward St Pierre
Hi,

I have an existing KVM cluster and am looking to enable local storage for
certain workloads.

Is it safe to enable this on an existing production cluster and am I
correct in assuming that
/var/lib/libvirt/images/ will be the path unless defined within
agent.properties?

Currently my agent.properties only has 'local.storage.uuid' defined, and not
'local.storage.path'.
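
For reference, this is roughly what I would expect the relevant entries to
look like (the path below is just my assumption, not a confirmed default):

# /etc/cloudstack/agent/agent.properties
local.storage.uuid=<existing uuid>
local.storage.path=/var/lib/libvirt/images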

Thanks in advance.
Ed


Re: data-server. resolution

2021-11-04 Thread Edward St Pierre
Hi Wei,

cloud-init can resolve 'data-server' both with and without the trailing '.',
but on a redundant VR setup using the 4.15.1 template it resolves to the
guest IP address rather than the VIP on the VR, so the VM cannot reach the
password server.

On a non-redundant VR setup it works as expected.

Ed

On Thu, 4 Nov 2021 at 09:28, Wei ZHOU  wrote:

> Hi Edward,
>
> Sorry I am a bit confused.
>
> Is cloud-init not working in your vm, because 'data-server.' is not
> resolved ? But why did you check the issue in the VR not the VM ?
> Can 'data-server' (without dot) be resolved in the vm ?
>
> I confirm that the password server listens only on the VIP,  it should not
> be a problem.
> in my testing with 4.16.0.0-rc2, 'data-server' and 'data-server.' can be
> both resolved as VIP.
>
> -Wei
>
> On Thu, 4 Nov 2021 at 09:54, Edward St Pierre 
> wrote:
>
> > Hi,
> >
> > The diagnostics I provided shows that it only appears to be listening on
> > the VIP and not the guest IP..
> > The DNS does resolve (DNS resolution provided earlier), however I believe
> > it should resolve to the VIP address as the password server is only
> > listening on the VIP.
> >
> > I am using the template 'systemvm-kvm-4.15.1'
> > The bug you have highlighted is about ubuntu.
> >
> > This is the password server command line:
> >
> > python /opt/cloud/bin/passwd_server_ip.py 10.1.1.1,10.1.1.154
> >
> > And, you can see it only takes the first IP address to listen on:
> >
> > if len(sys.argv) > 1:
> > addresses = sys.argv[1].split(",")
> > if len(addresses) > 0:
> > listeningAddress = addresses[0]
> > allowAddresses.append(addresses[0])
> > if len(addresses) > 1:
> > allowAddresses.append(addresses[1])
> >
> > server_address = (listeningAddress, 8080)
> > passwordServer = ServerClass(server_address, HandlerClass)
> >
> > I do not think that listening on the VIP is the problem,  I believe
> > that 'data-server.' should resolve to the VIP address and not the guest
> IP
> > address.
> >
> > Ed
> >
> > On Wed, 3 Nov 2021 at 22:24, Wei ZHOU  wrote:
> >
> > > It is not a problem, in my opinion. The password server and userdata
> > server
> > > listen on both guest ip and vip.
> > >
> > > As I commented on the link in previous reply, if cloud-init does not
> work
> > > in your vm template, it might be caused by systemd-resolved.
> > >
> > > -Wei
> > >
> > >
> > > On Wednesday, 3 November 2021, Edward St Pierre <
> > edward.stpie...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for your input, it actually looks like a bug with the
> redundant
> > VR
> > > > setup.
> > > >
> > > > See diagnostics directly on master VR:
> > > >
> > > > root@r-418-VM:~# netstat -anpl | grep 8080
> > > > tcp0  0 10.1.1.1:8080   0.0.0.0:*
> > >  LISTEN
> > > >  1610/python
> > > >
> > > > root@r-418-VM:~# ip addr show
> > > > 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
> > group
> > > > default qlen 1000
> > > > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > inet 127.0.0.1/8 scope host lo
> > > >valid_lft forever preferred_lft forever
> > > > 2: eth0:  mtu 1500 qdisc pfifo_fast
> > > state
> > > > UP group default qlen 1000
> > > > link/ether 02:00:76:04:00:02 brd ff:ff:ff:ff:ff:ff
> > > > inet 10.1.1.154/24 brd 10.1.1.255 scope global eth0
> > > >valid_lft forever preferred_lft forever
> > > > inet 10.1.1.1/24 brd 10.1.1.255 scope global secondary eth0
> > > >valid_lft forever preferred_lft forever
> > > >
> > > > root@r-418-VM:~# dig data-server. @localhost
> > > >
> > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> data-server. @localhost
> > > > ;; global options: +cmd
> > > > ;; Got answer:
> > > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32161
> > > > ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0,
> ADDITIONAL: 1
> > > >
> > > > ;; OPT PSEUDOSECTION:
> > > > ; EDNS: version: 0, flags:; udp

Re: data-server. resolution

2021-11-04 Thread Edward St Pierre
Hi,

The diagnostics I provided show that it only appears to be listening on the
VIP and not the guest IP.
The DNS name does resolve (output provided earlier); however, I believe it
should resolve to the VIP address, as the password server is only listening
on the VIP.

I am using the template 'systemvm-kvm-4.15.1'.
The bug you have highlighted is about Ubuntu.

This is the password server command line:

python /opt/cloud/bin/passwd_server_ip.py 10.1.1.1,10.1.1.154

And, you can see it only takes the first IP address to listen on:

if len(sys.argv) > 1:
    addresses = sys.argv[1].split(",")
    if len(addresses) > 0:
        listeningAddress = addresses[0]
        allowAddresses.append(addresses[0])
    if len(addresses) > 1:
        allowAddresses.append(addresses[1])

server_address = (listeningAddress, 8080)
passwordServer = ServerClass(server_address, HandlerClass)

I do not think that listening on the VIP is the problem; I believe
that 'data-server.' should resolve to the VIP address and not the guest IP
address.

Ed

On Wed, 3 Nov 2021 at 22:24, Wei ZHOU  wrote:

> It is not a problem, in my opinion. The password server and userdata server
> listen on both guest ip and vip.
>
> As I commented on the link in previous reply, if cloud-init does not work
> in your vm template, it might be caused by systemd-resolved.
>
> -Wei
>
>
> On Wednesday, 3 November 2021, Edward St Pierre  >
> wrote:
>
> > Hi,
> >
> > Thanks for your input, it actually looks like a bug with the redundant VR
> > setup.
> >
> > See diagnostics directly on master VR:
> >
> > root@r-418-VM:~# netstat -anpl | grep 8080
> > tcp0  0 10.1.1.1:8080   0.0.0.0:*
>  LISTEN
> >  1610/python
> >
> > root@r-418-VM:~# ip addr show
> > 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> > default qlen 1000
> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > inet 127.0.0.1/8 scope host lo
> >valid_lft forever preferred_lft forever
> > 2: eth0:  mtu 1500 qdisc pfifo_fast
> state
> > UP group default qlen 1000
> > link/ether 02:00:76:04:00:02 brd ff:ff:ff:ff:ff:ff
> > inet 10.1.1.154/24 brd 10.1.1.255 scope global eth0
> >valid_lft forever preferred_lft forever
> > inet 10.1.1.1/24 brd 10.1.1.255 scope global secondary eth0
> >valid_lft forever preferred_lft forever
> >
> > root@r-418-VM:~# dig data-server. @localhost
> >
> > ; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> data-server. @localhost
> > ;; global options: +cmd
> > ;; Got answer:
> > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32161
> > ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> >
> > ;; OPT PSEUDOSECTION:
> > ; EDNS: version: 0, flags:; udp: 4096
> > ;; QUESTION SECTION:
> > ;data-server.   IN  A
> >
> > ;; ANSWER SECTION:
> > data-server.0   IN  A   10.1.1.154
> >
> > ;; Query time: 1 msec
> > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > ;; WHEN: Wed Nov 03 11:58:43 UTC 2021
> > ;; MSG SIZE  rcvd: 56
> >
> >
> > Ed
> >
> > On Wed, 3 Nov 2021 at 20:15, Wei ZHOU  wrote:
> >
> > > Hi Edward,
> > >
> > > You may face an issue which has recently been fixed in cloud-init .
> > > Please refer to https://github.com/canonical/cloud-init/pull/1004
> > >
> > > -Wei
> > >
> > > On Wed, 3 Nov 2021 at 12:48, Edward St Pierre <
> edward.stpie...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Guys,
> > > >
> > > > Just a really quick question.
> > > >
> > > > Should 'data-server.' resolve to the virtual router or the guest?
> > > >
> > > > Basically the cloud-init datasource for Cloudstack that comes with
> > CentOS
> > > > Stream seems to use this as the address for the VR.
> > > >
> > > > Just looking to see if this is a VR bug or a bug with this module on
> > > > CentOS.
> > > >
> > > > Regards
> > > >
> > > > Ed
> > > >
> > >
> >
>


Re: data-server. resolution

2021-11-03 Thread Edward St Pierre
Hi,

Thanks for your input, it actually looks like a bug with the redundant VR
setup.

See diagnostics directly on master VR:

root@r-418-VM:~# netstat -anpl | grep 8080
tcp0  0 10.1.1.1:8080   0.0.0.0:*   LISTEN
 1610/python

root@r-418-VM:~# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state
UP group default qlen 1000
link/ether 02:00:76:04:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.154/24 brd 10.1.1.255 scope global eth0
   valid_lft forever preferred_lft forever
inet 10.1.1.1/24 brd 10.1.1.255 scope global secondary eth0
   valid_lft forever preferred_lft forever

root@r-418-VM:~# dig data-server. @localhost

; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> data-server. @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32161
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;data-server.   IN  A

;; ANSWER SECTION:
data-server.0   IN  A   10.1.1.154

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 03 11:58:43 UTC 2021
;; MSG SIZE  rcvd: 56


Ed

On Wed, 3 Nov 2021 at 20:15, Wei ZHOU  wrote:

> Hi Edward,
>
> You may face an issue which has recently been fixed in cloud-init .
> Please refer to https://github.com/canonical/cloud-init/pull/1004
>
> -Wei
>
> On Wed, 3 Nov 2021 at 12:48, Edward St Pierre 
> wrote:
>
> > Hi Guys,
> >
> > Just a really quick question.
> >
> > Should 'data-server.' resolve to the virtual router or the guest?
> >
> > Basically the cloud-init datasource for Cloudstack that comes with CentOS
> > Stream seems to use this as the address for the VR.
> >
> > Just looking to see if this is a VR bug or a bug with this module on
> > CentOS.
> >
> > Regards
> >
> > Ed
> >
>


Re: data-server. resolution

2021-11-03 Thread Edward St Pierre
Just need to add,

I mean it is resolving to the guest IP of the virtual router, and not the
address that it is listening on:

root@r-418-VM:~# netstat -anpl | grep 8080
tcp0  0 10.1.1.1:8080   0.0.0.0:*   LISTEN
 1610/python

root@r-418-VM:~# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state
UP group default qlen 1000
link/ether 02:00:76:04:00:02 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.154/24 brd 10.1.1.255 scope global eth0
   valid_lft forever preferred_lft forever
inet 10.1.1.1/24 brd 10.1.1.255 scope global secondary eth0
   valid_lft forever preferred_lft forever

root@r-418-VM:~# dig data-server. @localhost

; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> data-server. @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32161
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;data-server.   IN  A

;; ANSWER SECTION:
data-server.0   IN  A   10.1.1.154

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 03 11:58:43 UTC 2021
;; MSG SIZE  rcvd: 56


On Wed, 3 Nov 2021 at 11:48, Edward St Pierre 
wrote:

> Hi Guys,
>
> Just a really quick question.
>
> Should 'data-server.' resolve to the virtual router or the guest?
>
> Basically the cloud-init datasource for Cloudstack that comes with CentOS
> Stream seems to use this as the address for the VR.
>
> Just looking to see if this is a VR bug or a bug with this module on
> CentOS.
>
> Regards
>
> Ed
>


data-server. resolution

2021-11-03 Thread Edward St Pierre
Hi Guys,

Just a really quick question.

Should 'data-server.' resolve to the virtual router or the guest?

Basically the cloud-init datasource for Cloudstack that comes with CentOS
Stream seems to use this as the address for the VR.

Just looking to see if this is a VR bug or a bug with this module on CentOS.

Regards

Ed


Re: Cloudstack GPU

2021-07-26 Thread Edward St Pierre
Hi,

I believe we have an RTX 8000 arriving, and I was planning on testing with
the Nvidia vGPU manager.
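
As a very rough sketch of what I expect the host-side steps to look like once
the Nvidia vGPU manager (host driver) is installed (the PCI address and
profile name below are placeholders, not specific to the RTX 8000):

# list the vGPU (mdev) profiles the card exposes
ls /sys/class/mdev_bus/0000:3b:00.0/mdev_supported_types/

# create a mediated device for one of the profiles
echo "$(uuidgen)" > /sys/class/mdev_bus/0000:3b:00.0/mdev_supported_types/nvidia-XX/create

The resulting mdev UUID is what would then be attached to the VM (e.g. via
the extraconfig approach Rohit mentioned).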

Ed

On Mon, 26 Jul 2021 at 15:41, Alex Mattioli 
wrote:

> Hi Edward,
> Which model of GPUs are you using? Going for passthrough?
> Thanks
> Alex
>
>
>
>
> -Original Message-----
> From: Edward St Pierre 
> Sent: 26 July 2021 16:29
> To: users@cloudstack.apache.org
> Subject: Re: Cloudstack GPU
>
> Hi,
>
> I am looking into implementing vGPU with KVM after reading this article:
>
> https://lab.piszki.pl/cloudstack-kvm-and-running-vm-with-vgpu/
>
> And checking out specific notes here:
>
> https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html
>
> Ed
>
>
> On Mon, 26 Jul 2021 at 14:44, Rohit Yadav 
> wrote:
>
> > Hi Alex,
> >
> > I've heard/seen some users using GPUs with XenServer for graphical
> > rendering and I remember somebody discussing about GPU in KVM which is
> > possible by using the extraconfig feature while deploying VM (the only
> > limitation is on KVM you cannot share one GPU across VMs; however if
> > your server has multiple GPUs you can assign them to one or more VMs).
> >
> > I found this old wiki:
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+su
> > pport+for+CloudStack+Guest+VMs (GPU models are enterprise Nvidia
> > based)
> >
> >
> > Regards.
> >
> > 
> > From: Alex Mattioli 
> > Sent: Thursday, July 22, 2021 19:07
> > To: users@cloudstack.apache.org ;
> > d...@cloudstack.apache.org 
> > Subject: Cloudstack GPU
> >
> > Hi all,
> > Anyone out there using GPUs with Cloudstack?
> > If so, with which hypervisor and GPU?
> >
> > Thanks,
> > Alex
> >
> >
> >
> >
> >
> >
> >
>


Re: Cloudstack GPU

2021-07-26 Thread Edward St Pierre
Hi,

I am looking into implementing vGPU with KVM after reading this article:

https://lab.piszki.pl/cloudstack-kvm-and-running-vm-with-vgpu/

And checking out specific notes here:
https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html

Ed


On Mon, 26 Jul 2021 at 14:44, Rohit Yadav  wrote:

> Hi Alex,
>
> I've heard/seen some users using GPUs with XenServer for graphical
> rendering and I remember somebody discussing about GPU in KVM which is
> possible by using the extraconfig feature while deploying VM (the only
> limitation is on KVM you cannot share one GPU across VMs; however if your
> server has multiple GPUs you can assign them to one or more VMs).
>
> I found this old wiki:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs
> (GPU models are enterprise Nvidia based)
>
>
> Regards.
>
> 
> From: Alex Mattioli 
> Sent: Thursday, July 22, 2021 19:07
> To: users@cloudstack.apache.org ;
> d...@cloudstack.apache.org 
> Subject: Cloudstack GPU
>
> Hi all,
> Anyone out there using GPUs with Cloudstack?
> If so, with which hypervisor and GPU?
>
> Thanks,
> Alex
>
>
>
>
>
>
>


Re: Cannot enable Static Nat.

2021-07-23 Thread Edward St Pierre
Hi,

You need to have a spare (unused) public IP address for static NAT. When you
click on the IP address, there is a small icon at the top right (next to the
trash can) to enable static NAT.
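
If you prefer the API route, a rough CloudMonkey equivalent (the IDs are
placeholders) would be:

cmk enableStaticNat ipaddressid=<public-ip-uuid> virtualmachineid=<vm-uuid>

but the icon in the UI does the same thing.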

Ed


On Fri, 23 Jul 2021 at 08:03, Halloesnith  wrote:

> Hello everyone,
>
> I am using cloudstack 4.15.1 with KVM. I was looking into the feature of
> static NAT and I was unable to do so. I have disabled all the loadbalancing
> and port forwarding rules for that IP address but the option to enabled
> static nat does not appear.
>
> I will be grateful for any kind of suggestion.
>
> Thank You.
>


Re: Failure to start Virtual Router after upgrade to 4.15.1

2021-07-15 Thread Edward St Pierre
Hi Slavka,

You are a lifesaver, I had two versions of the API jar in the directory:
cloud-api-4.13.0.0.jar
cloud-api-4.15.1.0.jar

I removed the old version left over from the previous upgrade and all is good
now. Thanks for your help.
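
In case it helps anyone else hitting the same NoSuchMethodError after an
upgrade, this is roughly how to spot it (paths as per the standard agent
package):

ls -l /usr/share/cloudstack-agent/lib/cloud-api-*.jar
# if more than one version is listed, remove the stale one and restart the agent
rm /usr/share/cloudstack-agent/lib/cloud-api-4.13.0.0.jar
systemctl restart cloudstack-agent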

Ed


On Thu, 15 Jul 2021 at 20:55, Slavka Peleva 
wrote:

> Hi Eduard,
>
> Did you upgrade and cloudstack agents? I guess that `cloud-api` jar in
> `/usr/share/cloudstack-agent/lib` is with an older version that does not
> have the method `isPrivateGateway`
>
> Best regards,
> Slavka
>
> On Thu, Jul 15, 2021 at 10:02 PM Edward St Pierre <
> edward.stpie...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Any help or pointers would be greatly appreciated.
> >
> > After upgrading CS  Virtual routers to 4.15.1 The VM does actually get
> > built and powers on (visible via console proxy), and then is suddenly
> > powered off. and this error is displayed in the interface
> >
> > '(r-340-VM) Resource [Host:10] is unreachable: Host 10: Unable to start
> > instance due to Unable to start VM:f308d9b5-632c-47d2-9b24-01f02bb257a5
> due
> > to error in finalizeStart, not retrying'
> >
> >
> > Looking through the logs I cannot find anything other than the below snip
> > that seem interesting:
> >
> > :"java.lang.NoSuchMethodError: 'boolean
> > com.cloud.agent.api.to.IpAddressTO.isPrivateGateway()'
> >
> > Here are the full logs:
> >
> > 2021-07-15 19:45:43,250 DEBUG [c.c.a.t.Request]
> > (AgentManager-Handler-5:null) (logid:) Seq 10-6995779071166649120:
> > Processing:  { Ans: , MgmtId: 345050527765, via: 10, Ver: v1, Flags:
> > 1110,
> >
> >
> [{"com.cloud.agent.api.StartAnswer":{"vm":{"id":"340","name":"r-340-VM","state":"Starting","type":"DomainRouter","cpus":"1","minSpeed":"500","maxSpeed":"500","minRam":"(256.00
> > MB) 268435456","maxRam":"(256.00 MB)
> > 268435456","arch":"x86_64","os":"Debian GNU/Linux 9
> > (64-bit)","platformEmulator":"Debian GNU/Linux 9 (64-bit)","bootArgs":"
> > vpccidr=10.0.0.0/16 domain=cs9cloud.internal dns1=81.19.54.209 dns2=
> > template=domP name=r-340-VM eth0ip=169.254.146.220
> >
> >
> eth0mask=255.2","enableHA":"true","limitCpuUse":"false","enableDynamicallyScaleVm":"false","vncPassword":"M_cgfURA4I7i0BNoy2pmYw","vncAddr":"10.100.6.201","params":{},"uuid":"f308d9b5-632c-47d2-9b24-01f02bb257a5","enterHardwareSetup":"false","disks":[{"data":{"org.apache.
> > cloudstack.storage.to
> >
> .VolumeObjectTO":{"uuid":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","volumeType":"ROOT","dataStore":{"org.
> > apache.cloudstack.storage.to
> >
> .PrimaryDataStoreTO":{"uuid":"658c4937-b7bc-3aa7-a0d3-aa144224dd52","id":"7","poolType":"RBD","host":"ceph-mon.cloudstack","path":"cloudstack2019","port":"6789","url":"RBD://ceph-mon.cloudstack/cloudstack2019/?ROLE=Primary&STOREUUID=658c4937-b7bc-3aa7-a0d3-aa144224dd52","isManaged":"false"}},"name":"ROOT-340","size":"(2.44
> > GB)
> >
> >
> 262144","path":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","volumeId":"468","vmName":"r-340-VM","accountId":"9","format":"RAW","provisioningType":"THIN","id":"468","deviceId":"0","bytesReadRate":"(0
> > bytes) 0","bytesWriteRate":"(0 bytes) 0","iopsReadRate":"(0 bytes)
> > 0","iopsWriteRate":"(0 bytes)
> >
> >
> 0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"}},"diskSeq":"0","path":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","type":"ROOT","_details":{"storageHost":"ceph-mon.cloudstack","managed":"false","storagePort":"6789","volumeSize":"(2.44
> > GB)
> >
&g

Failure to start Virtual Router after upgrade to 4.15.1

2021-07-15 Thread Edward St Pierre
Hello,

Any help or pointers would be greatly appreciated.

After upgrading CS to 4.15.1, the virtual router VM does actually get built
and powers on (visible via the console proxy), but is then suddenly powered
off, and this error is displayed in the interface:

'(r-340-VM) Resource [Host:10] is unreachable: Host 10: Unable to start
instance due to Unable to start VM:f308d9b5-632c-47d2-9b24-01f02bb257a5 due
to error in finalizeStart, not retrying'


Looking through the logs, I cannot find anything other than the snippet below
that seems interesting:

:"java.lang.NoSuchMethodError: 'boolean
com.cloud.agent.api.to.IpAddressTO.isPrivateGateway()'

Here are the full logs:

2021-07-15 19:45:43,250 DEBUG [c.c.a.t.Request]
(AgentManager-Handler-5:null) (logid:) Seq 10-6995779071166649120:
Processing:  { Ans: , MgmtId: 345050527765, via: 10, Ver: v1, Flags:
1110,
[{"com.cloud.agent.api.StartAnswer":{"vm":{"id":"340","name":"r-340-VM","state":"Starting","type":"DomainRouter","cpus":"1","minSpeed":"500","maxSpeed":"500","minRam":"(256.00
MB) 268435456","maxRam":"(256.00 MB)
268435456","arch":"x86_64","os":"Debian GNU/Linux 9
(64-bit)","platformEmulator":"Debian GNU/Linux 9 (64-bit)","bootArgs":"
vpccidr=10.0.0.0/16 domain=cs9cloud.internal dns1=81.19.54.209 dns2=
template=domP name=r-340-VM eth0ip=169.254.146.220
eth0mask=255.2","enableHA":"true","limitCpuUse":"false","enableDynamicallyScaleVm":"false","vncPassword":"M_cgfURA4I7i0BNoy2pmYw","vncAddr":"10.100.6.201","params":{},"uuid":"f308d9b5-632c-47d2-9b24-01f02bb257a5","enterHardwareSetup":"false","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"658c4937-b7bc-3aa7-a0d3-aa144224dd52","id":"7","poolType":"RBD","host":"ceph-mon.cloudstack","path":"cloudstack2019","port":"6789","url":"RBD://ceph-mon.cloudstack/cloudstack2019/?ROLE=Primary&STOREUUID=658c4937-b7bc-3aa7-a0d3-aa144224dd52","isManaged":"false"}},"name":"ROOT-340","size":"(2.44
GB)
262144","path":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","volumeId":"468","vmName":"r-340-VM","accountId":"9","format":"RAW","provisioningType":"THIN","id":"468","deviceId":"0","bytesReadRate":"(0
bytes) 0","bytesWriteRate":"(0 bytes) 0","iopsReadRate":"(0 bytes)
0","iopsWriteRate":"(0 bytes)
0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"}},"diskSeq":"0","path":"8a7e0ad0-69b1-46ed-a75c-e984b836a989","type":"ROOT","_details":{"storageHost":"ceph-mon.cloudstack","managed":"false","storagePort":"6789","volumeSize":"(2.44
GB)
262144"}}],"nics":[{"deviceId":"0","networkRateMbps":"-1","defaultNic":"false","pxeDisable":"true","nicUuid":"efeacfd4-95d1-4cdb-897d-a2f79bc25782","details":{"PromiscuousMode":"false","MacAddressChanges":"true","ForgedTransmits":"true"},"dpdkEnabled":"false","uuid":"bae5a165-83f2-4134-8ae0-6dc7a149a32a","ip":"169.254.146.220","netmask":"255.255.0.0","gateway":"169.254.0.1","mac":"0e:00:a9:fe:92:dc","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":"false"}],"guestOsDetails":{},"extraConfig":{}},"result":"true","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.check.CheckSshAnswer":{"result":"true","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.GetDomRVersionAnswer":{"templateVersion":"Cloudstack
Release 4.15.1 Wed 10 Mar 2021 05:38:45 AM
UTC","scriptsVersion":"ac3f3efc5ffe5dbab9616395c32e0d3d
","result":"true","details":"Cloudstack Release 4.15.1 Wed 10 Mar 2021
05:38:45 AM UTC&ac3f3efc5ffe5dbab9616395c32e0d3d
","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.PlugNicAnswer":{"result":"true","details":"success","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.Answer":{"result":"false","details"

at
com.cloud.agent.resource.virtualnetwork.facade.IpAssociationConfigItem.generateConfig(IpAssociationConfigItem.java:45)
at
com.cloud.agent.resource.virtualnetwork.VirtualRoutingResource.generateCommandCfg(VirtualRoutingResource.java:489)
at
com.cloud.agent.resource.virtualnetwork.VirtualRoutingResource.executeRequest(VirtualRoutingResource.java:142)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkElementCommandWrapper.execute(LibvirtNetworkElementCommandWrapper.java:35)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkElementCommandWrapper.execute(LibvirtNetworkElementCommandWrapper.java:29)
at
com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1476)
at com.cloud.agent.Agent.processRequest(Agent.java:661)
at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1079)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at
ja

Re: New System Template for virtual routers

2021-07-15 Thread Edward St Pierre
Hi Wei,

Yes, same issue; I removed the lock before asking the question. I managed to
update one of my VRs, but unfortunately the others seem to be booting and
then shutting down.
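
For clarity, the lock removal was just the following, run on the affected
router (as per the GitHub issue below):

rm -f /var/lock/conntrackd.lock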

I will look into it a bit more later, as I may have other problems since
updating from 4.15.0 to 4.15.1, which might be unrelated.

Ed

On Thu, 15 Jul 2021 at 16:07, Wei ZHOU  wrote:

> Hi Edward,
>
> Is that same as what I reported at
> https://github.com/apache/cloudstack/issues/5138 ?
> removing /var/lock/conntrackd.lock solves the problem.
>
> -Wei
>
>
>
> On Thu, 15 Jul 2021 at 16:29, Edward St Pierre 
> wrote:
>
> > Hi Guys,
> >
> > Is there a problem with the new system template? it appears that there
> is a
> > stale lock file for conntrackd, this is from a newly deployed virtual
> > router that was not coming up properly:
> >
> > systemvm-kvm-4.15.1
> > Checksum:
> >
> >
> 5cd235522d1d9cc9c5c80824fdca80004956b5aee533e283334d8a6b6caa2aa3fd939650765689a9437e452e7e6ae5104e068bc84553b4ef8766236a43fa26b3
> >
> > [Thu Jul 15 14:18:30 2021] (pid=2721) [notice] resync with master
> conntrack
> > table
> > [Thu Jul 15 14:18:30 2021] (pid=2721) [notice] sending bulk update
> > [Thu Jul 15 14:18:32 2021] (pid=4647) [ERROR] lockfile
> > `/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
> > [Thu Jul 15 14:18:32 2021] (pid=2721) [notice] flushing kernel conntrack
> > table
> > [Thu Jul 15 14:18:34 2021] (pid=5266) [ERROR] lockfile
> > `/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
> > [Thu Jul 15 14:18:34 2021] (pid=2721) [notice] flushing kernel conntrack
> > table
> > [Thu Jul 15 14:18:38 2021] (pid=6832) [ERROR] lockfile
> > `/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
> > [Thu Jul 15 14:18:38 2021] (pid=2721) [notice] flushing kernel conntrack
> > table
> > [Thu Jul 15 14:18:42 2021] (pid=7917) [ERROR] lockfile
> > `/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
> > [Thu Jul 15 14:18:42 2021] (pid=2721) [notice] flushing kernel conntrack
> > table
> >
> >
> >- Regards
> >
>


New System Template for virtual routers

2021-07-15 Thread Edward St Pierre
Hi Guys,

Is there a problem with the new system template? It appears that there is a
stale lock file for conntrackd; this is from a newly deployed virtual router
that was not coming up properly:

systemvm-kvm-4.15.1
Checksum:
5cd235522d1d9cc9c5c80824fdca80004956b5aee533e283334d8a6b6caa2aa3fd939650765689a9437e452e7e6ae5104e068bc84553b4ef8766236a43fa26b3

[Thu Jul 15 14:18:30 2021] (pid=2721) [notice] resync with master conntrack
table
[Thu Jul 15 14:18:30 2021] (pid=2721) [notice] sending bulk update
[Thu Jul 15 14:18:32 2021] (pid=4647) [ERROR] lockfile
`/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
[Thu Jul 15 14:18:32 2021] (pid=2721) [notice] flushing kernel conntrack
table
[Thu Jul 15 14:18:34 2021] (pid=5266) [ERROR] lockfile
`/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
[Thu Jul 15 14:18:34 2021] (pid=2721) [notice] flushing kernel conntrack
table
[Thu Jul 15 14:18:38 2021] (pid=6832) [ERROR] lockfile
`/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
[Thu Jul 15 14:18:38 2021] (pid=2721) [notice] flushing kernel conntrack
table
[Thu Jul 15 14:18:42 2021] (pid=7917) [ERROR] lockfile
`/var/lock/conntrackd.lock' exists, perhaps conntrackd already running?
[Thu Jul 15 14:18:42 2021] (pid=2721) [notice] flushing kernel conntrack
table


Regards


Re: Secondary storage doesn't work

2021-07-14 Thread Edward St Pierre
Hi Andy,

Have you prepared the system VM template?
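
If not, this is roughly the usual step on the management server (the URL and
version below are just examples; use the systemvm template matching your
CloudStack release, and point -m at your mounted secondary storage):

/usr/share/cloudstack-common/scripts/storage/secondary/cloud-install-sys-tmplt \
  -m /mnt/secondary \
  -u http://download.cloudstack.org/systemvm/4.15/systemvmtemplate-4.15.1-kvm.qcow2.bz2 \
  -h kvm -F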

Ed

On Wed, 14 Jul 2021 at 11:32, Andy Nguyen  wrote:

> Short version: identical nfs export for primary and secondary. Primary
> works fine, secondary mounts but doesn't write.
>
> Long version:
> I asked this on reddit
> https://www.reddit.com/r/cloudstack/comments/o8rbsb/secondary_storage_help_please/
> but didn't get much light on the matter.
> Hypervisor host: Alma Linux + QEMU KVM + Cloudstack Agent + nfsd
> Start a VM, also run Alma Linux + Cloudstack Management.
>
> Mount both primary and secondary via the web interface, that completed
> without error.
> Went and add ISO and I got "Request Failed (530) There is no secondary
> storage VM for downloading template to image store Secondary"
>
> I can manually mount the NFS shares and read/write to it from inside the
> VM without problem, so it's clearly not NFS nor permission problem.
>
> Any help on where I may start with troubleshooting? What log should I be
> looking at?
>


Re: Snapshots are not working after upgrading to 4.15.0

2021-06-16 Thread Edward St Pierre
Hi Guys,

I have already logged this as a bug under reference: 4797

Ed


On Thu, 17 Jun 2021 at 06:37, Suresh Anaparti 
wrote:

> Hi Andrei,
>
> Can you check if the storage garbage collector is enabled or not in your
> env (specified using the global setting 'storage.cleanup.enabled'). If it
> is enabled, check the interval & delay setting: 'storage.cleanup.interval'
> and 'storage.cleanup.delay', and see the logs to confirm cleanup is
> performed or not.
>
> Also, check the snapshot status / state in snapshots & snapshot_store_ref
> tables for the snapshots that are not deleted during the cleanup. Is
> 'removed' timestamp set for them in snapshots table?
>
> Regards,
> Suresh
>
> On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" 
> wrote:
>
> Hello,
>
> I've done some more investigation and indeed, the snapshots were not
> taken because the secondary storage was over 90% used. I have started
> cleaning some of the older volumes and noticed another problem. After
> removing snapshots, they do not seem to be removed from the secondary
> storage. I've removed all snapshots over 24 hours ago and it looks like
> the disk space hasn't been freed up at all.
>
> Looks like there are issues with snapshotting function after all.
>
> Andrei
>
>
>
>
>
>
> - Original Message -
> > From: "Harikrishna Patnala" 
> > To: "users" 
> > Sent: Tuesday, 8 June, 2021 03:33:57
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>
> > Hi Andrei,
> >
> > Can you check the following things and let us know?
> >
> >
> >  1.  Can you try creating a new volume and then create snapshot of
> that, to check
> >  if this an issue with old entries
> >  2.  For the snapshots which are failing can you check if you are
> seeing any
> >  error messages like this "Can't find an image storage in zone with
> less than".
> >  This is to check if secondary storage free space check failed.
> >  3.  For the snapshots which are failing and if it is delta snapshot
> can you
> >  check if its parent's snapshot entry exists in "snapshot_store_ref"
> table with
> >  'parent_snapshot_id' of the current snapshot with 'store_role'
> "Image". This is
> >  to find the secondary storage where the parent snapshot backup is
> located.
> >
> > Regards,
> > Harikrishna
> > 
> > From: Andrei Mikhailovsky 
> > Sent: Monday, June 7, 2021 7:00 PM
> > To: users 
> > Subject: Snapshots are not working after upgrading to 4.15.0
> >
> > Hello everyone,
> >
> > I am having an issue with volume snapshots since I've upgraded to
> 4.15.0. None
> > of the volumes are being snapshotted regardless if the snapshot is
> initiated
> > manually or from the schedule. The strange thing is that if I
> manually take the
> > snapshot, the GUI shows Success status, but the Storage>Snapshots
> show an Error
> > status. Here is what I see in the management server logs:
> >
> > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Done
> > executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Remove
> > job-86143 from job monitoring
> > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy
> snapshot
> > com.cloud.utils.exception.CloudRuntimeException: can not find an
> image stores
> > at
> >
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> > at
> >
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> > at
> >
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> > at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> > at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> > at
> >
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > at
> >
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$Sch

Re: Agent Error

2021-05-21 Thread Edward St Pierre
Hi,

Did you create the bridge interfaces on the hypervisor?

com.cloud.exception.InternalErrorException: Failed to create vnet 200: ls:
cannot access '/sys/class/net//brif/': No such file or directoryCommand
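
A quick sanity check on the host (the bridge and property names below are the
common conventions, adjust to your setup):

ip link show type bridge
grep network.device /etc/cloudstack/agent/agent.properties

You would normally expect a bridge such as cloudbr0 to exist and to be
referenced either by guest.network.device in agent.properties or by the KVM
traffic label on the zone's physical network; the empty bridge name in
'/sys/class/net//brif/' suggests the agent could not work out which device to
create VLAN 200 on.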



On Fri, 21 May 2021 at 10:29, Serge Byishimo 
wrote:

> Centos 8
> Cloudstack 4.15
>
>
> 2021-05-21 05:25:13,312 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-5:null) (logid:5fb6ba1e) Trying to fetch storage pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d from libvirt
> 2021-05-21 05:25:13,314 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:e0f54056) Trying to fetch storage pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d from libvirt
> 2021-05-21 05:25:13,321 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:e0f54056) Creating volume
> 398d3b32-c842-44c7-9feb-5b14e33df9fb from template
> 30cf8eed-df4b-45fd-a73c-4719769dfd3e in pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d (NetworkFilesystem) with size (0
> bytes) 0
> 2021-05-21 05:25:13,322 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:e0f54056) Attempting to create volume
> 398d3b32-c842-44c7-9feb-5b14e33df9fb (NetworkFilesystem) in pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d with size (2.44 GB) 262144
> 2021-05-21 05:25:13,634 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:e0f54056) Trying to fetch storage pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d from libvirt
> 2021-05-21 05:25:13,643 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:e0f54056) Trying to fetch storage pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d from libvirt
> 2021-05-21 05:25:13,660 WARN  [resource.wrapper.LibvirtStartCommandWrapper]
> (agentRequest-Handler-3:null) (logid:e0f54056) InternalErrorException
> com.cloud.exception.InternalErrorException: Failed to create vnet 200: ls:
> cannot access '/sys/class/net//brif/': No such file or directoryCommand
> line is not complete. Try option "help"ls: cannot access
> '/sys/class/net//brif/': No such file or directoryFailed to add vlan:
> br-200.200 to
> at
>
> com.cloud.hypervisor.kvm.resource.BridgeVifDriver.createVnet(BridgeVifDriver.java:325)
> at
>
> com.cloud.hypervisor.kvm.resource.BridgeVifDriver.createVnetBr(BridgeVifDriver.java:307)
> at
>
> com.cloud.hypervisor.kvm.resource.BridgeVifDriver.plug(BridgeVifDriver.java:227)
> at
>
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVif(LibvirtComputingResource.java:2726)
> at
>
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVifs(LibvirtComputingResource.java:2454)
> at
>
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtStartCommandWrapper.execute(LibvirtStartCommandWrapper.java:80)
> at
>
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtStartCommandWrapper.execute(LibvirtStartCommandWrapper.java:45)
> at
>
> com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
> at
>
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1643)
> at com.cloud.agent.Agent.processRequest(Agent.java:661)
> at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1079)
> at com.cloud.utils.nio.Task.call(Task.java:83)
> at com.cloud.utils.nio.Task.call(Task.java:29)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
>
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
>
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> 2021-05-21 05:25:13,671 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:e0f54056) Trying to fetch storage pool
> 26a9efbf-fb80-3f0d-a292-43bd0a3eec9d from libvirt
> 2021-05-21 05:25:16,542 INFO  [kvm.resource.LibvirtConnection]
> (agentRequest-Handler-2:null) (logid:5fb6ba1e) No existing libvirtd
> connection found. Opening a new one
> 2021-05-21 05:25:16,545 WARN  [kvm.resource.LibvirtConnection]
> (agentRequest-Handler-2:null) (logid:5fb6ba1e) Can not find a connection
> for Instance v-20-VM. Assuming the default connection.
> 2021-05-21 05:25:16,738 WARN  [kvm.resource.LibvirtKvmAgentHook]
> (agentRequest-Handler-2:null) (logid:5fb6ba1e) Groovy script
> '/etc/cloudstack/agent/hooks/libvirt-vm-state-change.groovy' is not
> available. Transformations will not be applied.
> 2021-05-21 05:25:16,738 WARN  [kvm.resource.LibvirtKvmAgentHook]
> (agentRequest-Handler-2:null) (logid:5fb6ba1e) Groovy scripting engine is
> not initialized. Data transformation skipped.
> 2021-05-21 05:25:16,784 INFO  [kvm.resource.LibvirtConnection]
> (agentRequest-Handler-4:null) (logid:e0f54056) No existing libvirtd
> connection found. Opening a new one
> 2021-05-21 05:25:16,785 WARN  [kvm.resource.LibvirtConnection]
> (agentRequest-Handler-4:null) (logid:e0f54056) Can not find a connection
> 

Re: Issue with Snapshots

2021-03-09 Thread Edward St Pierre
Hi,

Yes, it was working on the previous version, which was 4.13.

Regards

On Mon, 8 Mar 2021 at 13:02, Gabriel Bräscher  wrote:

> Hi,
>
> I think that we might have a bug on snapshot backed up on secondary. I have
> checked my tests and the image on secondary storage is not deleted as well;
> it has been passed more than enough time for CloudStack cleanup sec storage
> garbage.
> One question. Was it working before upgrading to 4.15? Which was the
> version that you were using before the upgrade?
>
> Maybe this issue is a good one to be reported via GitHub so we can have
> more developers looking at and keeping track of it.
> To open an issue you can go to https://github.com/apache/cloudstack/issues
> .
>
> Regards,
> Gabriel.
>
> Em sex., 5 de mar. de 2021 às 13:00, Edward St Pierre <
> edward.stpie...@gmail.com> escreveu:
>
> > Hi,
> >
> > Thanks for that information and pointers on where to look, I have checked
> > it out and the data was also on the secondary storage and the storage VM
> is
> > mounting the NFS servers etc.  (I have redeployed it just incase)
> >
> > I have gone through some of the older snapshots that are not getting
> > removed, and they do have that reference in the Ready Status:
> >
> > MariaDB [cloud]> select
> > id,store_id,snapshot_id,state,store_role,install_path from
> > snapshot_store_ref where install_path like
> > '%8c8fbbde-7d5b-4d25-a6ac-dd5ddedf7e07';
> >
> >
> +--+--+-+---++--+
> > | id   | store_id | snapshot_id | state | store_role | install_path
> > |
> >
> >
> +--+--+-+---++--+
> > | 1554 |7 | 852 | Destroyed | Primary|
> >
> >
> cloudstack2019/79625868-979a-4ed5-9f8d-2af6f612824c/8c8fbbde-7d5b-4d25-a6ac-dd5ddedf7e07
> > |
> > | 1555 |1 | 852 | Ready | Image  |
> > snapshots/2/146/8c8fbbde-7d5b-4d25-a6ac-dd5ddedf7e07
> >   |
> >
> >
> +--+--+-+---++--+
> > 2 rows in set (0.00 sec)
> >
> > MariaDB [cloud]> select * from snapshots where id=852;
> >
> >
> +-+++---+---+--+---+--+---+--+--
> >
> >
> >
> -+--+--+-+-++--++--+-+-+---+--+--+--
> >  -+
> > | id  | data_center_id | account_id | domain_id | volume_id |
> > disk_offering_id | status| path | name
> >  | uuid | snapshot_type
>  |
> > type_description | size | created | removed |
> > backup_snap_id | swift_id | sechost_id | prev_snap_id | hypervisor_type |
> > version | s3_id | min_iops | max_iops | location_  type |
> >
> >
> +-+++---+---+--+---+--+---+--+--
> >
> >
> >
> -+--+--+-+-++--++--+-+-+---+--+--+--
> >  -+
> > | 852 |  1 |  2 | 1 |   146 |
> >  6 | Destroyed | NULL | vxtr-ln1-box01_DATA-86_20210226010342 |
> > 10ece2f5-39cf-4afe-9894-48c7c47e1463 | 4   |
> > DAILY| 214748364800 | 2021-02-26 01:03:42 | NULL| NULL
> >   | NULL |   NULL | NULL | KVM | 2.2
>  |
> >  NULL | NULL | NULL | NULL|
> >
> >
> +-+++---+---+--+---+--+---+--+--
> >
> >
> >
> -+--+--+-+-++--++--+-+

Re: Issue with Snapshots

2021-03-05 Thread Edward St Pierre
quot;:"nfs://
10.100.7.51/export/secondary","_role":"Image"}},"vmName":"i-2-86-VM","name":"vxtr-ln1-box01_DATA-86_20210226010342","hypervisorType":"KVM","id":"852","quiescevm":"false","physicalSize":"0"}},"executeInSequence":"false","options":{"fullSnapshot":"true"},"options2":{},"wait":"21600"}}]
}
2021-02-26 01:22:17,334 DEBUG [c.c.a.t.Request]
(AgentManager-Handler-7:null) (logid:) Seq 10-5259641414815547085:
Processing:  { Ans: , MgmtId: 345050527765, via: 10, Ver: v1, Flags: 10,
[{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"newData":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/2/146/8c8fbbde-7d5b-4d25-a6ac-dd5ddedf7e07","id":"0","quiescevm":"false","physicalSize":"112297967616"}},"result":"true","wait":"0"}}]
}
2021-02-26 01:22:17,355 DEBUG [c.c.s.s.SnapshotManagerImpl]
(Work-Job-Executor-17:ctx-efb43f2f job-6123/job-6124 ctx-bd5f01a2)
(logid:7c83430b) Max snaps: 3 exceeded for snapshot policy with Id: 7.
Deleting oldest snapshot: 849
2021-02-26 01:22:18,473 DEBUG [c.c.r.ResourceLimitManagerImpl]
(Work-Job-Executor-17:ctx-efb43f2f job-6123/job-6124 ctx-bd5f01a2)
(logid:7c83430b) Updating resource Type = snapshot count for Account = 2
Operation = decreasing Amount = 1
2021-02-26 01:22:18,499 DEBUG [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-17:ctx-efb43f2f job-6123/job-6124 ctx-bd5f01a2)
(logid:7c83430b) Done executing VM work job:
com.cloud.vm.VmWorkTakeVolumeSnapshot{"volumeId":146,"policyId":7,"snapshotId":852,"quiesceVm":false,"asyncBackup":false,"userId":1,"accountId":2,"vmId":86,"handlerName":"VolumeApiServiceImpl"}
2021-02-26 01:22:18,565 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-10:ctx-c366f638 job-6123 ctx-1be547c7) (logid:7c83430b)
Complete async job-6123, jobStatus: SUCCEEDED, resultCode: 0, result:
org.apache.cloudstack.api.response.SnapshotResponse/snapshot/{"id":"10ece2f5-39cf-4afe-9894-48c7c47e1463","account":"admin","domainid":"1e48287f-ee6a-11e9-8f54-0050569d3815","domain":"ROOT","snapshottype":"DAILY","volumeid":"79625868-979a-4ed5-9f8d-2af6f612824c","volumename":"DATA-86","volumetype":"DATADISK","created":"2021-02-26T01:03:42+","name":"vxtr-ln1-box01_DATA-86_20210226010342","intervaltype":"DAILY","state":"BackedUp","physicalsize":"112297967616","zoneid":"9000b853-f5e0-4451-ad49-64dfae95db84","revertable":"true","virtualsize":"214748364800","tags":[]}
2021-02-26 01:22:18,592 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-10:ctx-c366f638 job-6123) (logid:7c83430b) Done executing
org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for
job-6123

This log indicates when it is set to be removed:
2021-03-01 01:22:13,750 DEBUG [c.c.s.s.SnapshotManagerImpl]
(Work-Job-Executor-20:ctx-df45986f job-6129/job-6130 ctx-a889fd20)
(logid:503cabb8) Max snaps: 3 exceeded for snapshot policy with Id: 7.
Deleting oldest snapshot: 852
2021-03-01 01:22:13,797 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru]
(Work-Job-Executor-20:ctx-df45986f job-6129/job-6130 ctx-a889fd20)
(logid:503cabb8) getCommandHostDelegation: class
org.apache.cloudstack.storage.comand.DeleteCommand

Any indication of what I should look for in the logs regarding the
'secondary storage "garbage" should be cleaned up' part?
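
In case it is relevant, the only thing I know to try is grepping the
management server log, e.g. (the pattern is a guess on my part):

grep -i 'cleanup' /var/log/cloudstack/management/management-server.log | tail -n 50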

Many thanks for your help.

Ed

On Fri, 5 Mar 2021 at 15:12, Gabriel Bräscher  wrote:

> Before seeing the last reply I had just tested on CloudStack 4.15 + Ceph +
> KVM, and it worked fine; however, I did the test with one snapshot of the
> volume.
> Maybe the scheduled/incremental snapshotting adds a different variable to
> the equation.
>
> From my tests, the DB holds one reference as Ready on the secondary
> storage, and one as Destroyed (on primary storage). I think that it is the
> expected behavior and the secondary storage "garbage" should be cleaned up
> on a window of 24-48 hours (by default, as the cleanup thread runs each 24
> hours and it checks for a timestamp of 24 hours as well, on the worst case
> it can take up to 48 hours, on the best 24 hours).
>
> I find it quite weird the fact that it has been thrown a null pointer
> before; as if the volume has

Re: Issue with Snapshots

2021-03-05 Thread Edward St Pierre
quot;HOURLY","state":"BackedUp","physicalsize":"16888233984","zoneid":"9000b853-f5e0-4451-ad49-64dfae95db84","revertable":"true","ostypeid":"2cd2aa56-ee6a-11e9-8f54-0050569d3815","osdisplayname":"CentOS
7","virtualsize":"107374182400","tags":[]}
2021-03-05 10:09:25,670 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-19:ctx-3956f4cc job-6141) (logid:fc7d7320) Done executing
org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for
job-6141

Here are the actual snapshots on secondary storage:

-rw-r--r--. 1 root root 16888233984 Mar  5 10:09
a00ba130-19f0-454c-9b74-55f95a64bfb5
-rw-r--r--. 1 root root 16888233984 Mar  5 09:09
b8e3403e-7b6c-4bee-8b27-5ad9843353cd

Here are the entries in the DB relating to the snapshots:

select * from snapshots order by created desc limit 2;
+-+++---+---+--+---+--++--+---+--+--+-+-++--++--+-+-+---+--+--+---+
| id  | data_center_id | account_id | domain_id | volume_id |
disk_offering_id | status| path | name
  | uuid | snapshot_type | type_description
| size | created | removed | backup_snap_id | swift_id
| sechost_id | prev_snap_id | hypervisor_type | version | s3_id | min_iops
| max_iops | location_type |
+-+++---+---+--+---+--++--+---+--+--+-+-++--++--+-+-+---+--+--+---+
| 861 |  1 |  9 | 6 |   225 |
 5 | BackedUp  | NULL | ESP-LN1-VOIP01_ROOT-156_20210305100342 |
076c3f3e-4875-419d-beed-398e6d938f41 | 3 | HOURLY   |
107374182400 | 2021-03-05 10:03:42 | NULL| NULL   | NULL |
  NULL | NULL | KVM | 2.2 |  NULL | NULL |
NULL | NULL  |
| 860 |  1 |  9 | 6 |   225 |
 5 | Destroyed | NULL | ESP-LN1-VOIP01_ROOT-156_20210305090342 |
2998a50e-7282-45c5-bab6-8231e6343783 | 3 | HOURLY   |
107374182400 | 2021-03-05 09:03:42 | NULL| NULL   | NULL |
  NULL | NULL | KVM | 2.2 |  NULL | NULL |
NULL | NULL  |
+-+++---+---+--+---+--++--+---+--+--+-+-++--++--+-+-+---+--+--+---+



Regards

Ed

On Fri, 5 Mar 2021 at 02:20, Gabriel Bräscher  wrote:

> There was an open issue indeed but a PR solved it for 4.15. The issue
> reported on this email looks a bit different than the ones I have seen.
>
> Background on a recent issue:
> - Issue #4498 "RBD Snapshot fails when snapshot.backup.to.secondary: true"
> has been fixed by PR #4568 "kvm: Fix double-escape issue while creating rbd
> disk options"
>
> Reading the log:
> - In order for such a null pointer to happen the volume from the snapshot
> has been deleted on DB. Thus, it is either a bug or an inconsistency on the
> DB; e.g.: deleting the volume prior to its Snapshots.
>
> Is this occurrence an isolated one or does it happen with multiple
> snapshots that have been deleted?
>
> #4498: https://github.com/apache/cloudstack/issues/4498
> #4568: https://github.com/apache/cloudstack/pull/4568
>
> Em qui., 4 de mar. de 2021 às 18:55, Rakesh v 
> escreveu:
>
> > If I remember properly there is already an issue raised for it in github
> >
> > Sent from my iPhone
> >
> > > On Mar 4, 2021, at 9:48 PM, Andrija Panic 
> > wrote:
> > >
> > > @Gabriel Beims Bräscher  does this rings any
> > bells
> > > for you? I haven't played with Ceph / snap clean-up issue on 4.15
> myself.
> > >
> > > Best,
> > >
> > >> On Tue, 2 Mar 2021 at 17:16, Edward St Pierre <
> > edward.stpie...@gmail.com>
> > >> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I Wonder if someone could help me.
>

Issue with Snapshots

2021-03-02 Thread Edward St Pierre
Hi All,

I wonder if someone could help me.

Currently using ceph for primary storage and have
'snapshot.backup.to.secondary' enabled.

Since upgrading to 4.15 the volume snapshots do not seem to be getting
deleted from the secondary storage.  Could this be a bug?

Also, when logged in as the main admin account and navigating to 'Storage /
Snapshots', an Error 500 is returned with the following error in the logs:

2021-03-02 16:11:01,018 ERROR [c.c.a.ApiServer]
(qtp1762902523-19:ctx-60162a51 ctx-75a1b986) (logid:368908bb) unhandled
exception executing api command: [Ljava.lang.String;@7ac2d249
java.lang.NullPointerException
at
org.apache.cloudstack.storage.snapshot.StorageSystemSnapshotStrategy.canHandle(StorageSystemSnapshotStrategy.java:944)
at
org.apache.cloudstack.storage.helper.StorageStrategyFactoryImpl$3.canHandle(StorageStrategyFactoryImpl.java:72)
at
org.apache.cloudstack.storage.helper.StorageStrategyFactoryImpl$3.canHandle(StorageStrategyFactoryImpl.java:69)
at
org.apache.cloudstack.storage.helper.StorageStrategyFactoryImpl.bestMatch(StorageStrategyFactoryImpl.java:95)
at
org.apache.cloudstack.storage.helper.StorageStrategyFactoryImpl.getSnapshotStrategy(StorageStrategyFactoryImpl.java:69)
at
org.apache.cloudstack.storage.snapshot.SnapshotObject.isRevertable(SnapshotObject.java:153)
at
com.cloud.api.ApiResponseHelper.createSnapshotResponse(ApiResponseHelper.java:569)
at
org.apache.cloudstack.api.command.user.snapshot.ListSnapshotsCmd.execute(ListSnapshotsCmd.java:117)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
at com.cloud.api.ApiServer.queueCommand(ApiServer.java:764)
at com.cloud.api.ApiServer.handleRequest(ApiServer.java:588)
at
com.cloud.api.ApiServlet.processRequestInContext(ApiServlet.java:321)
at com.cloud.api.ApiServlet$1.run(ApiServlet.java:134)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:131)
at com.cloud.api.ApiServlet.doGet(ApiServlet.java:93)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:645)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at
org.eclipse.jetty.servlet.ServletHolder$NotAsyncServlet.service(ServletHolder.java:1386)
at
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:755)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:547)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:767)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:500)
at
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:543)
at
org.eclipse.jetty.io.

Re: Health Checks for redundant VR or a VPC Router

2021-01-22 Thread Edward St Pierre
Hi,

Thanks for that, Nicolas.
Is that option available on a per-VR basis? I have only seen it as a global
option.
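
(For the global route, I assume something along the lines of the following
via CloudMonkey would do it:

cmk update configuration name=router.health.checks.enabled value=false

but a per-router override is what I am really after.)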

Ed

On Fri, 22 Jan 2021 at 14:31, Nicolas Vazquez
 wrote:

> Hi Edward,
>
> You can disable the health checks by setting
> 'router.health.checks.enabled' to false.
>
>
> Regards,
>
> Nicolas Vazquez
>
> ________
> From: Edward St Pierre 
> Sent: Friday, January 22, 2021 11:25 AM
> To: users@cloudstack.apache.org 
> Subject: Health Checks for redundant VR or a VPC Router
>
> Hi,
>
> I have recently updated to version 4.15 from 4.13 and it looks great,
> however I seem to have hit a problem with all the alerts filling up with
> the following error:
>
> Health checks failed: 1 failing checks on router
> 342ebf56-4128-4350-967f-cea40c26b080
>
> For pretty most of my redundant VR or VPC routers.
>
> I have logged directly onto these and have manually run the monitoring
> script with the following output:
>
> root@r-41-VM:~#  /usr/bin/python /root/monitorServices.py basic
> monitoring started
> No config items provided - means a redundant VR or a VPC Router
>
> I cannot seem to find a setting that will disable monitoring for this
> subset of device types.  Can anyone provide any advice on this?
>
> Ed
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> 3 London Bridge Street,  3rd floor, News Building, London  SE1 9SGUK
> @shapeblue
>
>
>
>


Health Checks for redundant VR or a VPC Router

2021-01-22 Thread Edward St Pierre
Hi,

I have recently updated from version 4.13 to 4.15 and it looks great;
however, I seem to have hit a problem with the alerts filling up with the
following error:

Health checks failed: 1 failing checks on router
342ebf56-4128-4350-967f-cea40c26b080

For pretty much all of my redundant VRs and VPC routers.

I have logged directly onto these and have manually run the monitoring
script with the following output:

root@r-41-VM:~#  /usr/bin/python /root/monitorServices.py basic
monitoring started
No config items provided - means a redundant VR or a VPC Router

I cannot seem to find a setting that will disable monitoring for this
subset of device types.  Can anyone provide any advice on this?

Ed