[ovirt-users] Re: General failure

2018-06-19 Thread nicolas

Hi Ala,

Yes, there's no way to remove it. I did find a way to work around the
issue, though, so I'm posting it here in case it gives other people a
clue about how to solve it.


I'm fully aware that touching the DB directly is not a good idea but 
I've been unable to find an alternative way.


1) Find the broken snapshot in the snapshots table.

engine=# select snapshot_id,snapshot_type,status,description from snapshots where vm_id='343db85c-64bc-4f0c-b9a0-4ca8d129e0c3';
             snapshot_id              | snapshot_type | status |   description
--------------------------------------+---------------+--------+-----------------
 3d1eaf0a-49b3-45be-a104-f5ceebe52540 | ACTIVE        | OK     | Active VM
 cb8672bb-38d3-47ee-a498-4b403fc7d8db | REGULAR       | OK     | Broken snapshot
(2 rows)

2) Find the image linked to the broken snapshot (look up the disk in
the Disks tab and note its UUID; it is the image_group_id used below).


engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='6cf2c490-784b-437f-8305-1bed40dc9c9d';
              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 b7af66ad-d27b-4087-9c33-11625912a45f | ----                                 |           4 | cb8672bb-38d3-47ee-a498-4b403fc7d8db |           1 |             5 | f
 7f14ae53-feac-4088-9560-c77a16dcd5e3 | b7af66ad-d27b-4087-9c33-11625912a45f |           1 | 3d1eaf0a-49b3-45be-a104-f5ceebe52540 |           2 |             4 | t
(2 rows)

3) Delete the broken snapshot from the snapshots table.

engine=# delete from snapshots where snapshot_id='cb8672bb-38d3-47ee-a498-4b403fc7d8db';
DELETE 1

4) Delete the image associated with the broken snapshot.

engine=# delete from images where image_guid='7f14ae53-feac-4088-9560-c77a16dcd5e3';
DELETE 1
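
Since steps 3 and 4 edit the engine DB by hand, one cautious option is to
run both deletes inside a single transaction and only commit after
checking the row counts. The following is a minimal sketch of that idea,
not something that was used in this thread: build_cleanup_sql is a
made-up helper name, and the table and column names are simply the ones
from the queries above. It only prints SQL text for pasting into psql.

```python
import uuid


def build_cleanup_sql(snapshot_id: str, image_guid: str) -> str:
    """Return a psql script that deletes one snapshot row and one image row.

    Hypothetical helper. Both arguments are parsed as UUIDs first, so
    nothing except a well-formed UUID can end up in the generated SQL.
    """
    snap = str(uuid.UUID(snapshot_id))   # raises ValueError on bad input
    image = str(uuid.UUID(image_guid))
    return "\n".join([
        "BEGIN;",
        f"DELETE FROM snapshots WHERE snapshot_id = '{snap}';",
        f"DELETE FROM images WHERE image_guid = '{image}';",
        "-- If each DELETE reported exactly one row, finish with COMMIT",
        "-- otherwise finish with ROLLBACK",
    ])


# The UUIDs below are the ones from steps 3 and 4.
print(build_cleanup_sql(
    "cb8672bb-38d3-47ee-a498-4b403fc7d8db",
    "7f14ae53-feac-4088-9560-c77a16dcd5e3",
))
```

The script deliberately stops before COMMIT, so the final decision stays
with whoever is looking at the "DELETE n" counts in the psql session.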

At this point, the snapshot is no longer shown on the 'Snapshots' tab of
the VM. However, when starting the VM, I get an error like this:


VM SED-tpl is down with error. Exit message: Bad volume specification 
{u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01', 
'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address': 
{u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x', u'type': 
u'pci', u'slot': u'0x06'}, u'volumeID': 
u'538600a5-31ab-40af-b326-d56bfc92bb0b', 'apparentsize': '34359738368', 
u'imageID': u'e05874d2-fb8a-4fd2-94ff-2f4bc6438d47', u'discard': False, 
u'specParams': {}, u'readonly': u'false', u'iface': u'virtio', 
u'optional': u'false', u'deviceId': 
u'e05874d2-fb8a-4fd2-94ff-2f4bc6438d47', 'truesize': '34359738368', 
u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device': u'disk', 
u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}.
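
The part of the exit message between the braces is a Python-style
dictionary, so the IDs needed for the next steps can be pulled out of it
mechanically. Below is a small sketch under that assumption;
parse_bad_volume_spec is a hypothetical name, and the sample message is
a shortened copy of the one above.

```python
import ast


def parse_bad_volume_spec(exit_message: str) -> dict:
    """Return the drive dictionary embedded in a 'Bad volume specification' message."""
    start = exit_message.index("{")        # first opening brace
    end = exit_message.rindex("}") + 1     # last closing brace
    # literal_eval only accepts literals; the u'...' prefixes are valid in Python 3.
    return ast.literal_eval(exit_message[start:end])


# Shortened copy of the message above, for illustration only.
message = (
    "VM SED-tpl is down with error. Exit message: Bad volume specification "
    "{u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01', "
    "u'volumeID': u'538600a5-31ab-40af-b326-d56bfc92bb0b', "
    "u'imageID': u'e05874d2-fb8a-4fd2-94ff-2f4bc6438d47', u'discard': False, "
    "u'specParams': {}, u'address': {u'type': u'pci', u'slot': u'0x06'}}."
)
spec = parse_bad_volume_spec(message)
print(spec["domainID"], spec["imageID"], spec["volumeID"])
```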


Next, get the ID of the storage domain that holds the failed snapshot.
You can find it by looking at the 'storage_domains' table in the DB.
Then run the command that Benny mentioned (the last argument is the UUID
of the storage domain):


vdsm-tool -vvv dump-volume-chains bc0480e2-85fe-42a4-91ae-f733b23c801f


That prints a map of the images with all their volumes and their
statuses. You should see at least one ILLEGAL volume. Despite the
entries having been removed from the DB, the volume still shows up
because the snapshot is still present in the storage metadata, and it
needs to be set to LEGAL. To do that, first run 'tail -f' on vdsm.log on
a specific host, then start the VM on that host. You'll see an entry
like this:


2018-06-19 12:13:26,832+0100 INFO  (vm/5bf9a0bb) [vdsm.api] START 
prepareImage(sdUUID=u'bc0480e2-85fe-42a4-91ae-f733b23c801f', 
spUUID=u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', 
imgUUID=u'870f7e85-d9b6-494a-9541-b419fb0e1b32', 
leafUUID=u'd7fa8c51-8cad-4695-b90a-a8d1dc146371', allowIllegal=False) 
from=internal, task_id=89770233-103d-47f3-acc1-45d2e96d9e91 (api:46)


Now go to the SPM and run a command like the following, passing the
sdUUID, spUUID, imgUUID and leafUUID from that log entry, in that order:

  vdsClient -s yourhost.com setVolumeLegality \
    bc0480e2-85fe-42a4-91ae-f733b23c801f \
    75bf8f48-970f-42bc-8596-f8ab6efb2b63 \
    870f7e85-d9b6-494a-9541-b419fb0e1b32 \
    d7fa8c51-8cad-4695-b90a-a8d1dc146371 LEGAL
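
Copying four UUIDs by hand is error-prone, so here is a rough sketch
that builds the command line from a prepareImage log entry. It assumes
the entry looks like the one quoted above (sdUUID, spUUID, imgUUID and
leafUUID as key='value' pairs); legality_command is a made-up name, and
the script only prints the command, it does not run it.

```python
import re
import shlex

# Order in which setVolumeLegality received the IDs in the command above.
FIELDS = ("sdUUID", "spUUID", "imgUUID", "leafUUID")


def legality_command(log_line: str, host: str) -> str:
    """Build the vdsClient command line from one prepareImage log entry."""
    ids = []
    for name in FIELDS:
        match = re.search(name + r"=u?'([0-9a-fA-F-]{36})'", log_line)
        if match is None:
            raise ValueError(f"{name} not found in log line")
        ids.append(match.group(1))
    args = ["vdsClient", "-s", host, "setVolumeLegality", *ids, "LEGAL"]
    return " ".join(shlex.quote(arg) for arg in args)


entry = (
    "2018-06-19 12:13:26,832+0100 INFO  (vm/5bf9a0bb) [vdsm.api] START "
    "prepareImage(sdUUID=u'bc0480e2-85fe-42a4-91ae-f733b23c801f', "
    "spUUID=u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', "
    "imgUUID=u'870f7e85-d9b6-494a-9541-b419fb0e1b32', "
    "leafUUID=u'd7fa8c51-8cad-4695-b90a-a8d1dc146371', allowIllegal=False) "
    "from=internal, task_id=89770233-103d-47f3-acc1-45d2e96d9e91 (api:46)"
)
print(legality_command(entry, "yourhost.com"))
```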


The VM is now able to power up. I know this is not a clean solution as 
this leaves orphaned snapshots on storage domains, but up until now 
we've not been able to find a better solution. At least now we know 
machines can be powered up and no data loss happened.


If you have any additional tips we'd be glad to know so we can apply 
them.


Thanks.


[ovirt-users] Re: General failure

2018-06-19 Thread Ala Hino
Hi,

Did you try to remove the same snapshot while the VM is down?


[ovirt-users] Re: General failure

2018-06-19 Thread nicolas

Hi Benny,

I used the tool to track one of the illegal volumes:

   image:    e05874d2-fb8a-4fd2-94ff-2f4bc6438d47

             [...]

             - 887f486b-15cf-4083-9b35-8b7821a7841a
               status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE
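
With many images on a domain, picking the ILLEGAL volumes out of the
dump by eye gets tedious. The sketch below scans saved output for them.
It assumes the layout shown above (an "image:" line, then one "- <uuid>"
line per volume followed by its status line); illegal_volumes is a
hypothetical name, and the LEGAL entry in the sample is invented for
contrast.

```python
import re

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"


def illegal_volumes(dump_text: str) -> list:
    """Return (image_uuid, volume_uuid) pairs whose legality is ILLEGAL."""
    found = []
    image = volume = None
    for line in dump_text.splitlines():
        match = re.search(r"image:\s*(" + UUID + ")", line)
        if match:
            image, volume = match.group(1), None
            continue
        match = re.match(r"\s*-\s*(" + UUID + r")\s*$", line)
        if match:
            volume = match.group(1)
            continue
        if volume and re.search(r"legality:\s*ILLEGAL\b", line):
            found.append((image, volume))
            volume = None  # report each volume once
    return found


sample = """
   image:    e05874d2-fb8a-4fd2-94ff-2f4bc6438d47

             - 538600a5-31ab-40af-b326-d56bfc92bb0b
               status: OK, voltype: INTERNAL, format: COW, legality: LEGAL, type: SPARSE

             - 887f486b-15cf-4083-9b35-8b7821a7841a
               status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE
"""
for image_id, volume_id in illegal_volumes(sample):
    print(image_id, volume_id)
```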


So I tracked 887f486b-15cf-4083-9b35-8b7821a7841a in the logs and I saw:

2018-06-16 04:46:20,818+01 INFO  
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] 
(pool-5-thread-3) [cfc392ec-dc9f-418d-8156-d05c8e7ab9f8] START, 
GetVolumeInfoVDSCommand(HostName = host.domain.es, 
GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]', 
runAsync='true', hostId='b2dfb945-d767-44aa-a547-2d1a4381f8e3', 
storagePoolId='75bf8f48-970f-42bc-8596-f8ab6efb2b63', 
storageDomainId='110ea376-d789-40a1-b9f6-6b40c31afe01', 
imageGroupId='e05874d2-fb8a-4fd2-94ff-2f4bc6438d47', 
imageId='887f486b-15cf-4083-9b35-8b7821a7841a'}), log id: 2a795424


2018-06-16 04:46:22,256+01 ERROR 
[org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-3) 
[cfc392ec-dc9f-418d-8156-d05c8e7ab9f8] The following images were not 
removed: [887f486b-15cf-4083-9b35-8b7821a7841a]


2018-06-16 04:47:44,900+01 ERROR 
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] 
(DefaultQuartzScheduler10) [cfc392ec-dc9f-418d-8156-d05c8e7ab9f8] 
Snapshot '7b6f43ac-d3ad-47b2-8882-f5dccd74cf07' images 
'887f486b-15cf-4083-9b35-8b7821a7841a'..'538600a5-31ab-40af-b326-d56bfc92bb0b' 
merged, but volume removal failed. Some or all of the following volumes 
may be orphaned: [887f486b-15cf-4083-9b35-8b7821a7841a]. Please retry 
Live Merge on the snapshot to complete the operation.


Can you provide some additional steps?

Thank you!


[ovirt-users] Re: General failure

2018-06-18 Thread Benny Zlotnik
We prevent starting VMs with illegal images[1]


You can use "$ vdsm-tool dump-volume-chains"
to look for illegal images and then look in the engine log for the reason
they became illegal.

If it's something like this, it usually means you can remove them:
63696:2018-06-15 09:41:58,134+01 ERROR
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler2) [6fa97ea4-8f61-4a48-8e08-a8bb1b9de826] Merging of
snapshot 'e609d6cc-2025-4cf0-ad34-03519131cdd1' images
'1d01c6c8-b61e-42bc-a054-f04c3f792b10'..'ef6f732e-2a7a-4a14-a10f-bcc88bdd805f'
failed. Images have been marked illegal and can no longer be previewed or
reverted to. Please retry Live Merge on the snapshot to complete the
operation.
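
One way to do that log search is to find every engine.log line that
mentions the volume, collect the correlation ID printed after the thread
name on those lines, and then list the ERROR lines that carry the same
ID. The sketch below assumes one log record per line in the layout shown
above; errors_for_volume is a hypothetical helper, not an oVirt tool.

```python
import re

# The correlation ID is the bracketed UUID right after the "(thread-name)" field.
CORRELATION = re.compile(r"\)\s+\[([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\]")


def errors_for_volume(log_lines, volume_id):
    """Return ERROR lines sharing a correlation ID with lines that mention volume_id."""
    lines = [line.rstrip("\n") for line in log_lines]
    ids = set()
    for line in lines:
        if volume_id in line:
            ids.update(CORRELATION.findall(line))
    return [
        line for line in lines
        if " ERROR " in line and any(f"[{cid}]" in line for cid in ids)
    ]


# Typical use, assuming the log has been copied to the current directory:
# with open("engine.log") as handle:
#     for hit in errors_for_volume(handle, "887f486b-15cf-4083-9b35-8b7821a7841a"):
#         print(hit)
```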



[ovirt-users] Re: General failure

2018-06-18 Thread Marcelo Leandro
> Hi Marcelo,
> Do you mean copying the whole disk block to a different device and
> attaching it to a new VM?

Yes.

> Anything will be appreciated, as currently we're facing a distressing
> situation, so if you can describe what you mean I'd be grateful.

1- Create a new VM with the same configuration.
2- Look up the volume ID in the Snapshots tab (for both the new and the old VM).
3- Log in to a host.
4- Find the volume. Run this command for both disks (new and old):
   find /rhev -name VolumeID
5- Check the disk format:
   qemu-img info (path of VolumeID)

Send me the results.
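
Steps 4 and 5 can also be scripted. This is only a sketch: the function
names are made up, root stands for whatever directory the storage is
mounted under on the host, and it assumes qemu-img is installed there
(the --output=json option makes the report easy to read from a script).

```python
import json
import os
import subprocess


def find_volume_paths(root, volume_id):
    """Return every path under root whose last component equals volume_id."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name == volume_id:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def image_info(path):
    """Run 'qemu-img info' on path and return its JSON report as a dict."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Typical use on a host (not executed here):
# for path in find_volume_paths(root, volume_id):
#     print(path, image_info(path)["format"])
```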


[ovirt-users] Re: General failure

2018-06-18 Thread nicolas

Hi Marcelo,

Do you mean copying the whole disk block to a different device and 
attaching it to a new VM?


Anything will be appreciated, as currently we're facing a distressing 
situation, so if you can describe what you mean I'd be grateful.


Thanks.


On Mon, Jun 18, 2018 at 11:20 AM,  wrote:

> Hi,
>
> We're running oVirt 4.1.9 (we cannot upgrade at this time) and
> we're having a major problem in our infrastructure. On Friday,
> snapshots were automatically created on more than 200 VMs and, as
> this was just a test task, all of them were deleted at the same
> time, which seems to have corrupted several VMs.
>
> When trying to delete a snapshot on some of the VMs, a "General
> error" is thrown with a NullPointerException in the engine log
> (attached).
>
> But the worst part is that when some of these machines are powered
> off and then powered on, the VMs are corrupt...
>
> VM myvm is down with error. Exit message: Bad volume specification
> {u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01',
> 'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address':
> {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x',
> u'type': u'pci', u'slot': u'0x06'}, u'volumeID':
> u'1fd0f9aa-6505-45d2-a17e-859bd5dd4290', 'apparentsize':
> '23622320128', u'imageID': u'65519220-68e1-462a-99b3-f0763c78eae2',
> u'discard': False, u'specParams': {}, u'readonly': u'false',
> u'iface': u'virtio', u'optional': u'false', u'deviceId':
> u'65519220-68e1-462a-99b3-f0763c78eae2', 'truesize': '23622320128',
> u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device':
> u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type':
> u'disk'}.
>
> We're really frustrated by now and don't know how to proceed... We
> have a DB backup (with engine-backup) from Thursday which would have
> a "sane" DB definition without all the snapshots, as they were all
> created on Friday. Would it be safe to restore this backup?
>
> Any help is really appreciated...
>
> Thanks.

[ovirt-users] Re: General failure

2018-06-18 Thread Marcelo Leandro
Hello,
You can copy the base disk to a new VM.

If you want, I can describe the steps.

On Mon, Jun 18, 2018 at 11:49,  wrote:

> Indeed, when the problem started I think the SPM was the host I added as
> VDSM log in the first e-mail. Currently it is the one I sent in the
> second mail.
>
> FWIW, if it helps to debug more fluently, we can provide VPN access to
> our infrastructure so you can access and see whatever you need (all
> hosts, DB, etc...).
>
> Right now the machines that keep running work, but once shut down they
> start showing the problem below...
>
> Thank you
>
> On 2018-06-18 15:20, Benny Zlotnik wrote:
> > I'm having trouble following the errors, I think the SPM changed or
> > the vdsm log from the right host might be missing.
> >
> > However, I believe what started the problems is this transaction
> > timeout:
> >
> > 2018-06-15 14:20:51,378+01 ERROR
> > [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
> > (org.ovirt.thread.pool-6-thread-29)
> > [1db468cb-85fd-4189-b356-d31781461504] [within thread]: endAction for
> > action type RemoveSnapshotSingleDisk threw an exception.:
> > org.springframework.jdbc.CannotGetJdbcConnectionException: Could not
> > get JDBC Connection; nested exception is java.sql.SQLException:
> > javax.resource.ResourceException: IJ000460: Error checking for a
> > transaction
> >  at
> > org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80)
> > [spring-jdbc.jar:4.2.4.RELEASE]
> >  at
> > org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:615)
> > [spring-jdbc.jar:4.2.4.RELEASE]
> >  at
> > org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:680)
> > [spring-jdbc.jar:4.2.4.RELEASE]
> >  at
> > org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:712)
> > [spring-jdbc.jar:4.2.4.RELEASE]
> >  at
> > org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:762)
> > [spring-jdbc.jar:4.2.4.RELEASE]
> >  at
> > org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:152)
> > [dal.jar:]
> >
> > This looks like a bug
> >
> > Regardless, I am not sure restoring a backup would help since you
> > probably have orphaned images on the storage which need to be removed
> >
> > Adding Ala
> >
> > On Mon, Jun 18, 2018 at 4:19 PM,  wrote:
> >
> >> Hi Benny,
> >>
> >> Please find the SPM logs at [1].
> >>
> >> Thank you
> >>
> >>   [1]:
> >> https://wetransfer.com/downloads/62bf649462aabbc2ef21824682b0a08320180618131825/036b7782f58d337baf909a7220d8455320180618131825/5550ee
> >>
> >> El 2018-06-18 13:19, Benny Zlotnik escribió:
> >> Can you send the SPM logs as well?
> >>
> >> On Mon, Jun 18, 2018 at 1:13 PM,  wrote:
> >>
> >> Hi Benny,
> >>
> >> Please find the logs at [1].
> >>
> >> Thank you.
> >>
> >>   [1]:
> >>
> >>
> >
> https://wetransfer.com/downloads/12208fb4a6a5df3114bbbc10af194c8820180618101223/647c066b7b91096570def304da86dbca20180618101223/583d3d
> >> [2]
> >> [1]
> >>
> >> El 2018-06-18 09:28, Benny Zlotnik escribió:
> >>
> >> Can you provide full engine and vdsm logs?
> >>
> >> On Mon, Jun 18, 2018 at 11:20 AM,  wrote:
> >>
> >> Hi,
> >>
> >> We're running oVirt 4.1.9 (we cannot upgrade at this time) and
> >> we're having a major problem in our infrastructure. On friday, a
> >> snapshots were automatically created on more than 200 VMs and as
> >> this was just a test task, all of them were deleted at the same
> >> time, which seems to have corrupted several VMs.
> >>
> >> When trying to delete a snapshot on some of the VMs, a "General
> >> error" is thrown with a NullPointerException in the engine log
> >> (attached).
> >>
> >> But the worst part is that when some of these machines is powered
> >> off and then powered on, the VMs are corrupt...
> >>
> >> VM myvm is down with error. Exit message: Bad volume specification
> >> {u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01',
> >> 'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address':
> >> {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x',
> >> u'type': u'pci', u'slot': u'0x06'}, u'volumeID':
> >> u'1fd0f9aa-6505-45d2-a17e-859bd5dd4290', 'apparentsize':
> >> '23622320128', u'imageID': u'65519220-68e1-462a-99b3-f0763c78eae2',
> >> u'discard': False, u'specParams': {}, u'readonly': u'false',
> >> u'iface': u'virtio', u'optional': u'false', u'deviceId':
> >> u'65519220-68e1-462a-99b3-f0763c78eae2', 'truesize': '23622320128',
> >> u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device':
> >> u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type':
> >> u'disk'}.
> >>
> >> We're really frustrated by now and don't know how to procceed... We
> >> have a DB backup (with engine-backup) from thursday which would
> >> have
> >> a "sane" DB definition without all the snapshots, as they were all
> >> created on friday. Would it be safe to restore this backup?
> >>
> >> Any help is really 

[ovirt-users] Re: General failure

2018-06-18 Thread Benny Zlotnik
Can you add the server.log?


[ovirt-users] Re: General failure

2018-06-18 Thread nicolas
Indeed, when the problem started I think the SPM was the host whose VDSM
log I attached to the first e-mail. The current SPM is the host whose log
I sent in the second e-mail.


FWIW, if it helps you debug more easily, we can provide VPN access to our
infrastructure so you can access and see whatever you need (all hosts,
DB, etc.).


Right now the machines that are still running work fine, but once shut
down they fail to start with the "Bad volume specification" error from
my first message.


Thank you





[ovirt-users] Re: General failure

2018-06-18 Thread Benny Zlotnik
I'm having trouble following the errors; I think the SPM changed, or the
vdsm log from the right host might be missing.

However, I believe what started the problems is this transaction timeout:
2018-06-15 14:20:51,378+01 ERROR
[org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
(org.ovirt.thread.pool-6-thread-29) [1db468cb-85fd-4189-b356-d31781461504]
[within thread]: endAction for action type RemoveSnapshotSingleDisk threw
an exception.: org.springframework.jdbc.CannotGetJdbcConnectionException:
Could not get JDBC Connection; nested exception is java.sql.SQLException:
javax.resource.ResourceException: IJ000460: Error checking for a transaction
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) [spring-jdbc.jar:4.2.4.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:615) [spring-jdbc.jar:4.2.4.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:680) [spring-jdbc.jar:4.2.4.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:712) [spring-jdbc.jar:4.2.4.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:762) [spring-jdbc.jar:4.2.4.RELEASE]
    at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:152) [dal.jar:]

This looks like a bug.

Regardless, I am not sure restoring a backup would help, since you probably
have orphaned images on the storage that need to be removed.

Adding Ala
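
For anyone who wants to check whether other snapshot removals hit the same
timeout, here is a minimal sketch that scans engine.log lines for entries
like the one above. It assumes each entry sits on a single line in the
actual log file and that the bracketed UUID is the correlation ID; the
function name failed_end_actions and the regular expressions are
illustrative, not part of oVirt.

```python
import re

# The log entry quoted in this message, joined onto a single line.
LINE = (
    "2018-06-15 14:20:51,378+01 ERROR "
    "[org.ovirt.engine.core.bll.tasks.CommandAsyncTask] "
    "(org.ovirt.thread.pool-6-thread-29) "
    "[1db468cb-85fd-4189-b356-d31781461504] [within thread]: endAction for "
    "action type RemoveSnapshotSingleDisk threw an exception.: "
    "org.springframework.jdbc.CannotGetJdbcConnectionException: Could not "
    "get JDBC Connection; nested exception is java.sql.SQLException: "
    "javax.resource.ResourceException: IJ000460: Error checking for a "
    "transaction"
)

# Assumed layout: timestamp, level, [logger], (thread), [correlation id], text.
ENTRY = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) "
    r"\[(?P<correlation_id>[^\]]*)\] "
    r"(?P<message>.*)$"
)
ACTION = re.compile(
    r"endAction for action type (?P<action>\w+) threw an exception"
)


def failed_end_actions(lines):
    """Yield (timestamp, correlation_id, action) for every ERROR entry where
    endAction failed because no JDBC connection could be obtained."""
    for line in lines:
        entry = ENTRY.match(line)
        if not entry or entry["level"] != "ERROR":
            continue
        message = entry["message"]
        action = ACTION.search(message)
        if action and "CannotGetJdbcConnectionException" in message:
            yield entry["timestamp"], entry["correlation_id"], action["action"]


for hit in failed_end_actions([LINE]):
    print(hit)
```

To run it against a real log, pass an open file object instead of [LINE];
continuation lines of the stack trace simply fail to match and are skipped.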


[ovirt-users] Re: General failure

2018-06-18 Thread nicolas

Hi Benny,

Please find the SPM logs at [1].

Thank you

  [1]: 
https://wetransfer.com/downloads/62bf649462aabbc2ef21824682b0a08320180618131825/036b7782f58d337baf909a7220d8455320180618131825/5550ee



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PJOVP7GFNSDCQWTEATIGOUYCVUQXIU6H/


[ovirt-users] Re: General failure

2018-06-18 Thread Benny Zlotnik
Can you send the SPM logs as well?

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3MWI6UHSVYXAI5OWOQWSOC7N7ZGOZ2VA/


[ovirt-users] Re: General failure

2018-06-18 Thread nicolas

Hi Benny,

Please find the logs at [1].

Thank you.

  [1]: 
https://wetransfer.com/downloads/12208fb4a6a5df3114bbbc10af194c8820180618101223/647c066b7b91096570def304da86dbca20180618101223/583d3d



List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DDDBLNYMIJV222MK4QS6UPALQ7WRA6M3/


[ovirt-users] Re: General failure

2018-06-18 Thread Benny Zlotnik
Can you provide full engine and vdsm logs?

On Mon, Jun 18, 2018 at 11:20 AM,  wrote:

> Hi,
>
> We're running oVirt 4.1.9 (we cannot upgrade at this time) and we're
> having a major problem in our infrastructure. On Friday, snapshots were
> automatically created on more than 200 VMs and, as this was just a test
> task, all of them were deleted at the same time, which seems to have
> corrupted several VMs.
>
> When trying to delete a snapshot on some of the VMs, a "General error" is
> thrown with a NullPointerException in the engine log (attached).
>
> But the worst part is that when some of these machines are powered off and
> then powered on, the VMs are corrupt...
>
> VM myvm is down with error. Exit message: Bad volume specification
> {u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01',
> 'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address':
> {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x', u'type':
> u'pci', u'slot': u'0x06'}, u'volumeID':
> u'1fd0f9aa-6505-45d2-a17e-859bd5dd4290', 'apparentsize': '23622320128',
> u'imageID': u'65519220-68e1-462a-99b3-f0763c78eae2',
> u'discard': False, u'specParams': {}, u'readonly': u'false', u'iface':
> u'virtio', u'optional': u'false', u'deviceId':
> u'65519220-68e1-462a-99b3-f0763c78eae2', 'truesize': '23622320128',
> u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device': u'disk',
> u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}.
>
> We're really frustrated by now and don't know how to proceed... We have a
> DB backup (with engine-backup) from Thursday which would have a "sane" DB
> definition without all the snapshots, as they were all created on Friday.
> Would it be safe to restore this backup?
>
> Any help is really appreciated...
>
> Thanks.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/P5OOGBL3BRZIQ2I46FYELBUIIWT5QK4C/
>
>
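
The drive spec in the "Bad volume specification" message quoted above looks
like a Python 2 dict repr (note the u'' prefixes), so the IDs needed to
cross-check the engine DB and the storage can be pulled out mechanically
instead of by eye. A minimal sketch, assuming the message has been joined
onto one line exactly as reported; parse_bad_volume_spec is a hypothetical
helper name, not a vdsm or engine function.

```python
import ast

# Error text as reported in this thread, joined onto a single line.
MESSAGE = (
    "VM myvm is down with error. Exit message: Bad volume specification "
    "{u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01', "
    "'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address': "
    "{u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x', "
    "u'type': u'pci', u'slot': u'0x06'}, u'volumeID': "
    "u'1fd0f9aa-6505-45d2-a17e-859bd5dd4290', 'apparentsize': "
    "'23622320128', u'imageID': u'65519220-68e1-462a-99b3-f0763c78eae2', "
    "u'discard': False, u'specParams': {}, u'readonly': u'false', "
    "u'iface': u'virtio', u'optional': u'false', u'deviceId': "
    "u'65519220-68e1-462a-99b3-f0763c78eae2', 'truesize': '23622320128', "
    "u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device': "
    "u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type': "
    "u'disk'}."
)


def parse_bad_volume_spec(message):
    """Return the drive spec dict embedded in a 'Bad volume specification'
    message. Raises ValueError if the marker or the braces are missing."""
    marker = "Bad volume specification"
    start = message.index("{", message.index(marker))
    end = message.rindex("}") + 1
    # literal_eval only evaluates literals, so it is safe on untrusted text;
    # Python 3 still accepts the u'' string prefix.
    return ast.literal_eval(message[start:end])


spec = parse_bad_volume_spec(MESSAGE)
for key in ("poolID", "domainID", "imageID", "volumeID"):
    print(key, spec[key])
```

The same function can be applied to each affected VM's exit message to
build a list of volume IDs to look up.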
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7SLIUR6JDE4RHLEK72Y3NF6JFWXBW4PZ/