To investigate further, can you pull out the job related to this volume (the one in the Expunging state) from the async_job table, and also check the storage cleanup thread in the log to see why it was not able to clean up this volume?
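Something along these lines should show it (a rough sketch only; the async_job column names are from a typical 4.x schema, and the log location and the 'StorageManager-Scavenger' thread name are assumptions, so adjust for your version):

-- async jobs that touched the stuck volume (id 1575 in your output below)
SELECT id, job_cmd, job_status, job_result, created, last_updated
FROM async_job
WHERE instance_type = 'Volume' AND instance_id = 1575;

# storage cleanup thread activity, and anything logged against the stuck volume
grep 'StorageManager-Scavenger' /var/log/cloudstack/management/management-server.log
grep 'dbeb4cf4-0673-48be-b474-ebc5a73d08c7' /var/log/cloudstack/management/management-server.log

A failed job_status with an error in job_result, or the cleanup thread repeatedly erroring out on that volume, would explain why it never moved past Expunging.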
~prashant

On 1/25/15, 3:04 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:

> On Sun, Jan 25, 2015 at 6:33 AM, Keerthiraja SJ <sjkeer...@gmail.com> wrote:
>> It's my production environment. I am facing a critical issue: if any
>> XenServer reboots, or if I try to patch the XenServer, the VMs will not
>> start. In that case I have to create new instances and redo all the
>> setup from scratch, so the downtime will be more than 5 hours.
>>
>> Is there a way we can resolve this issue?
>
> I would guess the following would help:
>
> UPDATE volumes SET state = 'Expunged', removed = '2015-01-08 20:40:00'
> WHERE instance_id = '1381' AND state = 'Expunging';
>
> Note that somehow some call to the backend didn't return, and volume
> dbeb4cf4-0673-48be-b474-ebc5a73d08c7 might still exist somewhere (on some
> primary storage). Give it some time to settle after issuing the statement,
> and then check whether it helped by rebooting the VM.
>
>> Thanks,
>> Keerthi
>>
>> On Sat, Jan 24, 2015 at 6:09 PM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>>
>>> see inline
>>>
>>> On Sat, Jan 24, 2015 at 6:00 AM, Keerthiraja SJ <sjkeer...@gmail.com> wrote:
>>> > Below is the query I ran for the instance which has many root disks.
>>> >
>>> > mysql> select * from volumes where instance_id='1381'\G
>>> > *************************** 1. row ***************************
>>> > id: 1572
>>> > account_id: 2
>>> > domain_id: 1
>>> > pool_id: 249
>>> > last_pool_id: NULL
>>> > instance_id: 1381
>>> > device_id: 0
>>> > name: ROOT-1381
>>> > uuid: 9ee67199-7af8-4a47-8886-016cfff26b5c
>>> > size: 21474836480
>>> > folder: NULL
>>> > path: 06a3b4c9-6cfe-45c8-8640-7f562b11e2cb
>>> > pod_id: NULL
>>> > data_center_id: 5
>>> > iscsi_name: NULL
>>> > host_ip: NULL
>>> > volume_type: ROOT
>>> > pool_type: NULL
>>> > disk_offering_id: 125
>>> > template_id: 347
>>> > first_snapshot_backup_uuid: NULL
>>> > recreatable: 0
>>> > created: 2015-01-08 14:19:27
>>> > attached: NULL
>>> > updated: 2015-01-08 20:39:59
>>> > removed: NULL
>>> > state: Ready
>>> > chain_info: NULL
>>> > update_count: 4
>>> > disk_type: NULL
>>> > vm_snapshot_chain_size: NULL
>>> > iso_id: NULL
>>> > display_volume: 0
>>> > format: VHD
>>> > min_iops: NULL
>>> > max_iops: NULL
>>> > *************************** 2. row ***************************
>>> > id: 1575
>>> > account_id: 2
>>> > domain_id: 1
>>> > pool_id: 259
>>> > last_pool_id: 249
>>> > instance_id: 1381
>>> > device_id: 0
>>> > name: ROOT-1381
>>> > uuid: dbeb4cf4-0673-48be-b474-ebc5a73d08c7
>>> > size: 21474836480
>>> > folder: /vol/cldswp_orprimary03
>>> > path: 06a3b4c9-6cfe-45c8-8640-7f562b11e2cb
>>> > pod_id: 5
>>> > data_center_id: 5
>>> > iscsi_name: NULL
>>> > host_ip: NULL
>>> > volume_type: ROOT
>>> > pool_type: NULL
>>> > disk_offering_id: 125
>>> > template_id: 347
>>> > first_snapshot_backup_uuid: NULL
>>> > recreatable: 0
>>> > created: 2015-01-08 20:25:29
>>> > attached: NULL
>>> > updated: 2015-01-08 20:39:59
>>> > removed: NULL
>>> > state: Expunging
>>> > chain_info: NULL
>>> > update_count: 6
>>> > disk_type: NULL
>>> > vm_snapshot_chain_size: NULL
>>> > iso_id: NULL
>>> > display_volume: 0
>>> > format: VHD
>>> > min_iops: NULL
>>> > max_iops: NULL
>>> > 2 rows in set (0.00 sec)
>>>
>>> Here is the problem: the 'Expunging' value in the second row (volume
>>> 1575) indicates that it is in transit from being marked for deletion to
>>> actual deletion. I am not sure what caused this, but it is not good. The
>>> state field should contain the value 'Expunged', and the 'removed' field
>>> just before it should contain the timestamp of the moment this happened.
>>>
>>> If this is production, investigate before following my advice blindly.
>>> It is code analysis that brought me to this conclusion, not experience.
>>>
>>> > On Fri, Jan 23, 2015 at 5:00 PM, Prashant Kumar Mishra <prashantkumar.mis...@citrix.com> wrote:
>>> >
>>> >> Can you check the state of these disks in the DB (the volumes table
>>> >> for the given instance_id)?
>>> >>
>>> >> On 1/23/15, 4:46 PM, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>>> >>
>>> >> > Keerthi,
>>> >> >
>>> >> > This doesn't ring a bell directly, but to investigate further:
>>> >> >
>>> >> > Did you check in the database whether these are pointing to the
>>> >> > same image? This looks like data corruption to me.
>>> >> >
>>> >> > When you say stop/start the VM, do you mean from CloudStack or
>>> >> > XenServer, or from within the VM?
>>> >> >
>>> >> > On Fri, Jan 23, 2015 at 5:47 AM, Keerthiraja SJ <sjkeer...@gmail.com> wrote:
>>> >> >> Can anyone help on this?
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Keerthi
>>> >> >>
>>> >> >> On Thu, Jan 22, 2015 at 11:46 AM, Keerthiraja SJ <sjkeer...@gmail.com> wrote:
>>> >> >>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> For one of my instances I can see many duplicate root disks.
>>> >> >>>
>>> >> >>> Because of this, when the XenServer reboots, or if I stop and
>>> >> >>> start the instance, my VM does not respond.
>>> >> >>>
>>> >> >>> How can I fix this issue?
>>> >> >>>
>>> >> >>> Out of 173 VMs, about 5 are like this.
>>> >> >>>
>>> >> >>> *Below is what is listed in the Storage view of the CloudStack GUI:*
>>> >> >>>
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer  or004-test.net
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer  or004-test.net
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>> ROOT-868  ROOT  XenServer
>>> >> >>>
>>> >> >>> Thanks,
>>> >> >>> Keerthi
>>> >> >
>>> >> > --
>>> >> > Daan
>>>
>>> --
>>> Daan
>
> --
> Daan
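If you do go with the UPDATE Daan suggests above, I would narrow it to the stuck row and check the table before and after. Roughly (a sketch; this is a variant of Daan's statement, restricted by id and using NOW() instead of a fixed timestamp):

-- before: confirm only the stuck row matches
SELECT id, uuid, state, removed FROM volumes
WHERE instance_id = 1381 AND state = 'Expunging';

-- mark only that row as expunged
UPDATE volumes SET state = 'Expunged', removed = NOW()
WHERE id = 1575 AND state = 'Expunging';

-- after: instance 1381 should be left with one 'Ready' row and one 'Expunged' row
SELECT id, uuid, state, removed FROM volumes WHERE instance_id = 1381;

Also note that both rows show the same path (06a3b4c9-6cfe-45c8-8640-7f562b11e2cb), so check on the XenServer side first (for example 'xe vdi-list uuid=06a3b4c9-6cfe-45c8-8640-7f562b11e2cb' on the pool master) before removing any VDI by hand.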