[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-25 Thread Jason P. Thomas



On 2/25/19 3:49 AM, Sahina Bose wrote:

On Thu, Feb 21, 2019 at 11:11 PM Jason P. Thomas  wrote:

On 2/20/19 5:33 PM, Darrell Budic wrote:

I was just helping Tristam on #ovirt with a similar problem, we found that his two 
upgraded nodes were running multiple glusterfsd processes per brick (but not all 
bricks). His volume & brick files in /var/lib/gluster looked normal, but 
starting glusterd would often spawn extra fsd processes per brick, seemed random. 
Gluster bug? Maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=1651246, but I'm 
helping debug this one second hand… Possibly related to the brick crashes? 
We wound up stopping glusterd, killing off all the fsds, restarting glusterd, and 
repeating until it only spawned one fsd per brick. Did that to each updated server, 
then restarted glusterd on the not-yet-updated server to get it talking to the 
right bricks. That seemed to get to a mostly stable gluster environment, but he’s 
still seeing 1-2 files listed as needing healing on the upgraded bricks (but not 
the 3.12 brick). Mainly the DIRECT_IO_TEST and one of the dom/ids files, but he can 
probably update that. Did manage to get his engine going again, waiting to see if 
he’s stable now.

Anyway, figured it was worth posting about so people could check for multiple 
brick processes (glusterfsd) if they hit this stability issue as well, maybe 
find common ground.
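
A rough sketch of that check, assuming a stock glusterfs layout (the stop/kill/start loop 
below is just the workaround described above, not an official fix):

# brick daemons actually running, with the brick path each one serves
pgrep -af glusterfsd

# the single PID per brick that gluster itself expects
gluster volume status

# if a brick path shows up more than once in the process list:
systemctl stop glusterd
pkill glusterfsd              # kill the stray brick daemons
systemctl start glusterd      # repeat until only one glusterfsd per brick comes up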

Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434
 trying to get his engine back up, restarting libvirtd let us get it going 
again. Maybe un-needed if he’d been able to complete his third node upgrades, 
but he got stuck before then, so...

   -Darrell
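
For anyone hitting the same libvirtd hang from that bug, the restart Darrell mentions is 
just the standard service restart on the affected host; bringing the engine VM back 
afterwards is sketched here with the usual hosted-engine CLI:

systemctl restart libvirtd
hosted-engine --vm-status     # check where the engine VM stands
hosted-engine --vm-start      # start it again if it is down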

Stable is a relative term.  My unsynced entries total for each of my 4 volumes changes 
drastically (with the exception of the engine volume, which pretty much bounces between 1 
and 4).  The cluster has been "healing" for 18 hours or so and only the 
unupgraded HC node has healed bricks.  I did have the problem that some files/directories 
were owned by root:root.  These VMs did not boot until I changed ownership to 36:36.  
Even after 18 hours, there are anywhere from 20 to 386 entries in vol heal info for my 3 
non-engine bricks.  Overnight I had one brick on one volume go down on one HC node.  When I 
bounced glusterd, it brought up a new fsd process for that brick.  I killed the old one 
and now vol status reports the right pid on each of the nodes.  This is quite the 
debacle.  If I can provide any info that might help get things moving in the right 
direction, let me know.
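
For reference, the heal counts and pid checks mentioned above come from the usual gluster 
CLI; roughly ("engine" here is just the example volume from this setup):

gluster volume heal engine info    # entries still pending heal on the volume
gluster volume status engine       # the brick PIDs gluster expects on each node
pgrep -af glusterfsd               # what is actually running on this node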

Can you provide the gluster brick logs and glusterd logs from the
servers (from /var/log/glusterfs/)? Since you mention that heal seems
to be stuck, could you also provide the heal logs from
/var/log/glusterfs/glustershd.log?
If you can log a bug with these logs, that would be great - please use
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS to log the
bug.
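
Assuming default log locations, one way to bundle what is being asked for on each server 
before attaching it to the bug:

# glusterd log, self-heal daemon log, and all brick logs
tar czf gluster-logs-$(hostname).tar.gz \
    /var/log/glusterfs/glusterd.log \
    /var/log/glusterfs/glustershd.log \
    /var/log/glusterfs/bricks/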
I've filed Bug 1682925 with the requested 
logs during the time frame I experienced issues.  Sorry for the delay, I 
was out of the office Friday and this morning.


Jason




Jason aka Tristam


On Feb 14, 2019, at 1:12 AM, Sahina Bose  wrote:

On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:




Can you be more specific? What things did you see, and did you report bugs?


I've got this one: 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-25 Thread Sahina Bose
On Thu, Feb 21, 2019 at 11:11 PM Jason P. Thomas  wrote:
>
> On 2/20/19 5:33 PM, Darrell Budic wrote:
>
> I was just helping Tristam on #ovirt with a similar problem, we found that 
> his two upgraded nodes were running multiple glusterfsd processes per brick 
> (but not all bricks). His volume & brick files in /var/lib/gluster looked 
> normal, but starting glusterd would often spawn extra fsd processes per 
> brick, seemed random. Gluster bug? Maybe related to  
> https://bugzilla.redhat.com/show_bug.cgi?id=1651246, but I’m helping debug 
> this one second hand… Possibly related to the brick crashes? We wound up 
> stopping glusterd, killing off all the fsds, restarting glusterd, and 
> repeating until it only spawned one fsd per brick. Did that to each updated 
> server, then restarted glusterd on the not-yet-updated server to get it 
> talking to the right bricks. That seemed to get to a mostly stable gluster 
> environment, but he’s still seeing 1-2 files listed as needing healing on the 
> upgraded bricks (but not the 3.12 brick). Mainly the DIRECT_IO_TEST and one 
> of the dom/ids files, but he can probably update that. Did manage to get his 
> engine going again, waiting to see if he’s stable now.
>
> Anyway, figured it was worth posting about so people could check for multiple 
> brick processes (glusterfsd) if they hit this stability issue as well, maybe 
> find common ground.
>
> Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434 
> trying to get his engine back up, restarting libvirtd let us get it going 
> again. Maybe un-needed if he’d been able to complete his third node upgrades, 
> but he got stuck before then, so...
>
>   -Darrell
>
> Stable is a relative term.  My unsynced entries total for each of my 4 
> volumes changes drastically (with the exception of the engine volume, it 
> pretty much bounces between 1 and 4).  The cluster has been "healing" for 18 
> hours or so and only the unupgraded HC node has healed bricks.  I did have 
> the problem that some files/directories were owned by root:root.  These VMs 
> did not boot until I changed ownership to 36:36.  Even after 18 hours, 
> there's anywhere from 20-386 entries in vol heal info for my 3 non engine 
> bricks.  Overnight I had one brick on one volume go down on one HC node.  
> When I bounced glusterd, it brought up a new fsd process for that brick.  I 
> killed the old one and now vol status reports the right pid on each of the 
> nodes.  This is quite the debacle.  If I can provide any info that might help 
> get this debacle moving in the right direction, let me know.

Can you provide the gluster brick logs and glusterd logs from the
servers (from /var/log/glusterfs/). Since you mention that heal seems
to be stuck, could you also provide the heal logs from
/var/log/glusterfs/glustershd.log
If you can log a bug with these logs, that would be great - please use
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS to log the
bug.


>
> Jason aka Tristam
>
>
> On Feb 14, 2019, at 1:12 AM, Sahina Bose  wrote:
>
> On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:
>
>
>
>
> Can you be more specific? What things did you see, and did you report bugs?
>
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
> and I've got bricks randomly going offline and getting out of sync with the 
> others at which point I've had to manually stop and start the volume to get 
> things back in sync.
>
>
> Thanks for reporting these. Will follow up on the bugs to ensure
> they're addressed.
> Regarding bricks going offline - are the brick processes crashing? Can
> you provide logs of glusterd and bricks. Or is this to do with
> ovirt-engine and brick status not being in sync?
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4PKJSVDIH3V4H7Q2RKS2C4ZUMWDODQY6/
>
>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: 
> 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-21 Thread Greg Sheremeta
On Thu, Feb 21, 2019 at 12:42 PM Jason P. Thomas 
wrote:

> On 2/20/19 5:33 PM, Darrell Budic wrote:
>
> I was just helping Tristam on #ovirt with a similar problem, we found that
> his two upgraded nodes were running multiple glusterfsd processes per brick
> (but not all bricks). His volume & brick files in /var/lib/gluster looked
> normal, but starting glusterd would often spawn extra fsd processes per
> brick, seemed random. Gluster bug? Maybe related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1651246,
> but I’m helping debug this one second hand… Possibly related to the brick
> crashes? We wound up stopping glusterd, killing off all the fsds,
> restarting glusterd, and repeating until it only spawned one fsd per brick.
> Did that to each updated server, then restarted glusterd on the
> not-yet-updated server to get it talking to the right bricks. That seemed
> to get to a mostly stable gluster environment, but he’s still seeing 1-2
> files listed as needing healing on the upgraded bricks (but not the 3.12
> brick). Mainly the DIRECT_IO_TEST and one of the dom/ids files, but he can
> probably update that. Did manage to get his engine going again, waiting to
> see if he’s stable now.
>
> Anyway, figured it was worth posting about so people could check for
> multiple brick processes (glusterfsd) if they hit this stability issue as
> well, maybe find common ground.
>
> Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434 trying
> to get his engine back up, restarting libvirtd let us get it going again.
> Maybe un-needed if he’d been able to complete his third node upgrades, but
> he got stuck before then, so...
>
>   -Darrell
>
> Stable is a relative term.  My unsynced entries total for each of my 4
> volumes changes drastically (with the exception of the engine volume, it
> pretty much bounces between 1 and 4).  The cluster has been "healing" for
> 18 hours or so and only the unupgraded HC node has healed bricks.  I did
> have the problem that some files/directories were owned by root:root.
> These VMs did not boot until I changed ownership to 36:36.  Even after 18
> hours, there's anywhere from 20-386 entries in vol heal info for my 3 non
> engine bricks.  Overnight I had one brick on one volume go down on one HC
> node.  When I bounced glusterd, it brought up a new fsd process for that
> brick.  I killed the old one and now vol status reports the right pid on
> each of the nodes.  This is quite the debacle.  If I can provide any info
> that might help get this debacle moving in the right direction, let me know.
>

+Sahina Bose 


>
> Jason aka Tristam
>
>
> On Feb 14, 2019, at 1:12 AM, Sahina Bose  wrote:
>
> On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:
>
>
>
>
> Can you be more specific? What things did you see, and did you report bugs?
>
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-21 Thread Jason P. Thomas

On 2/20/19 5:33 PM, Darrell Budic wrote:
I was just helping Tristam on #ovirt with a similar problem, we found 
that his two upgraded nodes were running multiple glusterfsd processes 
per brick (but not all bricks). His volume & brick files in 
/var/lib/gluster looked normal, but starting glusterd would often 
spawn extra fsd processes per brick, seemed random. Gluster bug? Maybe 
related to https://bugzilla.redhat.com/show_bug.cgi?id=1651246, 
but I’m helping debug this one second hand… Possibly related to the 
brick crashes? We wound up stopping glusterd, killing off all the 
fsds, restarting glusterd, and repeating until it only spawned one fsd 
per brick. Did that to each updated server, then restarted glusterd on 
the not-yet-updated server to get it talking to the right bricks. That 
seemed to get to a mostly stable gluster environment, but he’s still 
seeing 1-2 files listed as needing healing on the upgraded bricks (but 
not the 3.12 brick). Mainly the DIRECT_IO_TEST and one of the dom/ids 
files, but he can probably update that. Did manage to get his engine 
going again, waiting to see if he’s stable now.


Anyway, figured it was worth posting about so people could check for 
multiple brick processes (glusterfsd) if they hit this stability issue 
as well, maybe find common ground.


Note: also encountered 
https://bugzilla.redhat.com/show_bug.cgi?id=1348434 
 trying 
to get his engine back up, restarting libvirtd let us get it going 
again. Maybe un-needed if he’d been able to complete his third node 
upgrades, but he got stuck before then, so...


  -Darrell

Stable is a relative term.  My unsynced entries total for each of my 4 
volumes changes drastically (with the exception of the engine volume, it 
pretty much bounces between 1 and 4).  The cluster has been "healing" 
for 18 hours or so and only the unupgraded HC node has healed bricks.  I 
did have the problem that some files/directories were owned by 
root:root.  These VMs did not boot until I changed ownership to 36:36.  
Even after 18 hours, there's anywhere from 20-386 entries in vol heal 
info for my 3 non engine bricks.  Overnight I had one brick on one 
volume go down on one HC node.  When I bounced glusterd, it brought up a 
new fsd process for that brick.  I killed the old one and now vol status 
reports the right pid on each of the nodes.  This is quite the debacle.  
If I can provide any info that might help get this debacle moving in the 
right direction, let me know.


Jason aka Tristam


On Feb 14, 2019, at 1:12 AM, Sahina Bose wrote:


On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome wrote:





Can you be more specific? What things did you see, and did you 
report bugs?


I've got this one: 
https://bugzilla.redhat.com/show_bug.cgi?id=1649054 

and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-20 Thread Darrell Budic
I was just helping Tristam on #ovirt with a similar problem, we found that his 
two upgraded nodes were running multiple glusterfsd processes per brick (but 
not all bricks). His volume & brick files in /var/lib/gluster looked normal, 
but starting glusterd would often spawn extra fsd processes per brick, seemed 
random. Gluster bug? Maybe related to  
https://bugzilla.redhat.com/show_bug.cgi?id=1651246, but I'm helping debug 
this one second hand… Possibly related to the brick crashes? We wound up 
stopping glusterd, killing off all the fsds, restarting glusterd, and repeating 
until it only spawned one fsd per brick. Did that to each updated server, then 
restarted glusterd on the not-yet-updated server to get it talking to the right 
bricks. That seemed to get to a mostly stable gluster environment, but he’s 
still seeing 1-2 files listed as needing healing on the upgraded bricks (but 
not the 3.12 brick). Mainly the DIRECT_IO_TEST and one of the dom/ids files, 
but he can probably update that. Did manage to get his engine going again, 
waiting to see if he’s stable now.

Anyway, figured it was worth posting about so people could check for multiple 
brick processes (glusterfsd) if they hit this stability issue as well, maybe 
find common ground.

Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434 
 trying to get his engine 
back up, restarting libvirtd let us get it going again. Maybe un-needed if he’d 
been able to complete his third node upgrades, but he got stuck before then, 
so...

  -Darrell

> On Feb 14, 2019, at 1:12 AM, Sahina Bose  wrote:
> 
> On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:
>> 
>> 
>>> 
>>> Can you be more specific? What things did you see, and did you report bugs?
>> 
>> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
>> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
>> and I've got bricks randomly going offline and getting out of sync with the 
>> others at which point I've had to manually stop and start the volume to get 
>> things back in sync.
> 
> Thanks for reporting these. Will follow up on the bugs to ensure
> they're addressed.
> Regarding bricks going offline - are the brick processes crashing? Can
> you provide logs of glusterd and bricks. Or is this to do with
> ovirt-engine and brick status not being in sync?
> 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4PKJSVDIH3V4H7Q2RKS2C4ZUMWDODQY6/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DYFZAC4BPJNGZP3PEZ6ZP2AB3C3JVAFM/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Sahina Bose
On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:
>
>
> >
> > Can you be more specific? What things did you see, and did you report bugs?
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
> and I've got bricks randomly going offline and getting out of sync with the 
> others at which point I've had to manually stop and start the volume to get 
> things back in sync.

Thanks for reporting these. Will follow up on the bugs to ensure
they're addressed.
Regarding bricks going offline - are the brick processes crashing? Can
you provide logs of glusterd and bricks. Or is this to do with
ovirt-engine and brick status not being in sync?

> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4PKJSVDIH3V4H7Q2RKS2C4ZUMWDODQY6/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
Ron, well it looks like you're not wrong.  Less than 24 hours after
upgrading my cluster I have a Gluster brick down...

On Wed, Feb 13, 2019 at 5:58 PM Jayme  wrote:

> Ron, sorry to hear about the troubles.  I haven't seen any gluster crashes
> yet *knock on wood*.  I will monitor closely.  Thanks for the heads up!
>
> On Wed, Feb 13, 2019 at 5:09 PM Ron Jerome  wrote:
>
>>
>> >
>> > Can you be more specific? What things did you see, and did you report
>> bugs?
>>
>> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
>> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
>> and I've got bricks randomly going offline and getting out of sync with
>> the others at which point I've had to manually stop and start the volume to
>> get things back in sync.
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
>>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/45PJ4Z5NCHLR7SJCJZNLFZBJWBERAXWV/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
Ron, sorry to hear about the troubles.  I haven't seen any gluster crashes
yet *knock on wood*.  I will monitor closely.  Thanks for the heads up!

On Wed, Feb 13, 2019 at 5:09 PM Ron Jerome  wrote:

>
> >
> > Can you be more specific? What things did you see, and did you report
> bugs?
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
> and I've got bricks randomly going offline and getting out of sync with
> the others at which point I've had to manually stop and start the volume to
> get things back in sync.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VTOY6J4CEMQPPHCGCKKSX73QQSNGMKI5/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Ron Jerome

> 
> Can you be more specific? What things did you see, and did you report bugs?

I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054 
and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246  
and I've got bricks randomly going offline and getting out of sync with the 
others at which point I've had to manually stop and start the volume to get 
things back in sync.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Greg Sheremeta
On Wed, Feb 13, 2019 at 3:06 PM Ron Jerome  wrote:

> > I can confirm that this worked.  I had to shut down every single VM then
> > change ownership to vdsm:kvm of the image file then start VM back up.
> >
> Not to rain on your parade, but you should keep a close eye on your
> gluster file system after the upgrade.  The stability of my gluster file
> system was markedly decreased after the upgrade to gluster 5.3  :-(
>

Can you be more specific? What things did you see, and did you report bugs?
Thanks!

Best wishes,
Greg


> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3Q55MP7XZJJSE3LWBYHIHWRBPUD4BR5J/
>


-- 

GREG SHEREMETA

SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX

Red Hat NA



gsher...@redhat.com    IRC: gshereme

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HXKYILEYKO5GHEOASO6PVEWN74ULDUSY/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Ron Jerome
> I can confirm that this worked.  I had to shut down every single VM then
> change ownership to vdsm:kvm of the image file then start VM back up.
> 
Not to rain on your parade, but you should keep a close eye on your gluster 
file system after the upgrade.  The stability of my gluster file system was 
markedly decreased after the upgrade to gluster 5.3  :-(
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3Q55MP7XZJJSE3LWBYHIHWRBPUD4BR5J/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
I can confirm that this worked.  I had to shut down every single VM then
change ownership to vdsm:kvm of the image file then start VM back up.

On Wed, Feb 13, 2019 at 3:08 PM Simone Tiraboschi 
wrote:

>
>
> On Wed, Feb 13, 2019 at 8:06 PM Jayme  wrote:
>
>>
>> I might be hitting this bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795
>>
>
> Yes, you definitely are.
> Fixing files ownership on file system side is a valid workaround.
>
>
>>
>> On Wed, Feb 13, 2019 at 1:35 PM Jayme  wrote:
>>
>>> This may be happening because I changed cluster compatibility to 4.3
>>> then immediately after changed data center compatibility to 4.3 (before
>>> restarting VMs after cluster compatibility change).  If this is the case I
>>> can't fix by downgrading the data center compatibility to 4.2 as it won't
>>> allow me to do so.  What can I do to fix this, any VM I restart will break
>>> (I am leaving the others running for now, but there are some down that I
>>> can't start).
>>>
>>> Full error from VDSM:
>>>
>>> 2019-02-13 13:30:55,465-0400 ERROR (vm/d070ce80)
>>> [storage.TaskManager.Task] (Task='d5c8e50a-0a6f-4fe7-be79-fd322b273a1e')
>>> Unexpected error (task:875)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line
>>> 882, in _run
>>> return fn(*args, **kargs)
>>>   File "", line 2, in prepareImage
>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50,
>>> in method
>>> ret = func(*args, **kwargs)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line
>>> 3198, in prepareImage
>>> legality = dom.produceVolume(imgUUID, volUUID).getLegality()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 818,
>>> in produceVolume
>>> volUUID)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py",
>>> line 45, in __init__
>>> volUUID)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>>> 800, in __init__
>>> self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID,
>>> volUUID)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py",
>>> line 71, in __init__
>>> volUUID)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>>> 86, in __init__
>>> self.validate()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>>> 112, in validate
>>> self.validateVolumePath()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py",
>>> line 131, in validateVolumePath
>>> raise se.VolumeDoesNotExist(self.volUUID)
>>> VolumeDoesNotExist: Volume does not exist:
>>> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',)
>>> 2019-02-13 13:30:55,468-0400 ERROR (vm/d070ce80) [storage.Dispatcher]
>>> FINISH prepareImage error=Volume does not exist:
>>> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',) (dispatcher:81)
>>> 2019-02-13 13:30:55,469-0400 ERROR (vm/d070ce80) [virt.vm]
>>> (vmId='d070ce80-e0bc-489d-8ee0-47d5926d5ae2') The vm start process failed
>>> (vm:937)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in
>>> _startUnderlyingVm
>>> self._run()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2749, in
>>> _run
>>> self._devices = self._make_devices()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2589, in
>>> _make_devices
>>> disk_objs = self._perform_host_local_adjustment()
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2662, in
>>> _perform_host_local_adjustment
>>> self._preparePathsForDrives(disk_params)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1011, in
>>> _preparePathsForDrives
>>> drive['path'] = self.cif.prepareVolumePath(drive, self.id)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in
>>> prepareVolumePath
>>> raise vm.VolumeError(drive)
>>> VolumeError: Bad volume specification {'address': {'function': '0x0',
>>> 'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'},
>>> 'serial': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface':
>>> 'virtio', 'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
>>> 'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
>>> '64293814272', 'type': 'disk', 'domainID':
>>> '1f2e9989-9ab3-43d5-971d-568b8feca918', 'reqsize': '0', 'format': 'cow',
>>> 'poolID': 'a45e442e-9989-11e8-b0e4-00163e4bf18a', 'device': 'disk', 'path':
>>> '/rhev/data-center/a45e442e-9989-11e8-b0e4-00163e4bf18a/1f2e9989-9ab3-43d5-971d-568b8feca918/images/d81a6826-dc46-44db-8de7-405d30e44d57/2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',
>>> 'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID':
>>> '2d6d5f87-ccb0-48ce-b3ac-84495bd12d32', 'diskType': 'file', 'alias':
>>> 'ua-d81a6826-dc46-44db-8de7-405d30e44d57', 'discard': False}
>>>
>>> On Wed, Feb 13, 2019 at 1:19 PM Jayme  wrote:
>>>
 I may 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Simone Tiraboschi
On Wed, Feb 13, 2019 at 8:06 PM Jayme  wrote:

>
> I might be hitting this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1666795
>

Yes, you definitely are.
Fixing files ownership on file system side is a valid workaround.


>
> On Wed, Feb 13, 2019 at 1:35 PM Jayme  wrote:
>
>> This may be happening because I changed cluster compatibility to 4.3 then
>> immediately after changed data center compatibility to 4.3 (before
>> restarting VMs after cluster compatibility change).  If this is the case I
>> can't fix by downgrading the data center compatibility to 4.2 as it won't
>> allow me to do so.  What can I do to fix this, any VM I restart will break
>> (I am leaving the others running for now, but there are some down that I
>> can't start).
>>
>> Full error from VDSM:
>>
>> 2019-02-13 13:30:55,465-0400 ERROR (vm/d070ce80)
>> [storage.TaskManager.Task] (Task='d5c8e50a-0a6f-4fe7-be79-fd322b273a1e')
>> Unexpected error (task:875)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
>> in _run
>> return fn(*args, **kargs)
>>   File "", line 2, in prepareImage
>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
>> method
>> ret = func(*args, **kwargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3198,
>> in prepareImage
>> legality = dom.produceVolume(imgUUID, volUUID).getLegality()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 818,
>> in produceVolume
>> volUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py",
>> line 45, in __init__
>> volUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>> 800, in __init__
>> self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID,
>> volUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py",
>> line 71, in __init__
>> volUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>> 86, in __init__
>> self.validate()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
>> 112, in validate
>> self.validateVolumePath()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py",
>> line 131, in validateVolumePath
>> raise se.VolumeDoesNotExist(self.volUUID)
>> VolumeDoesNotExist: Volume does not exist:
>> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',)
>> 2019-02-13 13:30:55,468-0400 ERROR (vm/d070ce80) [storage.Dispatcher]
>> FINISH prepareImage error=Volume does not exist:
>> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',) (dispatcher:81)
>> 2019-02-13 13:30:55,469-0400 ERROR (vm/d070ce80) [virt.vm]
>> (vmId='d070ce80-e0bc-489d-8ee0-47d5926d5ae2') The vm start process failed
>> (vm:937)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in
>> _startUnderlyingVm
>> self._run()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2749, in
>> _run
>> self._devices = self._make_devices()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2589, in
>> _make_devices
>> disk_objs = self._perform_host_local_adjustment()
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2662, in
>> _perform_host_local_adjustment
>> self._preparePathsForDrives(disk_params)
>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1011, in
>> _preparePathsForDrives
>> drive['path'] = self.cif.prepareVolumePath(drive, self.id)
>>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in
>> prepareVolumePath
>> raise vm.VolumeError(drive)
>> VolumeError: Bad volume specification {'address': {'function': '0x0',
>> 'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'},
>> 'serial': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface':
>> 'virtio', 'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
>> 'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
>> '64293814272', 'type': 'disk', 'domainID':
>> '1f2e9989-9ab3-43d5-971d-568b8feca918', 'reqsize': '0', 'format': 'cow',
>> 'poolID': 'a45e442e-9989-11e8-b0e4-00163e4bf18a', 'device': 'disk', 'path':
>> '/rhev/data-center/a45e442e-9989-11e8-b0e4-00163e4bf18a/1f2e9989-9ab3-43d5-971d-568b8feca918/images/d81a6826-dc46-44db-8de7-405d30e44d57/2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',
>> 'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID':
>> '2d6d5f87-ccb0-48ce-b3ac-84495bd12d32', 'diskType': 'file', 'alias':
>> 'ua-d81a6826-dc46-44db-8de7-405d30e44d57', 'discard': False}
>>
>> On Wed, Feb 13, 2019 at 1:19 PM Jayme  wrote:
>>
>>> I may have made matters worse.  So I changed to 4.3 compatible cluster
>>> then 4.3 compatible data center.  All VMs were marked as requiring a
>>> reboot.  I restarted a couple of them and none of them will start up, they
>>> are saying "bad volume specification".  The ones running that I did not yet
>>> restart are 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Ron Jerome
> This may be happening because I changed cluster compatibility to 4.3 then
> immediately after changed data center compatibility to 4.3 (before
> restarting VMs after cluster compatibility change).  If this is the case I
> can't fix by downgrading the data center compatibility to 4.2 as it won't
> allow me to do so.  What can I do to fix this, any VM I restart will break
> (I am leaving the others running for now, but there are some down that I
> can't start).
> 
>

It would seem you ran into the "next" issue that I also ran into.  During the 
upgrade process, the ownership of the disk images of the running VMs gets 
changed from vdsm:kvm to root:root.  (see 
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/6JFTGJ37KDZQ5KMLU32LNB5ZZTFQIRFG/)
 

You need to find those disk images and change the ownership back to vdsm:kvm, 
and then they will boot again.
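
A sketch of that sweep, using placeholder paths modelled on the /rhev/data-center/... paths 
in the VDSM errors quoted elsewhere in this thread (36:36 is the vdsm:kvm uid:gid; review 
the find output before changing anything):

# list image files that have flipped to root
find /rhev/data-center/<pool-uuid>/<domain-uuid>/images/ -user root -ls

# hand an affected image back to vdsm:kvm
chown -R 36:36 /rhev/data-center/<pool-uuid>/<domain-uuid>/images/<image-uuid>/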
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LJRGBB2MFYHRB3VCJLSZ36DMEI42IZIM/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
I might be hitting this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1666795

On Wed, Feb 13, 2019 at 1:35 PM Jayme  wrote:

> This may be happening because I changed cluster compatibility to 4.3 then
> immediately after changed data center compatibility to 4.3 (before
> restarting VMs after cluster compatibility change).  If this is the case I
> can't fix by downgrading the data center compatibility to 4.2 as it won't
> allow me to do so.  What can I do to fix this, any VM I restart will break
> (I am leaving the others running for now, but there are some down that I
> can't start).
>
> Full error from VDSM:
>
> 2019-02-13 13:30:55,465-0400 ERROR (vm/d070ce80)
> [storage.TaskManager.Task] (Task='d5c8e50a-0a6f-4fe7-be79-fd322b273a1e')
> Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
> in _run
> return fn(*args, **kargs)
>   File "", line 2, in prepareImage
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
> method
> ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3198,
> in prepareImage
> legality = dom.produceVolume(imgUUID, volUUID).getLegality()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 818, in
> produceVolume
> volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py",
> line 45, in __init__
> volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
> 800, in __init__
> self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
> 71, in __init__
> volUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86,
> in __init__
> self.validate()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line
> 112, in validate
> self.validateVolumePath()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
> 131, in validateVolumePath
> raise se.VolumeDoesNotExist(self.volUUID)
> VolumeDoesNotExist: Volume does not exist:
> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',)
> 2019-02-13 13:30:55,468-0400 ERROR (vm/d070ce80) [storage.Dispatcher]
> FINISH prepareImage error=Volume does not exist:
> (u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',) (dispatcher:81)
> 2019-02-13 13:30:55,469-0400 ERROR (vm/d070ce80) [virt.vm]
> (vmId='d070ce80-e0bc-489d-8ee0-47d5926d5ae2') The vm start process failed
> (vm:937)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in
> _startUnderlyingVm
> self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2749, in
> _run
> self._devices = self._make_devices()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2589, in
> _make_devices
> disk_objs = self._perform_host_local_adjustment()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2662, in
> _perform_host_local_adjustment
> self._preparePathsForDrives(disk_params)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1011, in
> _preparePathsForDrives
> drive['path'] = self.cif.prepareVolumePath(drive, self.id)
>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in
> prepareVolumePath
> raise vm.VolumeError(drive)
> VolumeError: Bad volume specification {'address': {'function': '0x0',
> 'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'},
> 'serial': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface':
> 'virtio', 'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
> 'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
> '64293814272', 'type': 'disk', 'domainID':
> '1f2e9989-9ab3-43d5-971d-568b8feca918', 'reqsize': '0', 'format': 'cow',
> 'poolID': 'a45e442e-9989-11e8-b0e4-00163e4bf18a', 'device': 'disk', 'path':
> '/rhev/data-center/a45e442e-9989-11e8-b0e4-00163e4bf18a/1f2e9989-9ab3-43d5-971d-568b8feca918/images/d81a6826-dc46-44db-8de7-405d30e44d57/2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',
> 'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID':
> '2d6d5f87-ccb0-48ce-b3ac-84495bd12d32', 'diskType': 'file', 'alias':
> 'ua-d81a6826-dc46-44db-8de7-405d30e44d57', 'discard': False}
>
> On Wed, Feb 13, 2019 at 1:19 PM Jayme  wrote:
>
>> I may have made matters worse.  So I changed to 4.3 compatible cluster
>> then 4.3 compatible data center.  All VMs were marked as requiring a
>> reboot.  I restarted a couple of them and none of them will start up, they
>> are saying "bad volume specification".  The ones running that I did not yet
>> restart are still running ok.  I need to figure out why the VMs aren't
>> restarting.
>>
>> Here is an example from vdsm.log
>>
>> VolumeError: Bad volume specification {'address': {'function': '0x0',
>> 'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
This may be happening because I changed cluster compatibility to 4.3 then
immediately after changed data center compatibility to 4.3 (before
restarting VMs after cluster compatibility change).  If this is the case I
can't fix by downgrading the data center compatibility to 4.2 as it won't
allow me to do so.  What can I do to fix this, any VM I restart will break
(I am leaving the others running for now, but there are some down that I
can't start).

Full error from VDSM:

2019-02-13 13:30:55,465-0400 ERROR (vm/d070ce80) [storage.TaskManager.Task]
(Task='d5c8e50a-0a6f-4fe7-be79-fd322b273a1e') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882,
in _run
return fn(*args, **kargs)
  File "", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in
method
ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3198,
in prepareImage
legality = dom.produceVolume(imgUUID, volUUID).getLegality()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 818, in
produceVolume
volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterVolume.py",
line 45, in __init__
volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 800,
in __init__
self._manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
71, in __init__
volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 86,
in __init__
self.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 112,
in validate
self.validateVolumePath()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line
131, in validateVolumePath
raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist:
(u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',)
2019-02-13 13:30:55,468-0400 ERROR (vm/d070ce80) [storage.Dispatcher]
FINISH prepareImage error=Volume does not exist:
(u'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',) (dispatcher:81)
2019-02-13 13:30:55,469-0400 ERROR (vm/d070ce80) [virt.vm]
(vmId='d070ce80-e0bc-489d-8ee0-47d5926d5ae2') The vm start process failed
(vm:937)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in
_startUnderlyingVm
self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2749, in
_run
self._devices = self._make_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2589, in
_make_devices
disk_objs = self._perform_host_local_adjustment()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2662, in
_perform_host_local_adjustment
self._preparePathsForDrives(disk_params)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1011, in
_preparePathsForDrives
drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in
prepareVolumePath
raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'address': {'function': '0x0',
'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'},
'serial': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface':
'virtio', 'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
'64293814272', 'type': 'disk', 'domainID':
'1f2e9989-9ab3-43d5-971d-568b8feca918', 'reqsize': '0', 'format': 'cow',
'poolID': 'a45e442e-9989-11e8-b0e4-00163e4bf18a', 'device': 'disk', 'path':
'/rhev/data-center/a45e442e-9989-11e8-b0e4-00163e4bf18a/1f2e9989-9ab3-43d5-971d-568b8feca918/images/d81a6826-dc46-44db-8de7-405d30e44d57/2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',
'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID':
'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32', 'diskType': 'file', 'alias':
'ua-d81a6826-dc46-44db-8de7-405d30e44d57', 'discard': False}

On Wed, Feb 13, 2019 at 1:19 PM Jayme  wrote:

> I may have made matters worse.  So I changed to 4.3 compatible cluster
> then 4.3 compatible data center.  All VMs were marked as requiring a
> reboot.  I restarted a couple of them and none of them will start up, they
> are saying "bad volume specification".  The ones running that I did not yet
> restart are still running ok.  I need to figure out why the VMs aren't
> restarting.
>
> Here is an example from vdsm.log
>
> VolumeError: Bad volume specification {'address': {'function': '0x0',
> 'bus': '0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'},
> 'serial': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface':
> 'virtio', 'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
> 'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
> '64293814272', 'type': 'disk', 'domainID':
> '1f2e9989-9ab3-43d5-971d-568b8feca918', 

[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
I may have made matters worse.  So I changed to 4.3 compatible cluster then
4.3 compatible data center.  All VMs were marked as requiring a reboot.  I
restarted a couple of them and none of them will start up, they are saying
"bad volume specification".  The ones running that I did not yet restart
are still running ok.  I need to figure out why the VMs aren't restarting.

Here is an example from vdsm.log

VolumeError: Bad volume specification {'address': {'function': '0x0', 'bus':
'0x00', 'domain': '0x', 'type': 'pci', 'slot': '0x06'}, 'serial':
'd81a6826-dc46-44db-8de7-405d30e44d57', 'index': 0, 'iface': 'virtio',
'apparentsize': '64293699584', 'specParams': {}, 'cache': 'none',
'imageID': 'd81a6826-dc46-44db-8de7-405d30e44d57', 'truesize':
'64293814272', 'type': 'disk', 'domainID':
'1f2e9989-9ab3-43d5-971d-568b8feca918', 'reqsize': '0', 'format': 'cow',
'poolID': 'a45e442e-9989-11e8-b0e4-00163e4bf18a', 'device': 'disk', 'path':
'/rhev/data-center/a45e442e-9989-11e8-b0e4-00163e4bf18a/1f2e9989-9ab3-43d5-971d-568b8feca918/images/d81a6826-dc46-44db-8de7-405d30e44d57/2d6d5f87-ccb0-48ce-b3ac-84495bd12d32',
'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID':
'2d6d5f87-ccb0-48ce-b3ac-84495bd12d32', 'diskType': 'file', 'alias':
'ua-d81a6826-dc46-44db-8de7-405d30e44d57', 'discard': False}

On Wed, Feb 13, 2019 at 1:01 PM Jayme  wrote:

> I think I just figured out what I was doing wrong.  On edit cluster screen
> I was changing both the CPU type and cluster level 4.3.  I tried it again
> by switching to the new CPU type first (leaving cluster on 4.2) then
> saving, then going back in and switching compat level to 4.3.  It appears
> that you need to do this in two steps for it to work.
>
>
>
> On Wed, Feb 13, 2019 at 12:57 PM Jayme  wrote:
>
>> Hmm interesting, I wonder how you were able to switch from SandyBridge
>> IBRS to SandyBridge IBRS SSBD.  I just attempted the same in both regular
>> mode and in global maintenance mode and it won't allow me to, it says that
>> all hosts have to be in maintenance mode (screenshots attached).   Are you
>> also running HCI/Gluster setup?
>>
>>
>>
>> On Wed, Feb 13, 2019 at 12:44 PM Ron Jerome  wrote:
>>
>>> > Environment setup:
>>> >
>>> > 3 Host HCI GlusterFS setup.  Identical hosts, Dell R720s w/ Intel
>>> E5-2690
>>> > CPUs
>>> >
>>> > 1 default data center (4.2 compat)
>>> > 1 default cluster (4.2 compat)
>>> >
>>> > Situation: I recently upgraded my three node HCI cluster from Ovirt
>>> 4.2 to
>>> > 4.3.  I did so by first updating the engine to 4.3 then upgrading each
>>> > ovirt-node host to 4.3 and rebooting.
>>> >
>>> > Currently engine and all hosts are running 4.3 and all is working fine.
>>> >
>>> > To complete the upgrade I need to update cluster compatibility to 4.3
>>> and
>>> > then data centre to 4.3.  This is where I am stuck.
>>> >
>>> > The CPU type on cluster is "Intel SandyBridge IBRS Family".  This
>>> option is
>>> > no longer available if I select 4.3 compatibility.  Any other option
>>> chosen
>>> > such as SandyBridge IBRS SSBD will not allow me to switch to 4.3 as all
>>> > hosts must be in maintenance mode (which is not possible w/ self hosted
>>> > engine).
>>> >
>>> > I saw another post about this where someone else followed steps to
>>> create a
>>> > second cluster on 4.3 with new CPU type then move one host to it, start
>>> > engine on it then perform other steps to eventually get to 4.3
>>> > compatibility.
>>> >
>>>
>>> I have the exact same hardware configuration and was able to change to
>>> "SandyBridge IBRS SSBD" without creating a new cluster.  How I made that
>>> happen, I'm not so sure, but the cluster may have been in "Global
>>> Maintenance" mode when I changed it.
>>>
>>>
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5B3TAXKO7IBTWRVNF2K4II472TDISO6P/
>>>
>>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CPFLAV4W4OQGDV7SUUTSHFFD6KSTHOAB/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-13 Thread Jayme
I think I just figured out what I was doing wrong.  On the edit cluster screen
I was changing both the CPU type and the cluster level to 4.3.  I tried it again
by switching to the new CPU type first (leaving the cluster on 4.2), then
saving, then going back in and switching the compatibility level to 4.3.  It appears
that you need to do this in two steps for it to work.



On Wed, Feb 13, 2019 at 12:57 PM Jayme  wrote:

> Hmm interesting, I wonder how you were able to switch from SandyBridge
> IBRS to SandyBridge IBRS SSBD.  I just attempted the same in both regular
> mode and in global maintenance mode and it won't allow me to, it says that
> all hosts have to be in maintenance mode (screenshots attached).   Are you
> also running HCI/Gluster setup?
>
>
>
> On Wed, Feb 13, 2019 at 12:44 PM Ron Jerome  wrote:
>
>> > Environment setup:
>> >
>> > 3 Host HCI GlusterFS setup.  Identical hosts, Dell R720s w/ Intel
>> E5-2690
>> > CPUs
>> >
>> > 1 default data center (4.2 compat)
>> > 1 default cluster (4.2 compat)
>> >
>> > Situation: I recently upgraded my three node HCI cluster from Ovirt 4.2
>> to
>> > 4.3.  I did so by first updating the engine to 4.3 then upgrading each
>> > ovirt-node host to 4.3 and rebooting.
>> >
>> > Currently engine and all hosts are running 4.3 and all is working fine.
>> >
>> > To complete the upgrade I need to update cluster compatibility to 4.3
>> and
>> > then data centre to 4.3.  This is where I am stuck.
>> >
>> > The CPU type on cluster is "Intel SandyBridge IBRS Family".  This
>> option is
>> > no longer available if I select 4.3 compatibility.  Any other option
>> chosen
>> > such as SandyBridge IBRS SSBD will not allow me to switch to 4.3 as all
>> > hosts must be in maintenance mode (which is not possible w/ self hosted
>> > engine).
>> >
>> > I saw another post about this where someone else followed steps to
>> create a
>> > second cluster on 4.3 with new CPU type then move one host to it, start
>> > engine on it then perform other steps to eventually get to 4.3
>> > compatibility.
>> >
>>
>> I have the exact same hardware configuration and was able to change to
>> "SandyBridge IBRS SSBD" without creating a new cluster.  How I made that
>> happen, I'm not so sure, but the cluster may have been in "Global
>> Maintenance" mode when I changed it.
>>
>>
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5B3TAXKO7IBTWRVNF2K4II472TDISO6P/
>>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V6IAVNC3Z5PMTTA263QOAJZ5P2HILPUP/