On Thu, Feb 21, 2019 at 11:11 PM Jason P. Thomas <jthom...@gmualumni.org> wrote:
>
> On 2/20/19 5:33 PM, Darrell Budic wrote:
>
> I was just helping Tristam on #ovirt with a similar problem, we found that
> his two upgraded nodes were running multiple glusterfsd processes per brick
> (but not all bricks). His volume & brick files in /var/lib/gluster looked
> normal, but starting glusterd would often spawn extra fsd processes per
> brick, seemed random. Gluster bug? Maybe related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1651246, but I’m helping debug
> this one second hand… Possibly related to the brick crashes? We wound up
> stopping glusterd, killing off all the fsds, restarting glusterd, and
> repeating until it only spawned one fsd per brick. Did that to each updated
> server, then restarted glusterd on the not-yet-updated server to get it
> talking to the right bricks. That seemed to get to a mostly stable gluster
> environment, but he’s still seeing 1-2 files listed as needing healing on the
> upgraded bricks (but not the 3.12 brick). Mainly the DIRECT_IO_TEST and one
> of the dom/ids files, but he can probably update that. Did manage to get his
> engine going again, waiting to see if he’s stable now.
>
> Anyway, figured it was worth posting about so people could check for multiple
> brick processes (glusterfsd) if they hit this stability issue as well, maybe
> find common ground.
>
> Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434
> trying to get his engine back up, restarting libvirtd let us get it going
> again. Maybe un-needed if he’d been able to complete his third node upgrades,
> but he got stuck before then, so...
>
> -Darrell
>
> Stable is a relative term. My unsynced entries total for each of my 4
> volumes changes drastically (with the exception of the engine volume, it
> pretty much bounces between 1 and 4). The cluster has been "healing" for 18
> hours or so and only the unupgraded HC node has healed bricks.
> I did have the problem that some files/directories were owned by root:root.
> These VMs did not boot until I changed ownership to 36:36. Even after 18
> hours, there are anywhere from 20 to 386 entries in vol heal info for my 3
> non-engine bricks. Overnight I had one brick on one volume go down on one HC
> node. When I bounced glusterd, it brought up a new fsd process for that
> brick. I killed the old one and now vol status reports the right pid on each
> of the nodes. This is quite the debacle. If I can provide any info that might
> help get this debacle moving in the right direction, let me know.
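Darrell's suggestion to check for more than one glusterfsd process per brick can be scripted. What follows is a minimal sketch, not an official Gluster tool. It assumes Linux (it reads /proc/<pid>/cmdline) and assumes each brick daemon carries its brick path in a `--brick-name` argument, which is how glusterfsd command lines typically look but is worth confirming with `ps -ef | grep glusterfsd` on your own nodes. The function names are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical helpers. Assumption: brick daemons are started as
# "glusterfsd ... --brick-name <brick path> ...".


def brick_name(argv: List[str]) -> Optional[str]:
    """Return the --brick-name value from a glusterfsd argv, or None if absent."""
    for i, arg in enumerate(argv):
        if arg == "--brick-name" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--brick-name="):
            return arg.split("=", 1)[1]
    return None


def duplicate_bricks(procs: Iterable[Tuple[int, List[str]]]) -> Dict[str, List[int]]:
    """Map brick path -> sorted PIDs, keeping only bricks served by more than one glusterfsd."""
    by_brick: Dict[str, List[int]] = defaultdict(list)
    for pid, argv in procs:
        # Skip anything that is not a brick daemon (glusterd, glusterfs clients, ...).
        if not argv or os.path.basename(argv[0]) != "glusterfsd":
            continue
        name = brick_name(argv)
        if name is not None:
            by_brick[name].append(pid)
    return {brick: sorted(pids) for brick, pids in by_brick.items() if len(pids) > 1}


def running_processes() -> Iterator[Tuple[int, List[str]]]:
    """Yield (pid, argv) for each process in /proc; yields nothing where /proc is absent."""
    if not os.path.isdir("/proc"):
        return
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited, or we lack permission to read it
        if raw:  # kernel threads have an empty cmdline
            # cmdline is NUL-separated and NUL-terminated.
            argv = [a.decode(errors="replace") for a in raw.rstrip(b"\0").split(b"\0")]
            yield int(entry), argv


if __name__ == "__main__":
    dups = duplicate_bricks(running_processes())
    if not dups:
        print("no brick is served by more than one glusterfsd process")
    for brick, pids in dups.items():
        print(f"{brick}: {len(pids)} glusterfsd processes, pids {pids}")
```

Any brick reported with two or more PIDs matches the symptom described above. Compare those PIDs with the Pid column of `gluster volume status` before stopping anything, since only one of them is the process glusterd is tracking.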
Can you provide the gluster brick logs and glusterd logs from the servers (under /var/log/glusterfs/)? Since you mention that heal seems to be stuck, could you also provide the heal logs from /var/log/glusterfs/glustershd.log?

If you can log a bug with these logs, that would be great; please use https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS to log the bug.

>
> Jason aka Tristam
>
> On Feb 14, 2019, at 1:12 AM, Sahina Bose <sab...@redhat.com> wrote:
>
> On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome <ronj...@gmail.com> wrote:
>
> > Can you be more specific? What things did you see, and did you report bugs?
>
> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
> and I've got bricks randomly going offline and getting out of sync with the
> others, at which point I've had to manually stop and start the volume to get
> things back in sync.
>
> Thanks for reporting these. Will follow up on the bugs to ensure
> they're addressed.
> Regarding bricks going offline, are the brick processes crashing? Can
> you provide logs of glusterd and bricks? Or is this to do with
> ovirt-engine and brick status not being in sync?
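Both log requests above point at /var/log/glusterfs/ (the glusterd log, the per-brick logs, and glustershd.log). A minimal sketch for packing that directory on one server into a single tarball to attach to a Bugzilla report follows. The function name, the default output directory and the file-naming scheme are hypothetical choices, and it needs enough privileges to read the log files.

```python
import os
import socket
import tarfile
import time


def bundle_gluster_logs(log_dir: str = "/var/log/glusterfs", out_dir: str = "/tmp") -> str:
    """Pack log_dir (recursively) into a .tar.gz in out_dir and return the archive path.

    Hypothetical helper: the default paths and the naming scheme are
    assumptions, not something glusterd or Bugzilla requires.
    """
    log_dir = os.path.normpath(log_dir)
    if not os.path.isdir(log_dir):
        raise FileNotFoundError(log_dir)
    host = socket.gethostname()  # one archive per server, so tag it with the host
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = os.path.join(out_dir, f"glusterfs-logs-{host}-{stamp}.tar.gz")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store entries as "glusterfs/glusterd.log", "glusterfs/bricks/...", etc.,
        # rather than under the server's absolute path.
        tar.add(log_dir, arcname=os.path.basename(log_dir))
    return out_path
```

Call `bundle_gluster_logs()` on each server and attach the resulting files. Rotated logs in the same directory are included as well, so the archives can be large.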
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UG7XXWITXUUJYMAXWYZUXJFQGSYWRND6/