Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-22 Thread Vince Loschiavo
Thank you for that information.

Are there plans to restore the previous functionality in a later release of
3.6.x? Or is this what we should expect going forward?



On Thu, Nov 20, 2014 at 11:24 PM, Anuradha Talur ata...@redhat.com wrote:



 - Original Message -
  From: Joe Julian j...@julianfamily.org
  To: Anuradha Talur ata...@redhat.com, Vince Loschiavo 
 vloschi...@gmail.com
  Cc: gluster-users@gluster.org Gluster-users@gluster.org
  Sent: Friday, November 21, 2014 12:06:27 PM
  Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
 related)
 
 
 
  On November 20, 2014 10:01:45 PM PST, Anuradha Talur ata...@redhat.com
  wrote:
  
  
  - Original Message -
   From: Vince Loschiavo vloschi...@gmail.com
   To: gluster-users@gluster.org Gluster-users@gluster.org
   Sent: Wednesday, November 19, 2014 9:50:50 PM
   Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
  related)
  
  
   Hello Gluster Community,
  
   I have been using the Nagios monitoring scripts, mentioned in the
  below
   thread, on 3.5.2 with great success. The most useful of these is the
  self
   heal.
  
   However, I've just upgraded to 3.6.1 on the lab and the self heal
  daemon has
   become quite aggressive. I continually get alerts/warnings on 3.6.1
  that
   virt disk images need self heal, then they clear. This is not the
  case on
   3.5.2. This
  
   Configuration:
   2 node, 2 brick replicated volume with 2x1GB LAG network between the
  peers
   using this volume as a QEMU/KVM virt image store through the fuse
  mount on
   Centos 6.5.
  
   Example:
   on 3.5.2:
   gluster volume heal volumename info: shows the bricks and number of
  entries
   to be healed: 0
  
   On v3.5.2 - During normal gluster operations, I can run this command
  over and
   over again, 2-4 times per second, and it will always show 0 entries
  to be
   healed. I've used this as an indicator that the bricks are
  synchronized.
  
   Last night, I upgraded to 3.6.1 in lab and I'm seeing different
  behavior.
   Running gluster volume heal volumename info , during normal
  operations, will
   show a file out-of-sync, seemingly between every block written to
  disk then
   synced to the peer. I can run the command over and over again, 2-4
  times per
   second, and it will almost always show something out of sync. The
  individual
   files change, meaning:
  
   Example:
   1st Run: shows file1 out of sync
   2nd run: shows file 2 and file 3 out of sync but file 1 is now in
  sync (not
   in the list)
   3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in
  sync
   (not in the list).
   ...
   nth run: shows 0 files out of sync
   nth+1 run: shows file 3 and 12 out of sync.
  
   From looking at the virtual machines running off this gluster volume,
  it's
   obvious that gluster is working well. However, this obviously plays
  havoc
   with Nagios and alerts. Nagios will run the heal info and get
  different and
   non-useful results each time, and will send alerts.
  
   Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to
  tune the
   settings or change the monitoring method to get better results into
  Nagios.
  
  In 3.6.1 the way heal info command works is different from that in
  3.5.2. In 3.6.1, it is self-heal daemon that gathers the entries that
  might need healing. Currently, in 3.6.1, there isn't a method to
  distinguish between a file that is being healed and a file with
  on-going I/O while listing. Hence you see files with normal operation
  too listed in the output of heal info command.
 
  How did that regression pass?!
 Test cases to check this condition was not written in regression tests.
 

 --
 Thanks,
 Anuradha.




-- 
-Vince Loschiavo
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-22 Thread Pranith Kumar Karampuri


On 11/22/2014 10:12 PM, Vince Loschiavo wrote:

Thank you for that information.

Are there plans to restore the previous functionality in a later 
release of 3.6.x? Or is this what we should expect going forward?

Yes, it will definitely be fixed; please wait for the next release. Things 
should be fine.


Pranith




On Thu, Nov 20, 2014 at 11:24 PM, Anuradha Talur ata...@redhat.com wrote:




- Original Message -
 From: Joe Julian j...@julianfamily.org
 To: Anuradha Talur ata...@redhat.com, Vince Loschiavo vloschi...@gmail.com
 Cc: gluster-users@gluster.org Gluster-users@gluster.org
 Sent: Friday, November 21, 2014 12:06:27 PM
 Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help
(Nagios related)



 On November 20, 2014 10:01:45 PM PST, Anuradha Talur ata...@redhat.com
 wrote:
 
 
 - Original Message -
  From: Vince Loschiavo vloschi...@gmail.com
  To: gluster-users@gluster.org Gluster-users@gluster.org
  Sent: Wednesday, November 19, 2014 9:50:50 PM
  Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help
(Nagios
 related)
 
 
  Hello Gluster Community,
 
  I have been using the Nagios monitoring scripts, mentioned in the
 below
  thread, on 3.5.2 with great success. The most useful of these
is the
 self
  heal.
 
  However, I've just upgraded to 3.6.1 on the lab and the self heal
 daemon has
  become quite aggressive. I continually get alerts/warnings on
3.6.1
 that
  virt disk images need self heal, then they clear. This is not the
 case on
  3.5.2. This
 
  Configuration:
  2 node, 2 brick replicated volume with 2x1GB LAG network
between the
 peers
  using this volume as a QEMU/KVM virt image store through the fuse
 mount on
  Centos 6.5.
 
  Example:
  on 3.5.2:
  gluster volume heal volumename info: shows the bricks and
number of
 entries
  to be healed: 0
 
  On v3.5.2 - During normal gluster operations, I can run this
command
 over and
  over again, 2-4 times per second, and it will always show 0
entries
 to be
  healed. I've used this as an indicator that the bricks are
 synchronized.
 
  Last night, I upgraded to 3.6.1 in lab and I'm seeing different
 behavior.
  Running gluster volume heal volumename info , during normal
 operations, will
  show a file out-of-sync, seemingly between every block written to
 disk then
  synced to the peer. I can run the command over and over
again, 2-4
 times per
  second, and it will almost always show something out of sync. The
 individual
  files change, meaning:
 
  Example:
  1st Run: shows file1 out of sync
  2nd run: shows file 2 and file 3 out of sync but file 1 is now in
 sync (not
  in the list)
  3rd run: shows file 3 and file 4 out of sync but file 1 and 2
are in
 sync
  (not in the list).
  ...
  nth run: shows 0 files out of sync
  nth+1 run: shows file 3 and 12 out of sync.
 
  From looking at the virtual machines running off this gluster
volume,
 it's
  obvious that gluster is working well. However, this obviously
plays
 havoc
  with Nagios and alerts. Nagios will run the heal info and get
 different and
  non-useful results each time, and will send alerts.
 
  Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a
way to
 tune the
  settings or change the monitoring method to get better
results into
 Nagios.
 
 In 3.6.1 the way heal info command works is different from that in
 3.5.2. In 3.6.1, it is self-heal daemon that gathers the
entries that
 might need healing. Currently, in 3.6.1, there isn't a method to
 distinguish between a file that is being healed and a file with
 on-going I/O while listing. Hence you see files with normal
operation
 too listed in the output of heal info command.

 How did that regression pass?!
Test cases to check this condition was not written in regression
tests.


--
Thanks,
Anuradha.




--
-Vince Loschiavo


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-20 Thread Anuradha Talur


- Original Message -
 From: Vince Loschiavo vloschi...@gmail.com
 To: gluster-users@gluster.org Gluster-users@gluster.org
 Sent: Wednesday, November 19, 2014 9:50:50 PM
 Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)
 
 
 Hello Gluster Community,
 
 I have been using the Nagios monitoring scripts, mentioned in the below
 thread, on 3.5.2 with great success. The most useful of these is the self
 heal.
 
 However, I've just upgraded to 3.6.1 on the lab and the self heal daemon has
 become quite aggressive. I continually get alerts/warnings on 3.6.1 that
 virt disk images need self heal, then they clear. This is not the case on
 3.5.2. This
 
 Configuration:
 2 node, 2 brick replicated volume with 2x1GB LAG network between the peers
 using this volume as a QEMU/KVM virt image store through the fuse mount on
 Centos 6.5.
 
 Example:
 on 3.5.2:
 gluster volume heal volumename info: shows the bricks and number of entries
 to be healed: 0
 
 On v3.5.2 - During normal gluster operations, I can run this command over and
 over again, 2-4 times per second, and it will always show 0 entries to be
 healed. I've used this as an indicator that the bricks are synchronized.
 
 Last night, I upgraded to 3.6.1 in lab and I'm seeing different behavior.
 Running gluster volume heal volumename info , during normal operations, will
 show a file out-of-sync, seemingly between every block written to disk then
 synced to the peer. I can run the command over and over again, 2-4 times per
 second, and it will almost always show something out of sync. The individual
 files change, meaning:
 
 Example:
 1st Run: shows file1 out of sync
 2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync (not
 in the list)
 3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
 (not in the list).
 ...
 nth run: shows 0 files out of sync
 nth+1 run: shows file 3 and 12 out of sync.
 
 From looking at the virtual machines running off this gluster volume, it's
 obvious that gluster is working well. However, this obviously plays havoc
 with Nagios and alerts. Nagios will run the heal info and get different and
 non-useful results each time, and will send alerts.
 
 Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to tune the
 settings or change the monitoring method to get better results into Nagios.
 
In 3.6.1 the heal info command works differently from the way it does in 3.5.2: 
it is the self-heal daemon that gathers the entries that might need healing. 
Currently, in 3.6.1, there is no way, while listing, to distinguish between a 
file that is being healed and a file with ongoing I/O. Hence files under normal 
operation are also listed in the output of the heal info command.
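
For reference, a minimal sketch of how a monitoring check can read this output; 
it assumes the plain-text format with one "Number of entries: N" line per brick 
and uses a placeholder volume name, so treat it as an illustration rather than 
the actual plugin code:

    #!/usr/bin/env python
    # Minimal sketch: total the entry counts reported by
    # "gluster volume heal <volume> info". Assumes the output contains one
    # "Number of entries: N" line per brick (adjust the pattern otherwise).
    import re
    import subprocess

    def heal_entry_count(volume):
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"]).decode()
        return sum(int(n) for n in re.findall(r"Number of entries:\s*(\d+)", out))

    if __name__ == "__main__":
        print(heal_entry_count("volumename"))  # "volumename" is a placeholder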
 Thank you,
 
 --
 -Vince Loschiavo
 
 
 On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com  wrote:
 
 
 
 Hi Gopu,
 
 Awesome !!
 
 We can have a Gluster blog about this implementation.
 
 --Humble
 
 
 
 --Humble
 
 
 On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan  gopukrishnan...@gmail.com 
 wrote:
 
 
 
 Thanks for all your help... I was able to configure nagios using the
 glusterfs plugin. Following link shows how I configured it. Hope it helps
 someone else.:
 
 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/
 
 On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com  wrote:
 
 
 
 Hi,
 
 Please look at this thread
 http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html
 
 Btw, if you are around, we have a talk on same topic in upcoming GlusterFS
 India meetup.
 
 Details can be fetched from:
 http://www.meetup.com/glusterfs-India/
 
 --Humble
 
 --Humble
 
 
 On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan  gopukrishnan...@gmail.com 
 wrote:
 
 
 
 How can we monitor the glusters and alert us if something happened wrong. I
 found some nagios plugins and didn't work until this time. I am still
 experimenting with those. Any suggestions would be much helpful
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users
 
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users
 
 
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Thanks,
Anuradha.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-20 Thread Joe Julian


On November 20, 2014 10:01:45 PM PST, Anuradha Talur ata...@redhat.com wrote:


- Original Message -
 From: Vince Loschiavo vloschi...@gmail.com
 To: gluster-users@gluster.org Gluster-users@gluster.org
 Sent: Wednesday, November 19, 2014 9:50:50 PM
 Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
related)
 
 
 Hello Gluster Community,
 
 I have been using the Nagios monitoring scripts, mentioned in the
below
 thread, on 3.5.2 with great success. The most useful of these is the
self
 heal.
 
 However, I've just upgraded to 3.6.1 on the lab and the self heal
daemon has
 become quite aggressive. I continually get alerts/warnings on 3.6.1
that
 virt disk images need self heal, then they clear. This is not the
case on
 3.5.2. This
 
 Configuration:
 2 node, 2 brick replicated volume with 2x1GB LAG network between the
peers
 using this volume as a QEMU/KVM virt image store through the fuse
mount on
 Centos 6.5.
 
 Example:
 on 3.5.2:
 gluster volume heal volumename info: shows the bricks and number of
entries
 to be healed: 0
 
 On v3.5.2 - During normal gluster operations, I can run this command
over and
 over again, 2-4 times per second, and it will always show 0 entries
to be
 healed. I've used this as an indicator that the bricks are
synchronized.
 
 Last night, I upgraded to 3.6.1 in lab and I'm seeing different
behavior.
 Running gluster volume heal volumename info , during normal
operations, will
 show a file out-of-sync, seemingly between every block written to
disk then
 synced to the peer. I can run the command over and over again, 2-4
times per
 second, and it will almost always show something out of sync. The
individual
 files change, meaning:
 
 Example:
 1st Run: shows file1 out of sync
 2nd run: shows file 2 and file 3 out of sync but file 1 is now in
sync (not
 in the list)
 3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in
sync
 (not in the list).
 ...
 nth run: shows 0 files out of sync
 nth+1 run: shows file 3 and 12 out of sync.
 
 From looking at the virtual machines running off this gluster volume,
it's
 obvious that gluster is working well. However, this obviously plays
havoc
 with Nagios and alerts. Nagios will run the heal info and get
different and
 non-useful results each time, and will send alerts.
 
 Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to
tune the
 settings or change the monitoring method to get better results into
Nagios.
 
In 3.6.1 the way heal info command works is different from that in
3.5.2. In 3.6.1, it is self-heal daemon that gathers the entries that
might need healing. Currently, in 3.6.1, there isn't a method to
distinguish between a file that is being healed and a file with
on-going I/O while listing. Hence you see files with normal operation
too listed in the output of heal info command.

How did that regression pass?!
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-20 Thread Anuradha Talur


- Original Message -
 From: Joe Julian j...@julianfamily.org
 To: Anuradha Talur ata...@redhat.com, Vince Loschiavo 
 vloschi...@gmail.com
 Cc: gluster-users@gluster.org Gluster-users@gluster.org
 Sent: Friday, November 21, 2014 12:06:27 PM
 Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios 
 related)
 
 
 
 On November 20, 2014 10:01:45 PM PST, Anuradha Talur ata...@redhat.com
 wrote:
 
 
 - Original Message -
  From: Vince Loschiavo vloschi...@gmail.com
  To: gluster-users@gluster.org Gluster-users@gluster.org
  Sent: Wednesday, November 19, 2014 9:50:50 PM
  Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
 related)
  
  
  Hello Gluster Community,
  
  I have been using the Nagios monitoring scripts, mentioned in the
 below
  thread, on 3.5.2 with great success. The most useful of these is the
 self
  heal.
  
  However, I've just upgraded to 3.6.1 on the lab and the self heal
 daemon has
  become quite aggressive. I continually get alerts/warnings on 3.6.1
 that
  virt disk images need self heal, then they clear. This is not the
 case on
  3.5.2. This
  
  Configuration:
  2 node, 2 brick replicated volume with 2x1GB LAG network between the
 peers
  using this volume as a QEMU/KVM virt image store through the fuse
 mount on
  Centos 6.5.
  
  Example:
  on 3.5.2:
  gluster volume heal volumename info: shows the bricks and number of
 entries
  to be healed: 0
  
  On v3.5.2 - During normal gluster operations, I can run this command
 over and
  over again, 2-4 times per second, and it will always show 0 entries
 to be
  healed. I've used this as an indicator that the bricks are
 synchronized.
  
  Last night, I upgraded to 3.6.1 in lab and I'm seeing different
 behavior.
  Running gluster volume heal volumename info , during normal
 operations, will
  show a file out-of-sync, seemingly between every block written to
 disk then
  synced to the peer. I can run the command over and over again, 2-4
 times per
  second, and it will almost always show something out of sync. The
 individual
  files change, meaning:
  
  Example:
  1st Run: shows file1 out of sync
  2nd run: shows file 2 and file 3 out of sync but file 1 is now in
 sync (not
  in the list)
  3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in
 sync
  (not in the list).
  ...
  nth run: shows 0 files out of sync
  nth+1 run: shows file 3 and 12 out of sync.
  
  From looking at the virtual machines running off this gluster volume,
 it's
  obvious that gluster is working well. However, this obviously plays
 havoc
  with Nagios and alerts. Nagios will run the heal info and get
 different and
  non-useful results each time, and will send alerts.
  
  Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to
 tune the
  settings or change the monitoring method to get better results into
 Nagios.
  
 In 3.6.1 the way heal info command works is different from that in
 3.5.2. In 3.6.1, it is self-heal daemon that gathers the entries that
 might need healing. Currently, in 3.6.1, there isn't a method to
 distinguish between a file that is being healed and a file with
 on-going I/O while listing. Hence you see files with normal operation
 too listed in the output of heal info command.
 
 How did that regression pass?!
Test cases to check this condition were not written in the regression tests.
 

-- 
Thanks,
Anuradha.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-19 Thread Humble Devassy Chirammal
Hi Vince,
It could be a behavioural change in how the heal process output is captured in
the latest GlusterFS. If that is the case, we may tune the interval at which
Nagios collects the heal info output, or some other settings, to avoid
continuous alerts. I am CCing the gluster-nagios devs.

--Humble


On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo vloschi...@gmail.com
wrote:


 Hello Gluster Community,

 I have been using the Nagios monitoring scripts, mentioned in the below
 thread, on 3.5.2 with great success. The most useful of these is the self
 heal.

 However, I've just upgraded to 3.6.1 on the lab and the self heal daemon
 has become quite aggressive.  I continually get alerts/warnings on 3.6.1
 that virt disk images need self heal, then they clear.  This is not the
 case on 3.5.2.  This

 Configuration:
 2 node, 2 brick replicated volume with 2x1GB LAG network between the peers
 using this volume as a QEMU/KVM virt image store through the fuse mount on
 Centos 6.5.

 Example:
 on 3.5.2:
 *gluster volume heal volumename info:  *shows the bricks and number of
 entries to be healed: 0

 On v3.5.2 - During normal gluster operations, I can run this command over
 and over again, 2-4 times per second, and it will always show 0 entries to
 be healed.  I've used this as an indicator that the bricks are
 synchronized.

 Last night, I upgraded to 3.6.1 in lab and I'm seeing different behavior.
 Running *gluster volume heal volumename info*, during normal operations,
 will show a file out-of-sync, seemingly between every block written to disk
 then synced to the peer.  I can run the command over and over again, 2-4
 times per second, and it will almost always show something out of sync.
 The individual files change, meaning:

 Example:
 1st Run: shows file1 out of sync
 2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync
 (not in the list)
 3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
 (not in the list).
 ...
 nth run: shows 0 files out of sync
 nth+1 run: shows file 3 and 12 out of sync.

 From looking at the virtual machines running off this gluster volume, it's
 obvious that gluster is working well.  However, this obviously plays havoc
 with Nagios and alerts.  Nagios will run the heal info and get different
 and non-useful results each time, and will send alerts.

 Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to tune
 the settings or change the monitoring method to get better results into
 Nagios.

 Thank you,

 --
 -Vince Loschiavo


 On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com wrote:

 Hi Gopu,

 Awesome !!

 We can  have a Gluster blog about this implementation.

 --Humble



 --Humble


 On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan gopukrishnan...@gmail.com
  wrote:

 Thanks for all your help... I was able to configure nagios using the
 glusterfs plugin. Following link shows how I configured it. Hope it helps
 someone else.:


 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/

 On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com wrote:

 Hi,

 Please look at this thread
 http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html

 Btw,  if you are around, we have a talk on same topic in upcoming
 GlusterFS India meetup.

 Details can be fetched from:
  http://www.meetup.com/glusterfs-India/

 --Humble

 --Humble


 On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan 
 gopukrishnan...@gmail.com wrote:

 How can we monitor the glusters and alert us if something happened
 wrong. I found some nagios plugins and didn't work until this time. I am
 still experimenting with those. Any suggestions would be much helpful

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users





 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users






 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-19 Thread Nishanth Thomas
Hi Vince,

Are you referring to the monitoring scripts mentioned in the blog 
(http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/)
or to the scripts that are part of gluster 
(http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html)?
Please confirm.

Thanks,
Nishanth

- Original Message -
From: Humble Devassy Chirammal humble.deva...@gmail.com
To: Vince Loschiavo vloschi...@gmail.com
Cc: gluster-users@gluster.org Gluster-users@gluster.org, Sahina Bose 
sab...@redhat.com, ntho...@redhat.com
Sent: Wednesday, November 19, 2014 11:22:18 PM
Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

Hi Vince,
It could be a behavioural change in heal process output capture with latest
GlusterFS. If that is the case, we may tune the interval which  nagios
collect heal info output  or some other settings to avoid continuous
alerts. I am Ccing  gluster nagios devs.

--Humble

--Humble


On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo vloschi...@gmail.com
wrote:


 Hello Gluster Community,

 I have been using the Nagios monitoring scripts, mentioned in the below
 thread, on 3.5.2 with great success. The most useful of these is the self
 heal.

 However, I've just upgraded to 3.6.1 on the lab and the self heal daemon
 has become quite aggressive.  I continually get alerts/warnings on 3.6.1
 that virt disk images need self heal, then they clear.  This is not the
 case on 3.5.2.  This

 Configuration:
 2 node, 2 brick replicated volume with 2x1GB LAG network between the peers
 using this volume as a QEMU/KVM virt image store through the fuse mount on
 Centos 6.5.

 Example:
 on 3.5.2:
 *gluster volume heal volumename info:  *shows the bricks and number of
 entries to be healed: 0

 On v3.5.2 - During normal gluster operations, I can run this command over
 and over again, 2-4 times per second, and it will always show 0 entries to
 be healed.  I've used this as an indicator that the bricks are
 synchronized.

 Last night, I upgraded to 3.6.1 in lab and I'm seeing different behavior.
 Running *gluster volume heal volumename info*, during normal operations,
 will show a file out-of-sync, seemingly between every block written to disk
 then synced to the peer.  I can run the command over and over again, 2-4
 times per second, and it will almost always show something out of sync.
 The individual files change, meaning:

 Example:
 1st Run: shows file1 out of sync
 2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync
 (not in the list)
 3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
 (not in the list).
 ...
 nth run: shows 0 files out of sync
 nth+1 run: shows file 3 and 12 out of sync.

 From looking at the virtual machines running off this gluster volume, it's
 obvious that gluster is working well.  However, this obviously plays havoc
 with Nagios and alerts.  Nagios will run the heal info and get different
 and non-useful results each time, and will send alerts.

 Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to tune
 the settings or change the monitoring method to get better results into
 Nagios.

 Thank you,

 --
 -Vince Loschiavo


 On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com wrote:

 Hi Gopu,

 Awesome !!

 We can  have a Gluster blog about this implementation.

 --Humble



 --Humble


 On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan gopukrishnan...@gmail.com
  wrote:

 Thanks for all your help... I was able to configure nagios using the
 glusterfs plugin. Following link shows how I configured it. Hope it helps
 someone else.:


 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/

 On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal 
 humble.deva...@gmail.com wrote:

 Hi,

 Please look at this thread
 http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html

 Btw,  if you are around, we have a talk on same topic in upcoming
 GlusterFS India meetup.

 Details can be fetched from:
  http://www.meetup.com/glusterfs-India/

 --Humble

 --Humble


 On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan 
 gopukrishnan...@gmail.com wrote:

 How can we monitor the glusters and alert us if something happened
 wrong. I found some nagios plugins and didn't work until this time. I am
 still experimenting with those. Any suggestions would be much helpful

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users





 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users






 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-19 Thread Vince Loschiavo
Thank you!

I think we may need some sort of dampening method and more specific input
into Nagios, i.e., details on which files are out-of-sync rather than just the
number of files out-of-sync (a rough sketch of one such approach follows the
link below).

I'm using these:  http://download.gluster.org/pub/gluster/glusterfs-nagios/
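
One possible dampening approach, not part of the glusterfs-nagios plugins
linked above: sample heal info a few times and only report the entries that
show up in every sample, so the transient listings described for 3.6.1 do not
alert by themselves. The entry-line parsing and the volume name below are
assumptions, a sketch rather than a finished check:

    #!/usr/bin/env python
    # Sketch of a possible dampening wrapper (not part of glusterfs-nagios):
    # sample "gluster volume heal <volume> info" a few times and keep only the
    # entries present in every sample, so files listed merely because of
    # in-flight I/O do not trigger alerts on their own.
    import subprocess
    import time

    def heal_entries(volume):
        out = subprocess.check_output(
            ["gluster", "volume", "heal", volume, "info"]).decode()
        entries = set()
        for line in out.splitlines():
            line = line.strip()
            # Assumption: entry lines are paths ("/...") or gfid identifiers
            # ("<gfid:..."); brick headers and summary lines are skipped.
            if line.startswith("/") or line.startswith("<gfid:"):
                entries.add(line)
        return entries

    def persistent_entries(volume, samples=3, interval=5):
        result = heal_entries(volume)
        for _ in range(samples - 1):
            time.sleep(interval)
            result &= heal_entries(volume)  # keep only entries seen every time
        return result

    if __name__ == "__main__":
        stuck = persistent_entries("volumename")  # placeholder volume name
        print("%d persistent un-synced entries" % len(stuck))
        for entry in sorted(stuck):
            print(entry)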


On Wed, Nov 19, 2014 at 10:14 AM, Nishanth Thomas ntho...@redhat.com
wrote:

 Hi Vince,

 Are you referring the monitoring scripts mentioned in the blog(
 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/)
 or the scripts part of the gluster(
 http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html)?
 Please confirm?

 Thanks,
 Nishanth

 - Original Message -
 From: Humble Devassy Chirammal humble.deva...@gmail.com
 To: Vince Loschiavo vloschi...@gmail.com
 Cc: gluster-users@gluster.org Gluster-users@gluster.org, Sahina
 Bose sab...@redhat.com, ntho...@redhat.com
 Sent: Wednesday, November 19, 2014 11:22:18 PM
 Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
 related)

 Hi Vince,
 It could be a behavioural change in heal process output capture with latest
 GlusterFS. If that is the case, we may tune the interval which  nagios
 collect heal info output  or some other settings to avoid continuous
 alerts. I am Ccing  gluster nagios devs.

 --Humble

 --Humble


 On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo vloschi...@gmail.com
 wrote:

 
  Hello Gluster Community,
 
  I have been using the Nagios monitoring scripts, mentioned in the below
  thread, on 3.5.2 with great success. The most useful of these is the self
  heal.
 
  However, I've just upgraded to 3.6.1 on the lab and the self heal daemon
  has become quite aggressive.  I continually get alerts/warnings on 3.6.1
  that virt disk images need self heal, then they clear.  This is not the
  case on 3.5.2.  This
 
  Configuration:
  2 node, 2 brick replicated volume with 2x1GB LAG network between the
 peers
  using this volume as a QEMU/KVM virt image store through the fuse mount
 on
  Centos 6.5.
 
  Example:
  on 3.5.2:
  *gluster volume heal volumename info:  *shows the bricks and number of
  entries to be healed: 0
 
  On v3.5.2 - During normal gluster operations, I can run this command over
  and over again, 2-4 times per second, and it will always show 0 entries
 to
  be healed.  I've used this as an indicator that the bricks are
  synchronized.
 
  Last night, I upgraded to 3.6.1 in lab and I'm seeing different behavior.
  Running *gluster volume heal volumename info*, during normal operations,
  will show a file out-of-sync, seemingly between every block written to
 disk
  then synced to the peer.  I can run the command over and over again, 2-4
  times per second, and it will almost always show something out of sync.
  The individual files change, meaning:
 
  Example:
  1st Run: shows file1 out of sync
  2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync
  (not in the list)
  3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
  (not in the list).
  ...
  nth run: shows 0 files out of sync
  nth+1 run: shows file 3 and 12 out of sync.
 
  From looking at the virtual machines running off this gluster volume,
 it's
  obvious that gluster is working well.  However, this obviously plays
 havoc
  with Nagios and alerts.  Nagios will run the heal info and get different
  and non-useful results each time, and will send alerts.
 
  Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to
 tune
  the settings or change the monitoring method to get better results into
  Nagios.
 
  Thank you,
 
  --
  -Vince Loschiavo
 
 
  On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal 
  humble.deva...@gmail.com wrote:
 
  Hi Gopu,
 
  Awesome !!
 
  We can  have a Gluster blog about this implementation.
 
  --Humble
 
 
 
  --Humble
 
 
  On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan 
 gopukrishnan...@gmail.com
   wrote:
 
  Thanks for all your help... I was able to configure nagios using the
  glusterfs plugin. Following link shows how I configured it. Hope it
 helps
  someone else.:
 
 
 
 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/
 
  On Sun, Nov 16, 2014 at 11:44 AM, Humble Devassy Chirammal 
  humble.deva...@gmail.com wrote:
 
  Hi,
 
  Please look at this thread
  http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html
 
  Btw,  if you are around, we have a talk on same topic in upcoming
  GlusterFS India meetup.
 
  Details can be fetched from:
   http://www.meetup.com/glusterfs-India/
 
  --Humble
 
  --Humble
 
 
  On Sun, Nov 16, 2014 at 11:23 AM, Gopu Krishnan 
  gopukrishnan...@gmail.com wrote:
 
  How can we monitor the glusters and alert us if something happened
  wrong. I found some nagios plugins and didn't work until this time.
 I am
  still experimenting with those. Any suggestions would be much helpful
 

Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

2014-11-19 Thread Nishanth Thomas
Hi Vince,

Thank you for the quick response.

For the time being, to reduce the frequency of the alerts, please check whether 
flap detection is enabled. If not, please go ahead and enable it. It will 
suppress the alerts if there are frequent changes in the service status.
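
For illustration, flap detection and check retries are set per service in the
Nagios object configuration. The host, service, and command names below are
placeholders and the values are only examples, a sketch rather than a
recommended configuration:

    # Sketch only: names are placeholders, values are illustrative.
    define service {
        use                      generic-service
        host_name                gluster-node1
        service_description      Volume Self-Heal - volumename
        check_command            check_vol_heal_status
        # do not notify while the service state is flapping
        flap_detection_enabled   1
        # require several consecutive non-OK results before a HARD state change
        max_check_attempts       3
        retry_interval           2
    }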

Currently the plugin checks the number of un-synced entries and, if it is 
greater than 0, changes the state of the service, which sends the alert. This 
part probably requires a change: we may have to introduce some thresholds to 
decide whether or not to change the state of the service.
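
A sketch of the kind of threshold logic that could be introduced, following the
usual Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL); the
threshold values are made up and the entry count is passed in rather than
gathered here:

    #!/usr/bin/env python
    # Sketch of threshold-based state changes: instead of alerting on any
    # non-zero count, exit 0/1/2 per the Nagios plugin convention.
    # The thresholds are illustrative, not recommendations.
    import sys

    WARN = 5     # un-synced entries tolerated before WARNING
    CRIT = 20    # un-synced entries tolerated before CRITICAL

    def nagios_state(entry_count):
        if entry_count >= CRIT:
            return 2, "CRITICAL"
        if entry_count >= WARN:
            return 1, "WARNING"
        return 0, "OK"

    if __name__ == "__main__":
        count = int(sys.argv[1]) if len(sys.argv) > 1 else 0
        code, label = nagios_state(count)
        print("%s - %d un-synced entries" % (label, count))
        sys.exit(code)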

Regarding why the upgrade causes the files to go out of sync, someone else 
needs to answer.

Thanks,
Nishanth  

- Original Message -
From: Vince Loschiavo vloschi...@gmail.com
To: Nishanth Thomas ntho...@redhat.com
Cc: Humble Devassy Chirammal humble.deva...@gmail.com, 
gluster-users@gluster.org Gluster-users@gluster.org, Sahina Bose 
sab...@redhat.com
Sent: Wednesday, November 19, 2014 11:46:28 PM
Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)

Thank you!

I think we may need some sort of dampening method and more specific input
into Nagios.  i.e. Details on which files are out-of-sync, versus just the
number of files out-of-sync.

I'm using these:  http://download.gluster.org/pub/gluster/glusterfs-nagios/


On Wed, Nov 19, 2014 at 10:14 AM, Nishanth Thomas ntho...@redhat.com
wrote:

 Hi Vince,

 Are you referring the monitoring scripts mentioned in the blog(
 http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/)
 or the scripts part of the gluster(
 http://gluster.org/pipermail/gluster-users.old/2014-June/017819.html)?
 Please confirm?

 Thanks,
 Nishanth

 - Original Message -
 From: Humble Devassy Chirammal humble.deva...@gmail.com
 To: Vince Loschiavo vloschi...@gmail.com
 Cc: gluster-users@gluster.org Gluster-users@gluster.org, Sahina
 Bose sab...@redhat.com, ntho...@redhat.com
 Sent: Wednesday, November 19, 2014 11:22:18 PM
 Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios
 related)

 Hi Vince,
 It could be a behavioural change in heal process output capture with latest
 GlusterFS. If that is the case, we may tune the interval which  nagios
 collect heal info output  or some other settings to avoid continuous
 alerts. I am Ccing  gluster nagios devs.

 --Humble

 --Humble


 On Wed, Nov 19, 2014 at 9:50 PM, Vince Loschiavo vloschi...@gmail.com
 wrote:

 
  Hello Gluster Community,
 
  I have been using the Nagios monitoring scripts, mentioned in the below
  thread, on 3.5.2 with great success. The most useful of these is the self
  heal.
 
  However, I've just upgraded to 3.6.1 on the lab and the self heal daemon
  has become quite aggressive.  I continually get alerts/warnings on 3.6.1
  that virt disk images need self heal, then they clear.  This is not the
  case on 3.5.2.  This
 
  Configuration:
  2 node, 2 brick replicated volume with 2x1GB LAG network between the
 peers
  using this volume as a QEMU/KVM virt image store through the fuse mount
 on
  Centos 6.5.
 
  Example:
  on 3.5.2:
  *gluster volume heal volumename info:  *shows the bricks and number of
  entries to be healed: 0
 
  On v3.5.2 - During normal gluster operations, I can run this command over
  and over again, 2-4 times per second, and it will always show 0 entries
 to
  be healed.  I've used this as an indicator that the bricks are
  synchronized.
 
  Last night, I upgraded to 3.6.1 in lab and I'm seeing different behavior.
  Running *gluster volume heal volumename info*, during normal operations,
  will show a file out-of-sync, seemingly between every block written to
 disk
  then synced to the peer.  I can run the command over and over again, 2-4
  times per second, and it will almost always show something out of sync.
  The individual files change, meaning:
 
  Example:
  1st Run: shows file1 out of sync
  2nd run: shows file 2 and file 3 out of sync but file 1 is now in sync
  (not in the list)
  3rd run: shows file 3 and file 4 out of sync but file 1 and 2 are in sync
  (not in the list).
  ...
  nth run: shows 0 files out of sync
  nth+1 run: shows file 3 and 12 out of sync.
 
  From looking at the virtual machines running off this gluster volume,
 it's
  obvious that gluster is working well.  However, this obviously plays
 havoc
  with Nagios and alerts.  Nagios will run the heal info and get different
  and non-useful results each time, and will send alerts.
 
  Is this behavior change (3.5.2 vs 3.6.1) expected?  Is there a way to
 tune
  the settings or change the monitoring method to get better results into
  Nagios.
 
  Thank you,
 
  --
  -Vince Loschiavo
 
 
  On Wed, Nov 19, 2014 at 4:35 AM, Humble Devassy Chirammal 
  humble.deva...@gmail.com wrote:
 
  Hi Gopu,
 
  Awesome !!
 
  We can  have a Gluster blog about this implementation.
 
  --Humble
 
 
 
  --Humble
 
 
  On Wed, Nov 19, 2014 at 5:38 PM, Gopu Krishnan