Re: [Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t
From the glusterd log:

[2016-08-31 07:54:24.817811] E [run.c:191:runner_log] (-->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1c30) [0x7f1a34ebac30] -->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1794) [0x7f1a34eba794] -->/build/install/lib/libglusterfs.so.0(runner_log+0x1ae) [0x7f1a3fa15cea] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=patchy --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2016-08-31 07:54:24.819166]:++ G_LOG:./tests/basic/afr/root-squash-self-heal.t: TEST: 20 1 afr_child_up_status patchy 0 ++

The above is spawned from a "volume start force". I checked the brick logs and the killed brick had started successfully.

Links to failures:
https://build.gluster.org/job/centos6-regression/429/console
https://build.gluster.org/job/netbsd7-regression/358/consoleFull

Thanks,
Susant

- Original Message -
> From: "Susant Palai"
> To: "gluster-devel"
> Sent: Thursday, 1 September, 2016 12:13:01 PM
> Subject: [Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t
>
> Hi,
> $subject is failing spuriously for one of my patches.
> One of the test cases is: EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" afr_child_up_status $V0 0
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
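For readers unfamiliar with the test framework: the failing check polls afr_child_up_status until it reports "1" or $PROCESS_UP_TIMEOUT expires. The sketch below only illustrates that polling pattern; the names afr_child_up_status, $V0 and $PROCESS_UP_TIMEOUT come from the quoted test, while poll_expect and its loop body are illustrative stand-ins, not the framework's actual EXPECT_WITHIN implementation.

---
#!/bin/bash
# Illustration of the EXPECT_WITHIN polling pattern used by the .t tests.
# Usage: poll_expect <timeout-seconds> <expected-value> <command...>
poll_expect ()
{
    local timeout=$1 expected=$2 got;
    shift 2;
    local end=$((SECONDS + timeout));
    while [ $SECONDS -lt $end ]; do
        got=$("$@" 2>/dev/null);
        [ "$got" = "$expected" ] && return 0;   # expected value seen, check passes
        sleep 1;                                # retry until the timeout
    done
    echo "expected '$expected', got '$got'" >&2;
    return 1;
}

# Roughly what the failing line asserts: brick 0 of volume $V0 is reported
# up within $PROCESS_UP_TIMEOUT seconds of the "volume start force".
# poll_expect "$PROCESS_UP_TIMEOUT" "1" afr_child_up_status "$V0" 0
---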
Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t
- Original Message -
> From: "Poornima Gurusiddaiah"
> To: "Gluster Devel" , "Manikandan Selvaganesan" , "Susant Palai" , "Nithya Balachandran"
> Sent: Tuesday, March 1, 2016 4:49:51 PM
> Subject: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t
>
> Hi,
>
> I see these test cases failing spuriously,
>
> ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull
> ./tests/bugs/distribute/bug-860663.t Failed Test: 13
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull

The test which failed is just an umount; not sure why it failed:

# Unmount and remount to make sure we're doing fresh lookups.
TEST umount $M0

Alternatively we can have another fresh mount on, say, $M1 and run the future tests there (see the sketch below). Can you check whether patch [1] fixes your issue (push your patch as a dependency of [1])?

[1] http://review.gluster.org/13567

> Could any one from Quota and dht look into it?
> Regards,
> Poornima
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
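The alternative suggested above (keeping $M0 mounted and taking a fresh mount on $M1 for the later lookups) would look roughly like this in the .t framework. This is only a sketch: $H0, $V0 and $M1 follow the conventions used by these tests, and $M1/somefile is a hypothetical path standing in for whatever the later checks actually access.

---
# Instead of: TEST umount $M0
# take a second, fresh mount and run the post-remount lookups there.
TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 $M1

# Fresh lookups now go through $M1, so a busy or slow unmount of $M0
# can no longer fail the test.
TEST stat $M1/somefile
---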
Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t
Thank You, have rebased the patch. Regards, Poornima - Original Message - > From: "Xavier Hernandez" <xhernan...@datalab.es> > To: "Poornima Gurusiddaiah" <pguru...@redhat.com>, "Gluster Devel" > <gluster-devel@gluster.org>, "Manikandan > Selvaganesan" <mselv...@redhat.com>, "Susant Palai" <spa...@redhat.com>, > "Nithya Balachandran" <nbala...@redhat.com> > Sent: Tuesday, March 1, 2016 4:57:11 PM > Subject: Re: [Gluster-devel] Spurious failures in ec/quota.t and > distribute/bug-860663.t > > Hi Poornima, > > On 01/03/16 12:19, Poornima Gurusiddaiah wrote: > > Hi, > > > > I see these test cases failing spuriously, > > > > ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2 > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull > > This is already solved by http://review.gluster.org/13446/. It has been > merged just a couple hours ago. > > Xavi > > > > > ./tests/bugs/distribute/bug-860663.t Failed Test: 13 > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull > > > > Could any one from Quota and dht look into it? > > > > Regards, > > Poornima > > > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures in ec/quota.t and distribute/bug-860663.t
Hi Poornima, Below patch might solve the regression failure for ''./tests/basic/ec/quota.t' http://review.gluster.org/#/c/13446/ http://review.gluster.org/#/c/13447/ Thanks, Vijay On Tue, Mar 1, 2016 at 4:49 PM, Poornima Gurusiddaiahwrote: > Hi, > > I see these test cases failing spuriously, > > ./tests/basic/ec/quota.t Failed Tests: 1-13, 16, 18, 20, 2 > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18637/consoleFull > > ./tests/bugs/distribute/bug-860663.t Failed Test: 13 > https://build.gluster.org/job/rackspace-regression-2GB-triggered/18622/consoleFull > > Could any one from Quota and dht look into it? > > Regards, > Poornima > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Thanks Michael! Thanks and Regards, Kotresh H R - Original Message - > From: "Michael Scherer"> To: "Kotresh Hiremath Ravishankar" > Cc: "Krutika Dhananjay" , "Atin Mukherjee" > , "Gaurav Garg" > , "Aravinda" , "Gluster Devel" > > Sent: Thursday, 24 September, 2015 11:09:52 PM > Subject: Re: Spurious failures > > Le jeudi 24 septembre 2015 à 07:59 -0400, Kotresh Hiremath Ravishankar a > écrit : > > Thank you:) and also please check the script I had given passes in all > > machines > > So it worked everywhere, but on slave0 and slave1. Not sure what is > wrong, or if they are used, I will check later. > > > -- > Michael Scherer > Sysadmin, Community Infrastructure and Platform, OSAS > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
On Thursday, 24 September 2015 at 06:50 -0400, Kotresh Hiremath Ravishankar wrote:
> >>> Ok, this definitely requires some tests and thoughts. It only uses ipv4 too?
> >>> (I guess yes, since ipv6 is removed from the rackspace build slaves)
>
> Yes!
>
> Could we know when these settings can be done on all linux slave machines?
> If it takes some time, we should consider moving all geo-rep testcases under bad tests till then.

I will do that this afternoon, now that I have a clear idea of what needs to be done.
(I already pushed the path change.)

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
>>> Ok, this definitely requires some tests and toughts. It only use ipv4 >>> too ? >>> (I guess yes, since ipv6 is removed from the rackspace build slaves) Yes! Could we know when can these settings be done on all linux slave machines? If it takes sometime, we should consider moving all geo-rep testcases under bad tests till then. Thanks and Regards, Kotresh H R - Original Message - > From: "Michael Scherer"> To: "Kotresh Hiremath Ravishankar" > Cc: "Krutika Dhananjay" , "Atin Mukherjee" > , "Gaurav Garg" > , "Aravinda" , "Gluster Devel" > > Sent: Thursday, 24 September, 2015 1:18:16 PM > Subject: Re: Spurious failures > > Le jeudi 24 septembre 2015 à 02:24 -0400, Kotresh Hiremath Ravishankar a > écrit : > > Hi, > > > > >>>So, it is ok if I restrict that to be used only on 127.0.0.1 ? > > I think no, testcases use 'H0' to create volumes > > H0=${H0:=`hostname`}; > > Geo-rep expects passwordLess SSH to 'H0' > > > > Ok, this definitely requires some tests and toughts. It only use ipv4 > too ? > (I guess yes, since ipv6 is removed from the rackspace build slaves) > -- > Michael Scherer > Sysadmin, Community Infrastructure and Platform, OSAS > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Thank you:) and also please check the script I had given passes in all machines Thanks and Regards, Kotresh H R - Original Message - > From: "Michael Scherer"> To: "Kotresh Hiremath Ravishankar" > Cc: "Krutika Dhananjay" , "Atin Mukherjee" > , "Gaurav Garg" > , "Aravinda" , "Gluster Devel" > > Sent: Thursday, 24 September, 2015 5:00:43 PM > Subject: Re: Spurious failures > > Le jeudi 24 septembre 2015 à 06:50 -0400, Kotresh Hiremath Ravishankar a > écrit : > > >>> Ok, this definitely requires some tests and toughts. It only use ipv4 > > >>> too ? > > >>> (I guess yes, since ipv6 is removed from the rackspace build slaves) > > > > Yes! > > > > Could we know when can these settings be done on all linux slave > > machines? > > If it takes sometime, we should consider moving all geo-rep testcases > > under bad tests > > till then. > > I will do that this afternoon, now I have a clear idea of what need to > be done. > ( I already pushed the path change ) > > -- > Michael Scherer > Sysadmin, Community Infrastructure and Platform, OSAS > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Hi, >>>So, it is ok if I restrict that to be used only on 127.0.0.1 ? I think no, testcases use 'H0' to create volumes H0=${H0:=`hostname`}; Geo-rep expects passwordLess SSH to 'H0' Thanks and Regards, Kotresh H R - Original Message - > From: "Michael Scherer"> To: "Kotresh Hiremath Ravishankar" > Cc: "Krutika Dhananjay" , "Atin Mukherjee" > , "Gaurav Garg" > , "Aravinda" , "Gluster Devel" > > Sent: Wednesday, 23 September, 2015 5:05:58 PM > Subject: Re: Spurious failures > > Le mercredi 23 septembre 2015 à 06:24 -0400, Kotresh Hiremath > Ravishankar a écrit : > > Hi Michael, > > > > Please find my replies below. > > > > >>> Root login using password should be disabled, so no. If that's still > > >>> working and people use it, that's gonna change soon, too much problems > > >>> with it. > > > > Ok > > > > >>>Can you be more explicit on where should the user come from so I can > > >>>properly integrate that ? > > > > It's just PasswordLess SSH from root to root on to same host. > > 1. Generate ssh key: > > #ssh-keygen > > 2. Add it to /root/.ssh/authorized_keys > > #ssh-copy-id -i root@host > > > > Requirement by geo-replication: > > 'ssh root@host' should not ask for password > > So, it is ok if I restrict that to be used only on 127.0.0.1 ? > > -- > Michael Scherer > Sysadmin, Community Infrastructure and Platform, OSAS > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
On Thursday, 24 September 2015 at 02:24 -0400, Kotresh Hiremath Ravishankar wrote:
> Hi,
>
> >>> So, it is ok if I restrict that to be used only on 127.0.0.1 ?
> I think no, testcases use 'H0' to create volumes
> H0=${H0:=`hostname`};
> Geo-rep expects passwordLess SSH to 'H0'

Ok, this definitely requires some tests and thoughts. It only uses ipv4 too?
(I guess yes, since ipv6 is removed from the rackspace build slaves)

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
On Thursday, 24 September 2015 at 07:59 -0400, Kotresh Hiremath Ravishankar wrote:
> Thank you :) and also please check that the script I had given passes on all machines

So it worked everywhere except on slave0 and slave1. Not sure what is wrong, or whether they are even used; I will check later.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Hi Michael,

Please find my replies below.

>>> Root login using password should be disabled, so no. If that's still
>>> working and people use it, that's gonna change soon, too much problems
>>> with it.

Ok

>>> Can you be more explicit on where should the user come from so I can
>>> properly integrate that ?

It's just passwordless SSH from root to root on the same host.
1. Generate an ssh key:
   # ssh-keygen
2. Add it to /root/.ssh/authorized_keys:
   # ssh-copy-id -i root@host

Requirement by geo-replication: 'ssh root@host' should not ask for a password.

>>> There is something adding lots of line to /root/.ssh/authorized_keys on
>>> the slave, and this make me quite unconfortable, so if that's it, I
>>> rather have it done cleanly, and for that, I need to understand the
>>> test, and the requirement.

Yes, geo-rep is doing it. It adds an entry only once per session, but since the test is running continuously for different patches, it keeps building up. I will submit a patch to clean it up in the geo-rep test suite itself.

>>> I will do this one.

Thank you!

>>> Is georep supposed to work on other platform like freebsd ? ( because
>>> freebsd do not have bash, so I have to adapt to local way, but if that's
>>> not gonna be tested, I rather not spend too much time on reading the
>>> handbook for now )

As of now it is supported only on Linux; it has known issues with other platforms such as NetBSD.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Michael Scherer"
> To: "Kotresh Hiremath Ravishankar"
> Cc: "Krutika Dhananjay" , "Atin Mukherjee" , "Gaurav Garg" , "Aravinda" , "Gluster Devel"
> Sent: Wednesday, September 23, 2015 3:30:39 PM
> Subject: Re: Spurious failures
>
> Le mercredi 23 septembre 2015 à 03:25 -0400, Kotresh Hiremath Ravishankar a écrit :
> > Hi Krutika,
> >
> > Looks like the prerequisites for geo-replication to work is changed
> > in slave21
> >
> > Hi Michael,
>
> Hi,
>
> > Could you please check following settings are made in all linux regression
> > machines?
>
> Yeah, I will add to salt.
>
> > Or provide me with root password so that I can verify.
>
> Root login using password should be disabled, so no. If that's still
> working and people use it, that's gonna change soon, too much problems
> with it.
>
> > 1. Setup Passwordless SSH for the root user:
>
> Can you be more explicit on where should the user come from so I can
> properly integrate that ?
>
> There is something adding lots of line to /root/.ssh/authorized_keys on
> the slave, and this make me quite unconfortable, so if that's it, I
> rather have it done cleanly, and for that, I need to understand the
> test, and the requirement.
>
> > 2. Add below line in /root/.bashrc. This is required as geo-rep does
> > "gluster --version" via ssh and it can't find the gluster PATH via ssh.
> > export PATH=$PATH:/build/install/sbin:/build/install/bin
>
> I will do this one.
>
> Is georep supposed to work on other platform like freebsd ? ( because
> freebsd do not have bash, so I have to adapt to local way, but if that's
> not gonna be tested, I rather not spend too much time on reading the
> handbook for now )
>
> > Once above settings are done, the following script should output proper
> > version.
> > ---
> > #!/bin/bash
> >
> > function SSHM()
> > {
> >     ssh -q \
> >         -oPasswordAuthentication=no \
> >         -oStrictHostKeyChecking=no \
> >         -oControlMaster=yes \
> >         "$@";
> > }
> >
> > function cmd_slave()
> > {
> >     local cmd_line;
> >     cmd_line=$(cat <<EOF
> > function do_verify() {
> >     ver=\$(gluster --version | head -1 | cut -f2 -d " ");
> >     echo \$ver;
> > };
> > source /etc/profile && do_verify;
> > EOF
> > );
> >     echo $cmd_line;
> > }
> >
> > HOST=$1
> > cmd_line=$(cmd_slave);
> > ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
> > echo $ver
> > ---
> >
> > I could verify for slave32.
> > [root@slave32 ~]# vi /tmp/gver.sh
> > [root@slave32 ~]# /tmp/gver.sh slave32
> > 3.8dev
> >
> > Please help me in verifying the same for all the linux regression machines.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
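The two prerequisites Kotresh lists above (passwordless root SSH to the local hostname, and the PATH export in /root/.bashrc) can be scripted roughly as follows. This is a sketch only, assuming root access on the slave and that the regression build installs gluster under /build/install as stated above; it is not the salt state Michael mentions.

---
#!/bin/bash
# Sketch: apply the two geo-rep prerequisites described above on a slave.

# 1. Passwordless SSH from root to root on the same host.
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$(hostname)"

# 2. Make the regression build's gluster binaries visible over ssh.
grep -q '/build/install/sbin' /root/.bashrc || \
    echo 'export PATH=$PATH:/build/install/sbin:/build/install/bin' >> /root/.bashrc

# Verify: must not prompt for a password and should print a version string.
ssh -oPasswordAuthentication=no root@"$(hostname)" \
    'export PATH=$PATH:/build/install/sbin:/build/install/bin; gluster --version | head -1'
---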
Re: [Gluster-devel] Spurious failures
On Wednesday, 23 September 2015 at 06:24 -0400, Kotresh Hiremath Ravishankar wrote:
> Hi Michael,
>
> Please find my replies below.
>
> >>> Root login using password should be disabled, so no. If that's still
> >>> working and people use it, that's gonna change soon, too much problems
> >>> with it.
>
> Ok
>
> >>> Can you be more explicit on where should the user come from so I can
> >>> properly integrate that ?
>
> It's just PasswordLess SSH from root to root on to same host.
> 1. Generate ssh key:
>    #ssh-keygen
> 2. Add it to /root/.ssh/authorized_keys
>    #ssh-copy-id -i root@host
>
> Requirement by geo-replication:
> 'ssh root@host' should not ask for password

So, is it ok if I restrict that to be used only on 127.0.0.1?

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
On Wednesday, 23 September 2015 at 03:25 -0400, Kotresh Hiremath Ravishankar wrote:
> Hi Krutika,
>
> Looks like the prerequisites for geo-replication to work is changed
> in slave21
>
> Hi Michael,

Hi,

> Could you please check following settings are made in all linux regression
> machines?

Yeah, I will add to salt.

> Or provide me with root password so that I can verify.

Root login using password should be disabled, so no. If that's still
working and people use it, that's gonna change soon, too much problems
with it.

> 1. Setup Passwordless SSH for the root user:

Can you be more explicit on where should the user come from so I can
properly integrate that ?

There is something adding lots of lines to /root/.ssh/authorized_keys on
the slave, and this makes me quite uncomfortable, so if that's it, I
rather have it done cleanly, and for that, I need to understand the
test, and the requirement.

> 2. Add below line in /root/.bashrc. This is required as geo-rep does "gluster
> --version" via ssh and it can't find the gluster PATH via ssh.
> export PATH=$PATH:/build/install/sbin:/build/install/bin

I will do this one.

Is georep supposed to work on other platforms like freebsd ? ( because
freebsd does not have bash, so I have to adapt to the local way, but if that's
not gonna be tested, I rather not spend too much time on reading the
handbook for now )

> Once above settings are done, the following script should output the proper
> version.
>
> ---
> #!/bin/bash
>
> function SSHM()
> {
>     ssh -q \
>         -oPasswordAuthentication=no \
>         -oStrictHostKeyChecking=no \
>         -oControlMaster=yes \
>         "$@";
> }
>
> function cmd_slave()
> {
>     local cmd_line;
>     cmd_line=$(cat <<EOF
> function do_verify() {
>     ver=\$(gluster --version | head -1 | cut -f2 -d " ");
>     echo \$ver;
> };
> source /etc/profile && do_verify;
> EOF
> );
>     echo $cmd_line;
> }
>
> HOST=$1
> cmd_line=$(cmd_slave);
> ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
> echo $ver
> ---
>
> I could verify for slave32.
> [root@slave32 ~]# vi /tmp/gver.sh
> [root@slave32 ~]# /tmp/gver.sh slave32
> 3.8dev
>
> Please help me in verifying the same for all the linux regression machines.

--
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Hi Krutika, It's failing with ++ gluster --mode=script --wignore volume geo-rep master slave21.cloud.gluster.org::slave create push-pem Gluster version mismatch between master and slave. I will look into it. Thanks and Regards, Kotresh H R - Original Message - > From: "Krutika Dhananjay"> To: "Atin Mukherjee" > Cc: "Gluster Devel" , "Gaurav Garg" > , "Aravinda" , > "Kotresh Hiremath Ravishankar" > Sent: Tuesday, September 22, 2015 9:03:44 PM > Subject: Re: Spurious failures > > Ah! Sorry. I didn't read that line. :) > > Just figured even ./tests/geo-rep/georep-basic-dr-rsync.t is added to bad > tests list. > > So it's just /tests/geo-rep/georep-basic-dr-tarssh.t for now. > > Thanks Atin! > > -Krutika > > - Original Message - > > > From: "Atin Mukherjee" > > To: "Krutika Dhananjay" > > Cc: "Gluster Devel" , "Gaurav Garg" > > , "Aravinda" , "Kotresh Hiremath > > Ravishankar" > > Sent: Tuesday, September 22, 2015 8:51:22 PM > > Subject: Re: Spurious failures > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat: > > 0 Tests: 8 Failed: 2) > > Failed tests: 6, 8 > > Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr > > 0.56 csys = 1.46 CPU) > > Result: FAIL > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad > > status 1 > > *Ignoring failure from known-bad test > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t* > > [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok > > 17587 ms > > [11:24:16] > > All tests successful > > > On 09/22/2015 08:46 PM, Krutika Dhananjay wrote: > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > > > Ctrl + f 'not ok'. > > > > > > -Krutika > > > > > > > > > > > > *From: *"Atin Mukherjee" > > > *To: *"Krutika Dhananjay" , "Gluster Devel" > > > > > > *Cc: *"Gaurav Garg" , "Aravinda" > > > , "Kotresh Hiremath Ravishankar" > > > > > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM > > > *Subject: *Re: Spurious failures > > > > > > Krutika, > > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is > > > already a part of bad_tests () in both mainline and 3.7. Could you > > > provide me the link where this test has failed explicitly and that has > > > caused the regression to fail? > > > > > > ~Atin > > > > > > > > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > > > > Hi, > > > > > > > > The following tests seem to be failing consistently on the build > > > > machines in Linux: > > > > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > > > > > > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > > > > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > > > > > > > I have added these tests into the tracker etherpad. > > > > > > > > Meanwhile could someone from geo-rep and glusterd team take a look or > > > > perhaps move them to bad tests list? > > > > > > > > > > > > Here is one place where the three tests failed: > > > > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > > > > > -Krutika > > > > > > > > > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Hi Krutika,

Looks like the prerequisites for geo-replication to work have changed on slave21.

Hi Michael,

Could you please check that the following settings are made on all linux regression machines? Or provide me with the root password so that I can verify.

1. Setup Passwordless SSH for the root user:

2. Add the below line in /root/.bashrc. This is required as geo-rep does "gluster --version" via ssh and it can't find the gluster PATH via ssh.
   export PATH=$PATH:/build/install/sbin:/build/install/bin

Once the above settings are done, the following script should output the proper version.

---
#!/bin/bash

function SSHM()
{
    ssh -q \
        -oPasswordAuthentication=no \
        -oStrictHostKeyChecking=no \
        -oControlMaster=yes \
        "$@";
}

function cmd_slave()
{
    local cmd_line;
    cmd_line=$(cat <<EOF
function do_verify() {
    ver=\$(gluster --version | head -1 | cut -f2 -d " ");
    echo \$ver;
};
source /etc/profile && do_verify;
EOF
);
    echo $cmd_line;
}

HOST=$1
cmd_line=$(cmd_slave);
ver=`SSHM root@$HOST bash -c "'$cmd_line'"`;
echo $ver
---

I could verify for slave32.
[root@slave32 ~]# vi /tmp/gver.sh
[root@slave32 ~]# /tmp/gver.sh slave32
3.8dev

Please help me in verifying the same for all the linux regression machines.

Thanks and Regards,
Kotresh H R

- Original Message - > From: "Kotresh Hiremath Ravishankar"> To: "Krutika Dhananjay" > Cc: "Atin Mukherjee" , "Gluster Devel" > , "Gaurav Garg" > , "Aravinda" > Sent: Wednesday, September 23, 2015 12:31:12 PM > Subject: Re: Spurious failures > > Hi Krutika, > > It's failing with > > ++ gluster --mode=script --wignore volume geo-rep master > slave21.cloud.gluster.org::slave create push-pem > Gluster version mismatch between master and slave. > > I will look into it. > > Thanks and Regards, > Kotresh H R > > - Original Message - > > From: "Krutika Dhananjay" > > To: "Atin Mukherjee" > > Cc: "Gluster Devel" , "Gaurav Garg" > > , "Aravinda" , > > "Kotresh Hiremath Ravishankar" > > Sent: Tuesday, September 22, 2015 9:03:44 PM > > Subject: Re: Spurious failures > > > > Ah! Sorry. I didn't read that line. :) > > > > Just figured even ./tests/geo-rep/georep-basic-dr-rsync.t is added to bad > > tests list. > > > > So it's just /tests/geo-rep/georep-basic-dr-tarssh.t for now. > > > > Thanks Atin! > > > > -Krutika > > > > - Original Message - > > > > > From: "Atin Mukherjee" > > > To: "Krutika Dhananjay" > > > Cc: "Gluster Devel" , "Gaurav Garg" > > > , "Aravinda" , "Kotresh Hiremath > > > Ravishankar" > > > Sent: Tuesday, September 22, 2015 8:51:22 PM > > > Subject: Re: Spurious failures > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat: > > > 0 Tests: 8 Failed: 2) > > > Failed tests: 6, 8 > > > Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr > > > 0.56 csys = 1.46 CPU) > > > Result: FAIL > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad > > > status 1 > > > *Ignoring failure from known-bad test > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t* > > > [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok > > > 17587 ms > > > [11:24:16] > > > All tests successful > > > > > On 09/22/2015 08:46 PM, Krutika Dhananjay wrote: > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > > > > > Ctrl + f 'not ok'. > > > > > > > > -Krutika > > > > > > > > > > > > > > > > *From: *"Atin Mukherjee" > > > > *To: *"Krutika Dhananjay" , "Gluster Devel" > > > > > > > > *Cc: *"Gaurav Garg" , "Aravinda" > > > > , "Kotresh Hiremath Ravishankar" > > > > > > > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM > > > > *Subject: *Re: Spurious failures > > > > > > > > Krutika, > > > > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is > > > > already a part of bad_tests () in both mainline and 3.7. Could you > > > > provide me the link where this test has failed explicitly and that has > > > > caused the regression to fail?
> > > > > > > > ~Atin > > > > > > > > > > > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > > > > > Hi, > > > > > > > > > > The following tests seem to be failing consistently on the build > > > > > machines in Linux: > > > > > > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > > > > > > > > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > > > > > > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > > > > > > > > > I have added these tests into the tracker etherpad. > > > > > > > > > > Meanwhile could someone from geo-rep and glusterd team take a look or > > > > > perhaps move them to bad tests list? > > > > > > > > > > > > > > > Here is one place where the three tests failed: > > > > > > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > >
Re: [Gluster-devel] Spurious failures
Krutika, ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is already a part of bad_tests () in both mainline and 3.7. Could you provide me the link where this test has failed explicitly and that has caused the regression to fail? ~Atin On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > Hi, > > The following tests seem to be failing consistently on the build > machines in Linux: > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > I have added these tests into the tracker etherpad. > > Meanwhile could someone from geo-rep and glusterd team take a look or > perhaps move them to bad tests list? > > > Here is one place where the three tests failed: > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > -Krutika > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
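The "Ignoring failure from known-bad test" lines visible in the regression output come from the test runner consulting its bad-tests list before deciding whether a failure is fatal. The sketch below is hypothetical (the real run-tests.sh uses its own function and list names); it only illustrates why a test that is in bad_tests() can fail without failing the whole run.

---
# Hypothetical sketch of known-bad-test handling in a regression wrapper.
BAD_TESTS="./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t
./tests/geo-rep/georep-basic-dr-tarssh.t"

is_bad_test ()
{
    echo "$BAD_TESTS" | grep -qx "$1";
}

run_one ()
{
    local t=$1;
    if ! prove "$t"; then
        if is_bad_test "$t"; then
            echo "Ignoring failure from known-bad test $t";
        else
            return 1;    # a real failure fails the whole regression run
        fi
    fi
}
---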
Re: [Gluster-devel] Spurious failures
https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull Ctrl + f 'not ok'. -Krutika - Original Message - > From: "Atin Mukherjee"> To: "Krutika Dhananjay" , "Gluster Devel" > > Cc: "Gaurav Garg" , "Aravinda" , > "Kotresh Hiremath Ravishankar" > Sent: Tuesday, September 22, 2015 8:39:56 PM > Subject: Re: Spurious failures > Krutika, > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is > already a part of bad_tests () in both mainline and 3.7. Could you > provide me the link where this test has failed explicitly and that has > caused the regression to fail? > ~Atin > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > > Hi, > > > > The following tests seem to be failing consistently on the build > > machines in Linux: > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > > > I have added these tests into the tracker etherpad. > > > > Meanwhile could someone from geo-rep and glusterd team take a look or > > perhaps move them to bad tests list? > > > > > > Here is one place where the three tests failed: > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > -Krutika > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat: 0 Tests: 8 Failed: 2) Failed tests: 6, 8 Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr 0.56 csys = 1.46 CPU) Result: FAIL ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad status 1 *Ignoring failure from known-bad test ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t* [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok 17587 ms [11:24:16] All tests successful On 09/22/2015 08:46 PM, Krutika Dhananjay wrote: > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > Ctrl + f 'not ok'. > > -Krutika > > > > *From: *"Atin Mukherjee"> *To: *"Krutika Dhananjay" , "Gluster Devel" > > *Cc: *"Gaurav Garg" , "Aravinda" > , "Kotresh Hiremath Ravishankar" > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM > *Subject: *Re: Spurious failures > > Krutika, > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is > already a part of bad_tests () in both mainline and 3.7. Could you > provide me the link where this test has failed explicitly and that has > caused the regression to fail? > > ~Atin > > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > > Hi, > > > > The following tests seem to be failing consistently on the build > > machines in Linux: > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > > > I have added these tests into the tracker etherpad. > > > > Meanwhile could someone from geo-rep and glusterd team take a look or > > perhaps move them to bad tests list? > > > > > > Here is one place where the three tests failed: > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > -Krutika > > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures
Ah! Sorry. I didn't read that line. :) Just figured even ./tests/geo-rep/georep-basic-dr-rsync.t is added to bad tests list. So it's just /tests/geo-rep/georep-basic-dr-tarssh.t for now. Thanks Atin! -Krutika - Original Message - > From: "Atin Mukherjee"> To: "Krutika Dhananjay" > Cc: "Gluster Devel" , "Gaurav Garg" > , "Aravinda" , "Kotresh Hiremath > Ravishankar" > Sent: Tuesday, September 22, 2015 8:51:22 PM > Subject: Re: Spurious failures > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t (Wstat: > 0 Tests: 8 Failed: 2) > Failed tests: 6, 8 > Files=1, Tests=8, 48 wallclock secs ( 0.01 usr 0.01 sys + 0.88 cusr > 0.56 csys = 1.46 CPU) > Result: FAIL > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t: bad > status 1 > *Ignoring failure from known-bad test > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t* > [11:24:16] ./tests/bugs/glusterd/bug-1242543-replace-brick.t .. ok > 17587 ms > [11:24:16] > All tests successful > On 09/22/2015 08:46 PM, Krutika Dhananjay wrote: > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > Ctrl + f 'not ok'. > > > > -Krutika > > > > > > > > *From: *"Atin Mukherjee" > > *To: *"Krutika Dhananjay" , "Gluster Devel" > > > > *Cc: *"Gaurav Garg" , "Aravinda" > > , "Kotresh Hiremath Ravishankar" > > > > *Sent: *Tuesday, September 22, 2015 8:39:56 PM > > *Subject: *Re: Spurious failures > > > > Krutika, > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t is > > already a part of bad_tests () in both mainline and 3.7. Could you > > provide me the link where this test has failed explicitly and that has > > caused the regression to fail? > > > > ~Atin > > > > > > On 09/22/2015 07:27 PM, Krutika Dhananjay wrote: > > > Hi, > > > > > > The following tests seem to be failing consistently on the build > > > machines in Linux: > > > > > > ./tests/bugs/glusterd/bug-1238706-daemons-stop-on-peer-cleanup.t .. > > > > > > ./tests/geo-rep/georep-basic-dr-rsync.t .. > > > > > > ./tests/geo-rep/georep-basic-dr-tarssh.t .. > > > > > > I have added these tests into the tracker etherpad. > > > > > > Meanwhile could someone from geo-rep and glusterd team take a look or > > > perhaps move them to bad tests list? > > > > > > > > > Here is one place where the three tests failed: > > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/14421/consoleFull > > > > > > -Krutika > > > > > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t
On 07/20/2015 12:45 PM, Niels de Vos wrote:

On Mon, Jul 20, 2015 at 09:25:15AM +0530, Ravishankar N wrote:
I'll take a look.

Thanks. I'm actually not sure if this is an arbiter.t issue, maybe I blamed it too early?
It's the first test that gets executed, and no others are tried after it failed.

Niels

Regards,
Ravi

On 07/20/2015 03:07 AM, Niels de Vos wrote:
I have seen several occurrences of failures in arbiter.t now. This is one of the errors:

https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull

[21:20:20] ./tests/basic/afr/arbiter.t ..
not ok 7 Got N instead of Y
not ok 15
not ok 16 Got instead of 1
not ok 23 Got instead of 1
not ok 25 Got 0 when not expecting it
not ok 26
not ok 34 Got 0 instead of 1
not ok 35 Got 0 instead of 1
not ok 41 Got instead of 1
not ok 47 Got N instead of Y
Failed 10/47 subtests
[21:20:20]

Test Summary Report
---
./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
Failed tests: 7, 15-16, 23, 25-26, 34-35, 41, 47

So the test #7 that failed is the check at line 16: EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0

Looking at mnt-glusterfs-0.log, I see that the unmount has already happened before the actual command was run, at least from the time stamp logged by the G_LOG() function.

[2015-07-19 21:16:21.784293] I [fuse-bridge.c:4946:fuse_thread_proc] 0-fuse: unmounting /mnt/glusterfs/0
[2015-07-19 21:16:21.784542] W [glusterfsd.c:1214:cleanup_and_exit] (--/lib64/libpthread.so.0(+0x79d1) [0x7fc3f41c49d1] --glusterfs(glusterfs_sigwaiter+0xe4) [0x409734] --glusterfs(cleanup_and_exit+0x87) [0x407ba7] ) 0-: received signum (15), shutting down
[2015-07-19 21:16:21.784571] I [fuse-bridge.c:5645:fini] 0-fuse: Unmounting '/mnt/glusterfs/0'.
[2015-07-19 21:16:21.785817332]:++ G_LOG:./tests/basic/afr/arbiter.t: TEST: 15 ! stat /mnt/glusterfs/0/.meta/graphs/active/patchy-replicate-0/options/arbiter-count ++
[2015-07-19 21:16:21.796574975]:++ G_LOG:./tests/basic/afr/arbiter.t: TEST: 16 Y force_umount /mnt/glusterfs/0 ++

I have no clue as to why that could have happened, because appending to the gluster log files using G_LOG() is done *before* the test is executed. In all my trial runs, the G_LOG message gets logged first, followed by the logs relevant to the actual command being run.

FWIW, http://review.gluster.org/#/c/4/ made the following change to arbiter.t amongst other test cases:

-TEST umount $M0
+EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0

But I'm not sure an umount -f has any impact for fuse mounts.

Regards,
Ravi

Files=1, Tests=47, 243 wallclock secs ( 0.04 usr 0.00 sys + 15.22 cusr 3.48 csys = 18.74 CPU)
Result: FAIL

Who could have a look at this?

Thanks,
Niels

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
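For context, the new form retries the unmount until it reports success or $UMOUNT_TIMEOUT expires, whereas the old TEST umount $M0 tried exactly once. A force_umount helper of the shape EXPECT_WITHIN expects could look roughly like the sketch below; the actual helper lives in the test framework's include files and may differ, and the Y/N convention simply matches the expectation quoted above.

---
# Sketch of a force_umount-style helper that EXPECT_WITHIN can poll.
force_umount ()
{
    # umount -f makes little difference for FUSE, as noted above; it is the
    # retry loop in EXPECT_WITHIN that absorbs a transient EBUSY.
    if umount -f "$1" 2>/dev/null || ! mountpoint -q "$1"; then
        echo Y;
    else
        echo N;
    fi
}

# EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0
---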
Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t
On Mon, Jul 20, 2015 at 09:25:15AM +0530, Ravishankar N wrote:
I'll take a look.

Thanks. I'm actually not sure if this is an arbiter.t issue, maybe I blamed it too early?
It's the first test that gets executed, and no others are tried after it failed.

Niels

Regards,
Ravi

On 07/20/2015 03:07 AM, Niels de Vos wrote:
I have seen several occurrences of failures in arbiter.t now. This is one of the errors:

https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull

[21:20:20] ./tests/basic/afr/arbiter.t ..
not ok 7 Got N instead of Y
not ok 15
not ok 16 Got instead of 1
not ok 23 Got instead of 1
not ok 25 Got 0 when not expecting it
not ok 26
not ok 34 Got 0 instead of 1
not ok 35 Got 0 instead of 1
not ok 41 Got instead of 1
not ok 47 Got N instead of Y
Failed 10/47 subtests
[21:20:20]

Test Summary Report
---
./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10)
Failed tests: 7, 15-16, 23, 25-26, 34-35, 41, 47
Files=1, Tests=47, 243 wallclock secs ( 0.04 usr 0.00 sys + 15.22 cusr 3.48 csys = 18.74 CPU)
Result: FAIL

Who could have a look at this?

Thanks,
Niels

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures in tests/basic/afr/arbiter.t
I'll take a look. Regards, Ravi On 07/20/2015 03:07 AM, Niels de Vos wrote: I have seen several occurences of failures in arbiter.t now. This is one of the errors: https://build.gluster.org/job/rackspace-regression-2GB-triggered/12626/consoleFull [21:20:20] ./tests/basic/afr/arbiter.t .. not ok 7 Got N instead of Y not ok 15 not ok 16 Got instead of 1 not ok 23 Got instead of 1 not ok 25 Got 0 when not expecting it not ok 26 not ok 34 Got 0 instead of 1 not ok 35 Got 0 instead of 1 not ok 41 Got instead of 1 not ok 47 Got N instead of Y Failed 10/47 subtests [21:20:20] Test Summary Report --- ./tests/basic/afr/arbiter.t (Wstat: 0 Tests: 47 Failed: 10) Failed tests: 7, 15-16, 23, 25-26, 34-35, 41, 47 Files=1, Tests=47, 243 wallclock secs ( 0.04 usr 0.00 sys + 15.22 cusr 3.48 csys = 18.74 CPU) Result: FAIL Who could have look at this? Thanks, Niels ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
Sad but true. More tests are failing than passing, and the failures are often *clearly* unrelated to the patches they're supposedly testing. Let's revive the Etherpad, and use it to track progress as we clean this up. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t
This doesn't seem to have been fixed completely. My change [1] failed (again !) on this test [2], even after rebasing onto the fix [3]. [1]: https://review.gluster.org/11559 [2]: http://build.gluster.org/job/rackspace-regression-2GB-triggered/12152/consoleFull [3]: https://review.gluster.org/11579 On Thu, Jul 9, 2015 at 4:20 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Sorry, seems like this is already fixed, I just need to rebase. Pranith On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote: hi, Could you please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t
Sorry, seems like this is already fixed, I just need to rebase. Pranith On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote: hi, Could you please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
On 07/08/2015 03:57 PM, Anuradha Talur wrote: - Original Message - From: Kaushal M kshlms...@gmail.com To: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, July 8, 2015 3:42:12 PM Subject: [Gluster-devel] Spurious failures again I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] ./tests/bugs/quota/bug-1235182.t [5] ./tests/bugs/replicate/bug-977797.t [6] Ran ./tests/bugs/replicate/bug-977797.t multiple times in a loop, no failure observed. The logs in [6] seem inaccessible as well. Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull Will look into this. [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull For 3rd one the patch needs to be rebased. Ravi sent a fix http://review.gluster.org/#/c/11556/ . [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote: I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] I will look into this issue ./tests/bugs/quota/bug-1235182.t [5] I have submitted two patches to fix failures from 'bug-1235182.t' http://review.gluster.org/#/c/11561/ http://review.gluster.org/#/c/11510/ ./tests/bugs/replicate/bug-977797.t [6] Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
- Original Message - From: Kaushal M kshlms...@gmail.com To: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, July 8, 2015 3:42:12 PM Subject: [Gluster-devel] Spurious failures again I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] ./tests/bugs/quota/bug-1235182.t [5] ./tests/bugs/replicate/bug-977797.t [6] Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull Will look into this. [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull For 3rd one the patch needs to be rebased. Ravi sent a fix http://review.gluster.org/#/c/11556/ . [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- Thanks, Anuradha. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote: On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote: I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] I will look into this issue Patch submitted: http://review.gluster.org/#/c/11583/ ./tests/bugs/quota/bug-1235182.t [5] I have submitted two patches to fix failures from 'bug-1235182.t' http://review.gluster.org/#/c/11561/ http://review.gluster.org/#/c/11510/ ./tests/bugs/replicate/bug-977797.t [6] Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
I think our linux regression is again unstable. I am seeing at least 10 such test cases ( if not more) which have failed. I think we should again start maintaining an etherpad page (probably the same earlier one) and keep track of them otherwise it will be difficult to track what is fixed and what's not if we have to go through mails. Thoughts? -Atin Sent from one plus one On Jul 8, 2015 8:45 PM, Vijaikumar M vmall...@redhat.com wrote: On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote: On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote: I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] I will look into this issue Patch submitted: http://review.gluster.org/#/c/11583/ ./tests/bugs/quota/bug-1235182.t [5] I have submitted two patches to fix failures from 'bug-1235182.t' http://review.gluster.org/#/c/11561/ http://review.gluster.org/#/c/11510/ ./tests/bugs/replicate/bug-977797.t [6] Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures again
On 07/08/2015 11:16 PM, Atin Mukherjee wrote: I think our linux regression is again unstable. I am seeing at least 10 such test cases ( if not more) which have failed. I think we should again start maintaining an etherpad page (probably the same earlier one) and keep track of them otherwise it will be difficult to track what is fixed and what's not if we have to go through mails. Thoughts? Makes sense. The link is here https://public.pad.fsfe.org/p/gluster-spurious-failures Perhaps we should remove the entries and start fresh. -Ravi -Atin Sent from one plus one On Jul 8, 2015 8:45 PM, Vijaikumar M vmall...@redhat.com mailto:vmall...@redhat.com wrote: On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote: On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote: I've been hitting spurious failures in Linux regression runs for my change [1]. The following tests failed, ./tests/basic/afr/replace-brick-self-heal.t [2] ./tests/bugs/replicate/bug-1238508-self-heal.t [3] ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4] I will look into this issue Patch submitted: http://review.gluster.org/#/c/11583/ ./tests/bugs/quota/bug-1235182.t [5] I have submitted two patches to fix failures from 'bug-1235182.t' http://review.gluster.org/#/c/11561/ http://review.gluster.org/#/c/11510/ ./tests/bugs/replicate/bug-977797.t [6] Can AFR and quota owners look into this? Thanks. Kaushal [1] https://review.gluster.org/11559 [2] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull [3] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull [4] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull [5] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull [6] http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> This approach could still surprise the storage-admin when glusterfs(d) processes bind to ports in the range where brick ports are being assigned. We should make this predictable by reserving brick ports by setting net.ipv4.ip_local_reserved_ports. Initially reserve 50 ports starting at 49152. Subsequently, we could reserve ports on demand, say 50 more ports, when we exhaust the previously reserved range.
>
> net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation behaviour, i.e. if the socket uses a port other than zero. With this option we don't have to manage port assignment at a process level. Thoughts?
>
> If the reallocation can be done on demand, I do think this is a better approach to tackle this problem.

We could fix the predictability aspect in a different patch (see the sketch below). This patch, where we assign ports starting from 65535 in descending order, can be reviewed independently.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
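To make the proposal concrete: reserving the brick range with net.ipv4.ip_local_reserved_ports, and widening it on demand, could look roughly like the sketch below. The range sizes are the assumed values from the proposal; the kernel keeps reserved ports out of automatic (ephemeral) port selection while still allowing an explicit bind() to them, which is the property relied on above.

---
# Reserve an initial block of 50 brick ports starting at 49152.
sysctl -w net.ipv4.ip_local_reserved_ports=49152-49201

# Later, when glusterd exhausts that block, extend the reservation on demand
# (the parameter accepts a comma-separated list of ports and ranges).
sysctl -w net.ipv4.ip_local_reserved_ports=49152-49251
---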
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> It seems this is exactly what's happening. I have a question; I get the following data from netstat and grep:
>
> tcp    0    0 f6be17c0fbf5:1023    f6be17c0fbf5:24007   ESTABLISHED 31516/glusterfsd
> tcp    0    0 f6be17c0fbf5:49152   f6be17c0fbf5:490     ESTABLISHED 31516/glusterfsd
> unix   3    [ ]  STREAM  CONNECTED  988353  31516/glusterfsd /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid. Looking at the data, line 2 is very clear: it shows the connection between the brick and a glusterfs client. The unix socket on line 3 is also clear: it is the unix socket connection that glusterd and the brick process use for communication. I am not able to understand line 1; which part of the brick process established a tcp connection with glusterd using port 1023?

This is the rpc connection from any glusterfs(d) process to glusterd to fetch the volfile on receiving notification from glusterd.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
- Original Message - This is caused because when bind-insecure is turned on (which is the default now), it may happen that brick is not able to bind to port assigned by Glusterd for example 49192-49195... It seems to occur because the rpc_clnt connections are binding to ports in the same range. so brick fails to bind to a port which is already used by someone else. This bug already exist before http://review.gluster.org/#/c/11039/ when use rdma, i.e. even previously rdma binds to port = 1024 if it cannot find a free port 1024, even when bind insecure was turned off (ref to commit '0e3fd04e'). Since we don't have tests related to rdma we did not discover this issue previously. http://review.gluster.org/#/c/11039/ discovers the bug we encountered, however now the bug can be fixed by http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers from 65535 in a descending order, as a result port clash is minimized, also it fixes issues in rdma too This approach could still surprise the storage-admin when glusterfs(d) processes bind to ports in the range where brick ports are being assigned. We should make this predictable by reserving brick ports setting net.ipv4.ip_local_reserved_ports. Initially reserve 50 ports starting at 49152. Subsequently, we could reserve ports on demand, say 50 more ports, when we exhaust previously reserved range. net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation behaviour. i.e if the socket uses a port other than zero. With this option we don't have to manage ports assignment at a process level. Thoughts? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
On 07/03/2015 11:58 AM, Krishnan Parthasarathi wrote: - Original Message - This is caused because when bind-insecure is turned on (which is the default now), it may happen that brick is not able to bind to port assigned by Glusterd for example 49192-49195... It seems to occur because the rpc_clnt connections are binding to ports in the same range. so brick fails to bind to a port which is already used by someone else. This bug already exist before http://review.gluster.org/#/c/11039/ when use rdma, i.e. even previously rdma binds to port = 1024 if it cannot find a free port 1024, even when bind insecure was turned off (ref to commit '0e3fd04e'). Since we don't have tests related to rdma we did not discover this issue previously. http://review.gluster.org/#/c/11039/ discovers the bug we encountered, however now the bug can be fixed by http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers from 65535 in a descending order, as a result port clash is minimized, also it fixes issues in rdma too This approach could still surprise the storage-admin when glusterfs(d) processes bind to ports in the range where brick ports are being assigned. We should make this predictable by reserving brick ports setting net.ipv4.ip_local_reserved_ports. Initially reserve 50 ports starting at 49152. Subsequently, we could reserve ports on demand, say 50 more ports, when we exhaust previously reserved range. net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation behaviour. i.e if the socket uses a port other than zero. With this option we don't have to manage ports assignment at a process level. Thoughts? If the reallocation can be done on demand, I do think this is a better approach to tackle this problem. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- ~Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
This is caused because when bind-insecure is turned on (which is the default now), it may happen that a brick is not able to bind to the port assigned to it by glusterd, for example 49192-49195... It seems to occur because the rpc_clnt connections are binding to ports in the same range, so the brick fails to bind to a port that is already in use by someone else.

This bug already existed before http://review.gluster.org/#/c/11039/ when using rdma, i.e. even previously rdma would bind to a port >= 1024 if it could not find a free port < 1024, even when bind-insecure was turned off (ref commit '0e3fd04e'). Since we don't have tests related to rdma we did not discover this issue earlier. http://review.gluster.org/#/c/11039/ exposed the bug we encountered; the bug can now be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt request port numbers starting from 65535 in descending order. As a result port clashes are minimized, and it fixes the rdma issue as well.

Thanks to Raghavendra Talur for help in discovering the real cause.

Regards,
Prasanna Kalever

- Original Message -
From: Raghavendra Talur raghavendra.ta...@gmail.com
To: Krishnan Parthasarathi kpart...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, July 2, 2015 6:45:17 PM
Subject: Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur raghavendra.ta...@gmail.com wrote:
On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi kpart...@redhat.com wrote:

A port assigned by Glusterd for a brick is found to be in use already by the brick. Any changes in Glusterd recently which can cause this? Or is it a test infra problem?

This issue is likely to be caused by http://review.gluster.org/11039. This patch changes the port allocation that happens for rpc_clnt based connections. Previously, ports allocated were < 1024. With this change, these connections (typically mount processes, gluster-nfs server processes, etc.) could end up using ports that bricks are being assigned to. IIUC, the intention of the patch was to make server processes lenient to inbound messages from ports >= 1024. If we don't require the use of ports >= 1024, we could leave the port allocation for rpc_clnt connections as before. Alternately, we could reserve the range of ports starting from 49152 for bricks by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific to Linux; I'm not aware of how this could be done on NetBSD, for instance.

It seems this is exactly what's happening. I have a question; I get the following data from netstat and grep:

tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd
tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd
unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket

Here 31516 is the brick pid. Looking at the data, line 2 is very clear: it shows the connection between the brick and a glusterfs client. The unix socket on line 3 is also clear: it is the unix socket connection that glusterd and the brick process use for communication. I am not able to understand line 1; which part of the brick process established a tcp connection with glusterd using port 1023? Note: this data is from a build which does not have the above-mentioned patch.

The patch which exposed this bug is being reverted till the underlying bug is also fixed. You can monitor the revert patches here:
master: http://review.gluster.org/11507
3.7 branch: http://review.gluster.org/11508

Please rebase your patches after the above patches are merged to ensure that your patches pass regression.

--
Raghavendra Talur
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
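For anyone chasing a similar clash, a quick way to list every socket a given brick process owns, including rpc_clnt connections that may have landed in the 49152+ brick range; the pid-file path below is illustrative and should be adjusted for the volume and brick in question:

# pid-file path is illustrative; glusterd keeps brick pid files under /var/lib/glusterd
BRICK_PID=$(cat /var/lib/glusterd/vols/patchy/run/*.pid)
# needs root to see the pid/program column
netstat -ntp 2>/dev/null | grep "${BRICK_PID}/"
# or, with iproute2
ss -ntp | grep "pid=${BRICK_PID},"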
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
Thanks Prasanna for the patches :)
-Atin
Sent from one plus one

On Jul 2, 2015 9:19 PM, Prasanna Kalever pkale...@redhat.com wrote:
> [...]
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
Yep will have a look - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Wednesday, July 1, 2015 1:44:44 PM Subject: spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t hi, http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull has the logs. Could you please look into it. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
:glusterfs_graph_init] 0-patchy-server: initializing translator failed
[2015-07-01 07:33:25.069808] E [MSGID: 101176] [graph.c:669:glusterfs_graph_activate] 0-graph: init failed
[2015-07-01 07:33:25.070183] W [glusterfsd.c:1214:cleanup_and_exit] (-- 0-: received signum (0), shutting down

Looks like it is assigned a port which is already in use. Saw the same error in another test failing for another patch set. Here is the link: http://build.gluster.org/job/rackspace-regression-2GB-triggered/11740/consoleFull

A port assigned by Glusterd for a brick is found to be in use already by the brick. Any changes in Glusterd recently which can cause this? Or is it a test infra problem?

Prasanna is looking into this for now. The status of the volume in glusterd is not "Started"; as a result the attach-tier command fails, i.e. the tiering rebalancer cannot run:

[2015-07-01 07:33:25.275092] E [MSGID: 106301] [glusterd-op-sm.c:4086:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : Volume patchy needs to be started to perform rebalance

But the volume is running in a crippled mode, so the mount works fine, i.e. TEST $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0; works fine. Tests 9-12 failed because the attach failed.

Regards,
Joe

- Original Message -
From: Joseph Fernandes josfe...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, July 1, 2015 1:59:41 PM
Subject: Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

Yep will have a look

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, July 1, 2015 1:44:44 PM
Subject: spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull has the logs. Could you please look into it.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
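A guard along these lines in the tier test would make the "volume must be Started" dependency explicit; this is only a sketch, assuming the volinfo_field helper from tests/volume.rc and the usual $CLI/$V0/$PROCESS_UP_TIMEOUT variables of the regression framework:

# don't attempt attach-tier (and the tier rebalance it triggers) until
# glusterd actually reports the volume as Started
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Started" volinfo_field $V0 'Status'

Of course this only papers over the symptom in the test; the underlying port clash still needs the fix discussed above.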
Re: [Gluster-devel] Spurious failures? (master)
On 06/05/2015 02:12 AM, Shyam wrote: Just checking, This review request: http://review.gluster.org/#/c/11073/ Failed in the following tests: 1) Linux [20:20:16] ./tests/bugs/replicate/bug-880898.t .. not ok 4 This seems to be same RC as in self-heald.t where heal info is not failing sometimes when the brick is down. Failed 1/4 subtests [20:20:16] http://build.gluster.org/job/rackspace-regression-2GB-triggered/10088/consoleFull 2) NetBSD (Du seems to have faced the same) [11:56:45] ./tests/basic/afr/sparse-file-self-heal.t .. not ok 52 Got instead of 1 not ok 53 Got instead of 1 not ok 54 not ok 55 Got 2 instead of 0 not ok 56 Got d41d8cd98f00b204e9800998ecf8427e instead of b6d81b360a5672d80c27430f39153e2c not ok 60 Got 0 instead of 1 Failed 6/64 subtests [11:56:45] http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6233/consoleFull There is a bug in statedump code path, If it races with STACK_RESET then shd seems to crash. I see the following output indicating the process died. kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] I have not done any analysis, and also the change request should not affect the paths that this test is failing on. Checking the logs for Linux did not throw any more light on the cause, although the brick logs are not updated(?) to reflect the volume create and start as per the TC in (1). Anyone know anything (more) about this? Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t
Hi,

As already discussed, if you encounter a failure in this or any other snapshot test, it would be great to provide the regression run instance so that we can have a look at the logs, if there are any. Also, I tried running the test in a loop as you suggested; after an hour and a half I stopped it so that I could use my machines to work on some patches. So please let us know when this or any snapshot test fails for anyone and we will look into it asap.

Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:
hi Avra/Rajesh,
Any update on this test?
* tests/basic/volume-snapshot-clone.t
* http://review.gluster.org/#/c/10053/
* Came back on April 9
* http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t
On 05/05/2015 10:32 AM, Avra Sengupta wrote: Hi, As already discussed, if you encounter this or any other snapshot tests, it would be great to provide the regression run instance so that we can have a look at the logs if there are any. Also I tried running the test in a loop as you suggested. After an hour and a half I stopped it so that I can use my machines to work on some patches. So please let us know when this or any snapshot tests fails for anyone and we will look into it asap. Please read the mail again to find the link which has the logs. ./tests/basic/volume-snapshot-clone.t (Wstat: 0 Tests: 41 Failed: 3) Failed tests: 36, 38, 40 Pranith Regards, Avra On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote: hi Avra/Rajesh, Any update on this test? * tests/basic/volume-snapshot-clone.t * http://review.gluster.org/#/c/10053/ * Came back on April 9 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/ Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t
On 05/05/2015 10:43 AM, Pranith Kumar Karampuri wrote: On 05/05/2015 10:32 AM, Avra Sengupta wrote: Hi, As already discussed, if you encounter this or any other snapshot tests, it would be great to provide the regression run instance so that we can have a look at the logs if there are any. Also I tried running the test in a loop as you suggested. After an hour and a half I stopped it so that I can use my machines to work on some patches. So please let us know when this or any snapshot tests fails for anyone and we will look into it asap. Please read the mail again to find the link which has the logs. ./tests/basic/volume-snapshot-clone.t (Wstat: 0 Tests: 41 Failed: 3) Failed tests: 36, 38, 40 As repeatedly told, older regression run doesn't have the logs any more. Please find the link and try and fetch the logs. Please tell me if I am missing something here. [root@VM1 lab]# wget http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz . --2015-05-05 10:47:18-- http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz Resolving slave33.cloud.gluster.org... 104.130.217.7 Connecting to slave33.cloud.gluster.org|104.130.217.7|:80... failed: Connection refused. --2015-05-05 10:47:19-- http://./ Resolving failed: No address associated with hostname. wget: unable to resolve host address “.” [root@VM1 lab]# Regards, Avra Pranith Regards, Avra On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote: hi Avra/Rajesh, Any update on this test? * tests/basic/volume-snapshot-clone.t * http://review.gluster.org/#/c/10053/ * Came back on April 9 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/ Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
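A side note on the wget invocation above: wget takes no positional destination argument, so the trailing "." is parsed as a second URL, which is what produces the "unable to resolve host address" error; the connection-refused part simply means the slave is no longer serving logs. If the archive were still available, something like this would fetch and unpack it (paths are illustrative):

wget -P /tmp "http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz"
tar -xzf /tmp/glusterfs-logs-20150409*.tgz -C /tmp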
Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t
On 05/05/2015 10:48 AM, Avra Sengupta wrote: On 05/05/2015 10:43 AM, Pranith Kumar Karampuri wrote: On 05/05/2015 10:32 AM, Avra Sengupta wrote: Hi, As already discussed, if you encounter this or any other snapshot tests, it would be great to provide the regression run instance so that we can have a look at the logs if there are any. Also I tried running the test in a loop as you suggested. After an hour and a half I stopped it so that I can use my machines to work on some patches. So please let us know when this or any snapshot tests fails for anyone and we will look into it asap. Please read the mail again to find the link which has the logs. ./tests/basic/volume-snapshot-clone.t (Wstat: 0 Tests: 41 Failed: 3) Failed tests: 36, 38, 40 As repeatedly told, older regression run doesn't have the logs any more. Please find the link and try and fetch the logs. Please tell me if I am missing something here. [root@VM1 lab]# wget http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz . --2015-05-05 10:47:18-- http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz Resolving slave33.cloud.gluster.org... 104.130.217.7 Connecting to slave33.cloud.gluster.org|104.130.217.7|:80... failed: Connection refused. --2015-05-05 10:47:19-- http://./ Resolving failed: No address associated with hostname. wget: unable to resolve host address “.” [root@VM1 lab]# Ah! my bad, will let you know if it happens again. Pranith Regards, Avra Pranith Regards, Avra On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote: hi Avra/Rajesh, Any update on this test? * tests/basic/volume-snapshot-clone.t * http://review.gluster.org/#/c/10053/ * Came back on April 9 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/ Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t
On 05/02/2015 10:14 AM, Krishnan Parthasarathi wrote: If glusterd itself fails to come up, of course the test will fail :-). Is it still happening? Pranith, Did you get a chance to see glusterd logs and find why glusterd didn't come up? Please paste the relevant logs in this thread. No :-(. The etherpad doesn't have any links :-(. Justin any help here? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t
On 05/02/2015 08:17 AM, Pranith Kumar Karampuri wrote: hi, As per the etherpad: https://public.pad.fsfe.org/p/gluster-spurious-failures * tests/basic/afr/sparse-file-self-heal.t (Wstat: 0 Tests: 64 Failed: 35) * Failed tests: 1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64 * Happens in master (Mon 30th March - git commit id 3feaf1648528ff39e23748ac9004a77595460c9d) * (hasn't yet been added to BZs) If glusterd itself fails to come up, of course the test will fail :-). Is it still happening? We have not been actively curating this list for the last few days and am not certain if this failure happens anymore. Investigating why a regression run fails for our patches and fixing them (though unrelated to our patch) should be the most effective way going ahead. -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t
If glusterd itself fails to come up, of course the test will fail :-). Is it still happening? Pranith, Did you get a chance to see glusterd logs and find why glusterd didn't come up? Please paste the relevant logs in this thread. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
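When the log links are available again, pulling the glusterd startup errors out of the regression archive is usually enough to see why it did not come up. A rough sketch, assuming the archive keeps the usual /var/log/glusterfs layout; the file names and paths here are illustrative:

tar -xzf glusterfs-logs-<timestamp>.tgz
# gluster log lines carry the severity letter between the timestamp and the
# source location, so this surfaces Error and Critical messages from glusterd
grep -E '\] [EC] \[' var/log/glusterfs/*glusterd*.log | tail -n 40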
Re: [Gluster-devel] Spurious Failures in regression runs
I'll take a look at the hangs. Regards, Nithya - Original Message - From: Justin Clift jus...@gluster.org To: Vijay Bellur vbel...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org, Nithya Balachandran nbala...@redhat.com Sent: Tuesday, 31 March, 2015 5:40:29 AM Subject: Re: [Gluster-devel] Spurious Failures in regression runs On 30 Mar 2015, at 18:54, Vijay Bellur vbel...@redhat.com wrote: Hi All, We are attempting to capture all known spurious regression failures from the jenkins instance in build.gluster.org at [1]. The issues listed in the etherpad impede our patch merging workflow and need to be sorted out before we branch release-3.7. If you happen to be the owner of one or more issues in the etherpad, can you please look into the failures and have them addressed soon? To help show up more regression failures, we ran 20x new VM's in Rackspace with a full regression test each of master head branch: * Two hung regression tests on tests/bugs/posix/bug-1113960.t * Still hung in case anyone wants to check them out * 162.242.167.96 * 162.242.167.132 * Both allowing remote root login, and using our jenkins slave password as their root pw * 2 x failures on ./tests/basic/afr/sparse-file-self-heal.t Failed tests: 1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64 Added to etherpad * 1 x failure on ./tests/bugs/disperse/bug-1187474.t Failed tests: 11-12 Added to etherpad * 1 x failure on ./tests/basic/uss.t Failed test: 153 Already on etherpad Looks like our general failure rate is improving. :) The hangs are a bit worrying though. :( Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious Failures in regression runs
On 30 Mar 2015, at 18:54, Vijay Bellur vbel...@redhat.com wrote: Hi All, We are attempting to capture all known spurious regression failures from the jenkins instance in build.gluster.org at [1]. The issues listed in the etherpad impede our patch merging workflow and need to be sorted out before we branch release-3.7. If you happen to be the owner of one or more issues in the etherpad, can you please look into the failures and have them addressed soon? To help show up more regression failures, we ran 20x new VM's in Rackspace with a full regression test each of master head branch: * Two hung regression tests on tests/bugs/posix/bug-1113960.t * Still hung in case anyone wants to check them out * 162.242.167.96 * 162.242.167.132 * Both allowing remote root login, and using our jenkins slave password as their root pw * 2 x failures on ./tests/basic/afr/sparse-file-self-heal.t Failed tests: 1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64 Added to etherpad * 1 x failure on ./tests/bugs/disperse/bug-1187474.t Failed tests: 11-12 Added to etherpad * 1 x failure on ./tests/basic/uss.t Failed test: 153 Already on etherpad Looks like our general failure rate is improving. :) The hangs are a bit worrying though. :( Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in split-brain-healing.t
On 03/10/2015 06:55 PM, Emmanuel Dreyfus wrote:
> 3) later I hit this, I do not know yet if it is a consequence or not:
> assertion list_empty (&priv->table.lru[i]) failed: file quick-read.c, line 1052, function qr_inode_table_destroy
This happens in debug builds only; it should be fixed with http://review.gluster.org/#/c/9819/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
On 05/21/2014 08:50 PM, Vijaikumar M wrote: KP, Atin and myself did some debugging and found that there was a deadlock in glusterd. When creating a volume snapshot, the back-end operation 'taking a lvm_snapshot and starting brick' for the each brick are executed in parallel using synctask framework. brick_start was releasing a big_lock with brick_connect and does a lock again. This caused a deadlock in some race condition where main-thread waiting for one of the synctask thread to finish and synctask-thread waiting for the big_lock. We are working on fixing this issue. If this fix is going to take more time, can we please log a bug to track this problem and remove the test cases that need to be addressed from the test unit? This way other valid patches will not be blocked by the failure of the snapshot test unit. We can introduce these tests again as part of the fix for the problem. -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
- Original Message - From: Atin Mukherjee amukh...@redhat.com To: gluster-devel@gluster.org, Pranith Kumar Karampuri pkara...@redhat.com Sent: Wednesday, May 21, 2014 3:39:21 PM Subject: Re: Fwd: Re: [Gluster-devel] Spurious failures because of nfs and snapshots On 05/21/2014 11:42 AM, Atin Mukherjee wrote: On 05/21/2014 10:54 AM, SATHEESARAN wrote: Guys, This is the issue pointed out by Pranith with regard to Barrier. I was reading through it. But I wanted to bring it to concern -- S Original Message Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots Date: Tue, 20 May 2014 21:16:57 -0400 (EDT) From: Pranith Kumar Karampuri pkara...@redhat.com To:Vijaikumar M vmall...@redhat.com, Joseph Fernandes josfe...@redhat.com CC:Gluster Devel gluster-devel@gluster.org Hey, Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz Pranith, Is this the correct link? I don't see any log having this sequence there. Also looking at the log from this mail, this is expected as per the barrier functionality, an enable request followed by another enable should always fail and the same happens for disable. Can you please confirm the link and which particular regression test is causing this issue, is it bug-1090042.t? --Atin Relevant logs: [2014-05-20 20:17:07.026045] : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS [2014-05-20 20:17:08.030673] : volume start patchy : SUCCESS [2014-05-20 20:17:08.279148] : volume barrier patchy enable : SUCCESS [2014-05-20 20:17:08.476785] : volume barrier patchy enable : FAILED : Failed to reconfigure barrier. [2014-05-20 20:17:08.727429] : volume barrier patchy disable : SUCCESS [2014-05-20 20:17:08.926995] : volume barrier patchy disable : FAILED : Failed to reconfigure barrier. This log is for bug-1092841.t and its expected. Damn :-(. I think I screwed up the timestamps while checking Sorry about that :-(. But there are failures. Check http://build.gluster.org/job/regression/4501/consoleFull Pranith --Atin Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com, Vijaikumar M vmall...@redhat.com Sent: Tuesday, May 20, 2014 3:41:11 PM Subject: Re: Spurious failures because of nfs and snapshots hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. 
Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs
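Coming back to the barrier log quoted near the top of this message: those FAILED lines are exactly what bug-1092841.t deliberately checks for, since a second enable (or disable) of the barrier must be rejected. Roughly, using the regression framework's TEST / "TEST !" helpers (a sketch, not the literal test):

TEST   $CLI volume barrier $V0 enable
TEST ! $CLI volume barrier $V0 enable     # already enabled, so this must fail
TEST   $CLI volume barrier $V0 disable
TEST ! $CLI volume barrier $V0 disable    # already disabled, so this must fail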
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
KP, Atin and myself did some debugging and found that there was a deadlock in glusterd. When creating a volume snapshot, the back-end operation 'taking a lvm_snapshot and starting brick' for the each brick are executed in parallel using synctask framework. brick_start was releasing a big_lock with brick_connect and does a lock again. This caused a deadlock in some race condition where main-thread waiting for one of the synctask thread to finish and synctask-thread waiting for the big_lock. We are working on fixing this issue. Thanks, Vijay On Wednesday 21 May 2014 12:23 PM, Vijaikumar M wrote: From the log: http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a17%3a10%3a51.tgzit looks like glusterd was hung: *Glusterd log:** * 5305 [2014-05-20 20:08:55.040665] E [glusterd-snapshot.c:3805:glusterd_add_brick_to_snap_volume] 0-management: Unable to fetch snap device (vol1.brick_snapdevice0). Leaving empty 5306 [2014-05-20 20:08:55.649146] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 5307 [2014-05-20 20:08:55.663181] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 5308 [2014-05-20 20:16:55.541197] W [glusterfsd.c:1182:cleanup_and_exit] (-- 0-: received signum (15), shutting down Glusterd was hung when executing the testcase ./tests/bugs/bug-1090042.t. *Cli log:** *72649 [2014-05-20 20:12:51.960765] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect 72650 [2014-05-20 20:12:51.960850] T [socket.c:2689:socket_connect] (--/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) [0x7ff8b6609994] (--/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) [0x7ff8b5d3305b] (- -/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) [0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already connected 72651 [2014-05-20 20:12:52.960943] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect 72652 [2014-05-20 20:12:52.960999] T [socket.c:2697:socket_connect] 0-glusterfs: connecting 0x1e0fcc0, state=0 gen=0 sock=-1 72653 [2014-05-20 20:12:52.961038] W [dict.c:1059:data_to_str] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) [0x7ff8ad9e95f3] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0xf1) [0x7ff8ad9ec7d0]))) 0-dict: data is NULL 72654 [2014-05-20 20:12:52.961070] W [dict.c:1059:data_to_str] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) [0x7ff8ad9e95f3] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] (--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0x100) [0x7ff8ad9ec7df]))) 0-dict: data is NULL 72655 [2014-05-20 20:12:52.961079] E [name.c:140:client_fill_address_family] 0-glusterfs: transport.address-family not specified. 
Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) optio ns 72656 [2014-05-20 20:12:54.961273] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect 72657 [2014-05-20 20:12:54.961404] T [socket.c:2689:socket_connect] (--/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) [0x7ff8b6609994] (--/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) [0x7ff8b5d3305b] (- -/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) [0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport already connected 72658 [2014-05-20 20:12:55.120645] D [cli-cmd.c:384:cli_cmd_submit] 0-cli: Returning 110 72659 [2014-05-20 20:12:55.120723] D [cli-rpc-ops.c:8716:gf_cli_snapshot] 0-cli: Returning 110 Now we need to find why glusterd was hung. Thanks, Vijay On Wednesday 21 May 2014 06:46 AM, Pranith Kumar Karampuri wrote: Hey, Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs athttp://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz Relevant logs: [2014-05-20 20:17:07.026045] : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS [2014-05-20 20:17:08.030673] : volume start patchy : SUCCESS [2014-05-20 20:17:08.279148] : volume barrier patchy enable : SUCCESS [2014-05-20 20:17:08.476785] : volume barrier patchy enable : FAILED : Failed to reconfigure barrier. [2014-05-20 20:17:08.727429] : volume barrier patchy disable : SUCCESS [2014-05-20 20:17:08.926995] : volume barrier patchy disable : FAILED : Failed to reconfigure barrier. Pranith - Original Message - From: Pranith Kumar Karampuripkara...@redhat.com To: Gluster Develgluster-devel@gluster.org Cc: Joseph Fernandesjosfe...@redhat.com, Vijaikumar
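On the question of why glusterd was hung: grabbing the stacks of the stuck process on the slave, while it is still hung, usually shows the main thread and the synctask threads blocking on each other. A rough sketch, assuming pstack (or gdb) is installed on the machine:

pstack "$(pidof glusterd)" > /tmp/glusterd-stacks.txt
# look for the main thread waiting on the synctasks and the synctask
# threads waiting on the big lock
grep -B2 -A8 -iE 'synctask|lock' /tmp/glusterd-stacks.txt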
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Joseph Fernandes josfe...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 11:41:28 AM Subject: Re: Spurious failures because of nfs and snapshots Hi Joseph, In the log mentioned below, it say ping-time is set to default value 30sec.I think issue is different. Can you please point me to the logs where you where able to re-create the problem. Thanks, Vijay On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote: hi Vijai, Joseph, In 2 of the last 3 build failures, http://build.gluster.org/job/regression/4479/console, http://build.gluster.org/job/regression/4478/console this test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert this test until the fix is available? Please send a patch to revert the test case if you guys feel so. You can re-submit it along with the fix to the bug mentioned by Joseph. Pranith. 
- Original Message - From: Joseph Fernandes josfe...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Friday, 16 May, 2014 5:13:57 PM Subject: Re: Spurious failures because of nfs and snapshots Hi All, tests/bugs/bug-1090042.t : I was able to reproduce the issue i.e when this test is done in a loop for i in {1..135} ; do ./bugs/bug-1090042.t When checked the logs [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs The issue is with ping-timeout and is tracked under the bug https://bugzilla.redhat.com/show_bug.cgi?id=1096729 The workaround is mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8 Regards, Joe - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com Sent: Friday, May 16, 2014 6:19:54 AM Subject: Spurious failures because of nfs and snapshots hi, In the latest build I fired for review.gluster.com/7766 (http://build.gluster.org/job/regression/4443/console) failed
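On the "resubmit on top of 7753" request at the top of this message, one way to do that is roughly the following; the patch-set number and remote are illustrative, so check the change page for the exact ref before fetching:

# fetch the dependency from Gerrit and rebase the local change onto it
git fetch https://review.gluster.org/glusterfs refs/changes/53/7753/1
git rebase FETCH_HEAD
# resubmit; regression will then run with the dependency included
git push origin HEAD:refs/for/master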
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
Hey, Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz Relevant logs: [2014-05-20 20:17:07.026045] : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS [2014-05-20 20:17:08.030673] : volume start patchy : SUCCESS [2014-05-20 20:17:08.279148] : volume barrier patchy enable : SUCCESS [2014-05-20 20:17:08.476785] : volume barrier patchy enable : FAILED : Failed to reconfigure barrier. [2014-05-20 20:17:08.727429] : volume barrier patchy disable : SUCCESS [2014-05-20 20:17:08.926995] : volume barrier patchy disable : FAILED : Failed to reconfigure barrier. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com, Vijaikumar M vmall...@redhat.com Sent: Tuesday, May 20, 2014 3:41:11 PM Subject: Re: Spurious failures because of nfs and snapshots hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). 
Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Joseph Fernandes josfe...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 11:41:28 AM Subject: Re: Spurious failures because of nfs and snapshots Hi Joseph, In the log mentioned below, it say ping-time is set to default value 30sec.I think issue is different. Can you please point me to the logs where you where able to re-create the problem. Thanks, Vijay On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote: hi Vijai, Joseph, In 2 of the last 3 build failures, http://build.gluster.org/job/regression/4479/console, http://build.gluster.org/job/regression/4478/console this test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert this test until the fix is available? Please send a patch to revert the test case if you guys feel so. You can re-submit it along with the fix to the bug mentioned by Joseph. Pranith. - Original Message - From: Joseph Fernandes josfe...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Friday, 16 May, 2014 5:13:57 PM Subject: Re: Spurious failures because of nfs and snapshots Hi All, tests/bugs/bug-1090042.t : I was able to reproduce the issue i.e when this test is done in a loop for i in
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
Hi Joseph, In the log mentioned below, it say ping-time is set to default value 30sec.I think issue is different. Can you please point me to the logs where you where able to re-create the problem. Thanks, Vijay On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote: hi Vijai, Joseph, In 2 of the last 3 build failures, http://build.gluster.org/job/regression/4479/console, http://build.gluster.org/job/regression/4478/console this test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert this test until the fix is available? Please send a patch to revert the test case if you guys feel so. You can re-submit it along with the fix to the bug mentioned by Joseph. Pranith. - Original Message - From: Joseph Fernandes josfe...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Friday, 16 May, 2014 5:13:57 PM Subject: Re: Spurious failures because of nfs and snapshots Hi All, tests/bugs/bug-1090042.t : I was able to reproduce the issue i.e when this test is done in a loop for i in {1..135} ; do ./bugs/bug-1090042.t When checked the logs [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs The issue is with ping-timeout and is tracked under the bug https://bugzilla.redhat.com/show_bug.cgi?id=1096729 The workaround is mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8 Regards, Joe - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com Sent: Friday, May 16, 2014 6:19:54 AM Subject: Spurious failures because of nfs and snapshots hi, In the latest build I fired for review.gluster.com/7766 (http://build.gluster.org/job/regression/4443/console) failed because of spurious failure. The script doesn't wait for nfs export to be available. I fixed that, but interestingly I found quite a few scripts with same problem. Some of the scripts are relying on 'sleep 5' which also could lead to spurious failures if the export is not available in 5 seconds. We found that waiting for 20 seconds is better, but 'sleep 20' would unnecessarily delay the build execution. So if you guys are going to write any scripts which has to do nfs mounts, please do it the following way: EXPECT_WITHIN 20 1 is_nfs_export_available; TEST mount -t nfs -o vers=3 $H0:/$V0 $N0; Please review http://review.gluster.com/7773 :-) I saw one more spurious failure in a snapshot related script tests/bugs/bug-1090042.t on the next build fired by Niels. Joesph (CCed) is debugging it. He agreed to reply what he finds and share it with us so that we won't introduce similar bugs in future. I encourage you guys to share what you fix to prevent spurious failures in future. Thanks Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
- Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 10:26:04 AM Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots On 16/05/2014, at 1:49 AM, Pranith Kumar Karampuri wrote: hi, In the latest build I fired for review.gluster.com/7766 (http://build.gluster.org/job/regression/4443/console) failed because of spurious failure. The script doesn't wait for nfs export to be available. I fixed that, but interestingly I found quite a few scripts with same problem. Some of the scripts are relying on 'sleep 5' which also could lead to spurious failures if the export is not available in 5 seconds. Cool. Fixing this NFS problem across all of the tests would be really welcome. That specific failed test (bug-1087198.t) is the most common one I've seen over the last few weeks, causing about half of all failures in master. Eliminating this class of regression failure would be really helpful. :) This particular class is eliminated :-). Patch was merged on Friday. Pranith + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
On Thu, May 15, 2014 at 5:49 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi, In the latest build I fired for review.gluster.com/7766 ( http://build.gluster.org/job/regression/4443/console) failed because of spurious failure. The script doesn't wait for nfs export to be available. I fixed that, but interestingly I found quite a few scripts with same problem. Some of the scripts are relying on 'sleep 5' which also could lead to spurious failures if the export is not available in 5 seconds. We found that waiting for 20 seconds is better, but 'sleep 20' would unnecessarily delay the build execution. So if you guys are going to write any scripts which has to do nfs mounts, please do it the following way: EXPECT_WITHIN 20 1 is_nfs_export_available; TEST mount -t nfs -o vers=3 $H0:/$V0 $N0; Always please also add mount -o soft,intr in the regression scripts for mounting nfs. Becomes so much easier to cleanup any hung mess. We probably need an NFS mounting helper function which can be called like: TEST mount_nfs $H0:/$V0 $N0; Thanks Avati ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
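A minimal sketch of such a helper, assuming the is_nfs_export_available function and the $H0/$V0/$N0 conventions already used by the regression framework; the 20-second bound mirrors the suggestion above and the function name is only a proposal:

function mount_nfs {
        local exp="$1"
        local mnt="$2"
        # wait for gluster-nfs to actually export the volume before mounting
        EXPECT_WITHIN 20 "1" is_nfs_export_available
        # soft,intr so a hung mount can be cleaned up easily
        TEST mount -t nfs -o vers=3,soft,intr "$exp" "$mnt"
}

# usage: mount_nfs $H0:/$V0 $N0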