Re: [Gluster-Maintainers] Flaky Regression tests again?

2020-08-17 Thread Sanju Rakonde
I think we should look for the root cause of these failures. If we mark the
tests as Bad, they might get left behind. Marking them as Bad makes sense
only if someone is ready to own the tests and keep track of the ongoing
root-causing efforts.

One more thought: let's discuss and fix a deadline in an upcoming community
meeting. In that meeting, let's take ownership of the failures and commit to
fixing them by the deadline. (If everyone agrees!)
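
To make the "Good if it passes, but don't fail regression if the tests fail"
option Amar mentions below concrete, here is a minimal bash sketch (the list
contents and helper names are illustrative assumptions, not the actual
run-tests.sh code):

KNOWN_FLAKY="tests/basic/mount-nfs-auth.t tests/bugs/distribute/bug-1122443.t"

run_one_test () {
    local t=$1
    if prove -f "$t"; then
        echo "PASS: $t"
    elif echo " $KNOWN_FLAKY " | grep -q " $t "; then
        echo "FLAKY (ignored): $t"   # recorded for follow-up, but not fatal
    else
        echo "FAIL: $t"
        return 1
    fi
}

A wrapper like this keeps the flaky tests running, so we still collect data
on them, without blocking unrelated patches.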

On Mon, Aug 17, 2020 at 12:58 PM Deepshikha Khandelwal 
wrote:

>
>
>
> On Sat, Aug 15, 2020 at 7:33 PM Amar Tumballi  wrote:
>
>> If I look at the recent regression runs (
>> https://build.gluster.org/job/centos7-regression/), there is more than
>> 50% failure in tests.
>>
>> At least 90% of the failures are not due to the patch itself. Considering
>> that regression tests are critical for our patches to get merged, and now
>> take almost 6-7 hours to complete, how can we make sure we pass regression
>> with 100% certainty?
>>
>> Again, out of these, only a few tests keep failing. Should we revisit
>> those tests and see why they are failing? Or should we mark them as 'Good
>> if it passes, but don't fail regression if the tests fail'?
>>
> I think we should revisit these tests for the root cause.
>
>> Some tests I have listed here from recent failures:
>>
>> tests/bugs/core/multiplex-limit-issue-151.t
>> tests/bugs/distribute/bug-1122443.t +++
>> tests/bugs/distribute/bug-1117851.t
>> tests/bugs/glusterd/bug-857330/normal.t +
>> tests/basic/mount-nfs-auth.t +
>>
> It failed mainly on builder202. I disconnected the builder and will check
> what is going wrong, though I don't have a conclusive analysis on this one,
> as it has always been flaky (failing quite randomly).
>
>>
>> tests/basic/changelog/changelog-snapshot.t
>> tests/basic/afr/split-brain-favorite-child-policy.t
>> tests/basic/distribute/rebal-all-nodes-migrate.t
>> tests/bugs/glusterd/quorum-value-check.t
>> tests/features/lock-migration/lkmigration-set-option.t
>> tests/bugs/nfs/bug-1116503.t
>> tests/basic/ec/ec-quorum-count-partial-failure.t
>>
>> Considering these are just 12 of the 750+ tests we run, should we even
>> consider marking them bad until they are fixed to be 100% consistent?
>>
> Makes sense.
>
>>
>> Any thoughts on how we should go ahead?
>>
>> Regards,
>> Amar
>>
>> (+) indicates a count, so the more + signs you see against a file, the
>> more times it failed.
>>


-- 
Thanks,
Sanju


Re: [Gluster-Maintainers] Build failed in Jenkins: regression-test-with-multiplex #1611

2020-01-16 Thread Sanju Rakonde
The below glusterd test cases are constantly failing in brick-mux
regression:
./tests/bugs/glusterd/bug-857330/normal.t
./tests/bugs/glusterd/bug-857330/xml.t
./tests/bugs/glusterd/quorum-validation.t

./tests/bugs/glusterd/bug-857330/normal.t and
./tests/bugs/glusterd/bug-857330/xml.t are timing out after 200 seconds. I
don't find any abnormality in the logs; we may need to increase the
timeout(?). I'm unable to run these tests in my local setup as they always
fail with "ModuleNotFoundError: No module named 'xattr'". Is the same
happening in CI as well?
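
(For anyone hitting the same error locally, a hedged sketch of a workaround:
install the Python xattr binding and rerun one of the timing-out tests.
python3-pyxattr is the usual Fedora/CentOS package name, an assumption here;
the pip fallback installs the same 'xattr' module.)

sudo yum install -y python3-pyxattr || pip3 install pyxattr
prove -vf ./tests/bugs/glusterd/bug-857330/normal.t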

Also, we don't print the output of "prove -vf " when a test gets timed out.
It would be great if we printed it; that would help us debug and see which
step took the most time.
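
Something along these lines (an illustrative sketch, not the current CI
code) would capture the verbose output even when the run is killed:

t=./tests/bugs/glusterd/bug-857330/normal.t   # hypothetical example test
timeout 200 prove -vf "$t" | tee "/tmp/$(basename "$t").log"

Because tee writes as the test runs, the last line of the log shows the
step that was executing when timeout(1) fired.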

./tests/bugs/glusterd/quorum-validation.t is failing because of the
regression caused by https://review.gluster.org/#/c/glusterfs/+/21651/.
Rafi is looking into this issue. In brief: after a reboot, glusterd spawns
multiple brick processes for a single brick instance, and volume status
shows the brick as offline.

I have gone through the logs, and I see that once brick2 is successfully
attached to the existing brick process of brick1, glusterd sends a detach
request for brick2 and spawns a new brick process for it. I also
cross-checked this by looking at the "ps -ax | grep glusterfsd" output.


Status of volume: patchy
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.1.1.1:/d/backends/1/patchy0       49153     0          Y       28492
Brick 127.1.1.2:/d/backends/2/patchy1       N/A       N/A        N       N/A
Brick 127.1.1.2:/d/backends/2/patchy2       N/A       N/A        N       N/A

Task Status of Volume patchy
------------------------------------------------------------------------------
There are no active volume tasks

28492 ?Ssl0:00 /usr/local/sbin/glusterfsd -s 127.1.1.1
--volfile-id patchy.127.1.1.1.d-backends-1-patchy0 -p
/d/backends/1/run/gluster/vols/patchy/127.1.1.1-d-backends-1-patchy0.pid -S
/var/run/gluster/f96de9a0bebaefd6.socket --brick-name /d/backends/1/patchy0
-l /var/log/glusterfs/1/bricks/d-backends-1-patchy0.log --xlator-option
*-posix.glusterd-uuid=32807d18-ac26-44b1-8cf4-9cf072b13f35 --process-name
brick --brick-port 49153 --xlator-option patchy-server.listen-port=49153
--xlator-option transport.socket.bind-address=127.1.1.1 --brick-mux
28529 ?Ssl0:00 /usr/local/sbin/glusterfsd -s 127.1.1.2
--volfile-id patchy.127.1.1.2.d-backends-2-patchy1 -p
/d/backends/2/run/gluster/vols/patchy/127.1.1.2-d-backends-2-patchy1.pid -S
/var/run/gluster/0df89e2f99a00139.socket --brick-name /d/backends/2/patchy1
-l /var/log/glusterfs/2/bricks/d-backends-2-patchy1.log --xlator-option
*-posix.glusterd-uuid=4e538359-d7fb-437a-a222-0047ff323976 --process-name
brick --brick-port 49154 --xlator-option patchy-server.listen-port=49154
--xlator-option transport.socket.bind-address=127.1.1.2 --brick-mux
28547 ?Ssl0:00 /usr/local/sbin/glusterfsd -s 127.1.1.2
--volfile-id patchy.127.1.1.2.d-backends-2-patchy2 -p
/d/backends/2/run/gluster/vols/patchy/127.1.1.2-d-backends-2-patchy2.pid -S
/var/run/gluster/a4b51a62080fd909.socket --brick-name /d/backends/2/patchy2
-l /var/log/glusterfs/2/bricks/d-backends-2-patchy2.log --xlator-option
*-posix.glusterd-uuid=4e538359-d7fb-437a-a222-0047ff323976 --process-name
brick --brick-port 49155 --xlator-option patchy-server.listen-port=49155
--xlator-option transport.socket.bind-address=127.1.1.2 --brick-mux
28558 ?Ssl0:00 /usr/local/sbin/glusterfsd -s 127.1.1.2
--volfile-id patchy.127.1.1.2.d-backends-2-patchy2 -p
/d/backends/2/run/gluster/vols/patchy/127.1.1.2-d-backends-2-patchy2.pid -S
/var/run/gluster/a4b51a62080fd909.socket --brick-name /d/backends/2/patchy2
-l /var/log/glusterfs/2/bricks/d-backends-2-patchy2.log --xlator-option
*-posix.glusterd-uuid=4e538359-d7fb-437a-a222-0047ff323976 --process-name
brick --brick-port 49156 --xlator-option patchy-server.listen-port=49156
--xlator-option transport.socket.bind-address=127.1.1.2 --brick-mux
28605 pts/6S+ 0:00 grep glusterfsd
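
(A quick way to spot the duplicate automatically; the pipeline below is an
illustrative sketch that counts glusterfsd instances per --volfile-id, so
any count above 1 is a brick that was spawned twice:)

ps -ax -o args= | grep '[g]lusterfsd' \
    | grep -o -e '--volfile-id [^ ]*' | sort | uniq -c | awk '$1 > 1'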


On Wed, Jan 15, 2020 at 2:40 AM  wrote:

> See <
> https://build.gluster.org/job/regression-test-with-multiplex/1611/display/redirect?page=changes
> >
>
> Changes:
>
> [Xavi Hernandez] xlators/storage: remove duplicated includes
>
> [Amar Tumballi] xlator/bit-rot-stub-helpers: structure logging
>
> [Shwetha K Acharya] tools/glusterfind: Remove an extra argument
>
>
> --
> [...truncated 3.51 MB...]
> ./tests/bugs/upcall/bug-1458127.t  -  14 second
> ./tests/bugs/snapshot/bug-1250387.t  -  14 second
> ./tests/bugs/snapshot/bug-1202436-calculate-quota-cksum-during-snap-restore.t
> -  14 second
> ./tests/bugs/snapshot/bug-1064768.t  -  14 second
> ./tests/bugs/shard/bug-1272986.t  -  14 second
> ./tests/bugs/replicate/bug-976800.t  -  14 second

Re: [Gluster-Maintainers] [gluster-packaging] glusterfs-5.4 released

2019-03-06 Thread Sanju Rakonde
On Tue, Mar 5, 2019 at 9:46 PM Shyam Ranganathan 
wrote:

> On 3/5/19 10:10 AM, Sanju Rakonde wrote:
> >
> >
> > On Tue, Mar 5, 2019 at 7:29 PM Shyam Ranganathan <srang...@redhat.com> wrote:
> >
> > On 2/27/19 5:19 AM, Niels de Vos wrote:
> > > On Tue, Feb 26, 2019 at 02:47:30PM +,
> > jenk...@build.gluster.org wrote:
> > >> SRC:
> > >> https://build.gluster.org/job/release-new/80/artifact/glusterfs-5.4.tar.gz
> > >> HASH:
> > >> https://build.gluster.org/job/release-new/80/artifact/glusterfs-5.4.sha512sum
> > >
> > > Packages for the CentOS Storage SIG are now available for testing.
> > > Please try them out and report test results on this list.
> > >
> > >   # yum install centos-release-gluster
> > >   # yum install --enablerepo=centos-gluster5-test glusterfs-server
> >
> > Due to patch [1] upgrades are broken, so we are awaiting a fix or revert
> > of the same before requesting a new build of 5.4.
> >
> > The current RPMs should hence not be published.
> >
> > Sanju/Hari, are we reverting this patch so that we can release 5.4, or
> > are we expecting the fix to land in 5.4 (as in [2])?
> >
> >
> > Shyam, I need some more time (approximately 1 day) to provide the fix. If
> > we have one more day, we can wait; otherwise we can revert the patch [1]
> > and continue with the release.
>
> We can wait a day, let me know tomorrow regarding the status. Thanks.
>

Shyam, the fix got some reviews. We are waiting for the other reviewers to
take a look at it. If there are no further review comments, the patch will
get merged tomorrow morning IST.

>
> >
> >
> > Thanks,
> > Shyam
> >
> > [1] Patch causing regression:
> > https://review.gluster.org/c/glusterfs/+/22148
> >
> > [2] Proposed fix on master:
> > https://review.gluster.org/c/glusterfs/+/22297/
> >
> >
> >
> > --
> > Thanks,
> > Sanju
>


-- 
Thanks,
Sanju


Re: [Gluster-Maintainers] [gluster-packaging] glusterfs-5.4 released

2019-03-05 Thread Sanju Rakonde
On Tue, Mar 5, 2019 at 7:29 PM Shyam Ranganathan 
wrote:

> On 2/27/19 5:19 AM, Niels de Vos wrote:
> > On Tue, Feb 26, 2019 at 02:47:30PM +, jenk...@build.gluster.org
> wrote:
> >> SRC:
> >> https://build.gluster.org/job/release-new/80/artifact/glusterfs-5.4.tar.gz
> >> HASH:
> >> https://build.gluster.org/job/release-new/80/artifact/glusterfs-5.4.sha512sum
> >
> > Packages for the CentOS Storage SIG are now available for testing.
> > Please try them out and report test results on this list.
> >
> >   # yum install centos-release-gluster
> >   # yum install --enablerepo=centos-gluster5-test glusterfs-server
>
> Due to patch [1] upgrades are broken, so we are awaiting a fix or revert
> of the same before requesting a new build of 5.4.
>
> The current RPMs should hence not be published.
>
> Sanju/Hari, are we reverting this patch so that we can release 5.4, or
> are we expecting the fix to land in 5.4 (as in [2])?
>

Shyam, I need some more time (approximately 1 day) to provide the fix. If we
have one more day, we can wait; otherwise we can revert the patch [1] and
continue with the release.

>
> Thanks,
> Shyam
>
> [1] Patch causing regression:
> https://review.gluster.org/c/glusterfs/+/22148
>
> [2] Proposed fix on master:
> https://review.gluster.org/c/glusterfs/+/22297/
>


-- 
Thanks,
Sanju


Re: [Gluster-Maintainers] Various upgrades are Broken

2019-03-04 Thread Sanju Rakonde
On Mon, Mar 4, 2019 at 6:54 PM Shyam Ranganathan 
wrote:

> On 3/4/19 8:09 AM, Hari Gowtham wrote:
> > On Mon, Mar 4, 2019 at 6:18 PM Shyam Ranganathan 
> wrote:
> >>
> >> On 3/4/19 7:29 AM, Amar Tumballi Suryanarayan wrote:
> >>> Thanks for testing this Hari.
> >>>
> >>> On Mon, Mar 4, 2019 at 5:42 PM Hari Gowtham wrote:
> >>>
> >>> Hi,
> >>>
> >>> With the patch https://review.gluster.org/#/c/glusterfs/+/21838/ the
> >>> upgrade from 3.12 to 6, 4.1 to 6 and 5 to 6 is broken.
> >>>
> >>> The above patch is available in release 6 and has been back-ported
> >>> to 4.1 and 5.
> >>> Though there isn't any release made with this patch on 4.1 and 5, if one
> >>> is made, a number of scenarios will fail. A few are mentioned below:
> >>>
> >>>
> >>> Considering there is no release with this patch in it, let's not
> >>> consider backporting at all.
> >
> > It has been back-ported to 4 and 5 already.
> > Regarding 5 we have decided to revert and make the release.
> > Are we going to revert the patch for 4 or wait for the fix?
>
> Release-4.1 next minor release is slated for week of 20th March, 2019.
> Hence, we have time to get the fix in place, but before that I would
> revert it anyway, so that tracking need not bother with possible late
> arrival of the fix.
>
> >
> >>
> >> Current 5.4 release (yet to be announced and released on the CentOS SIG
> >> (as testing is pending) *has* the fix. We need to revert it and rebuild
> >> 5.4, so that we can make the 5.4 release (without the fix).
> >>
> >> Hari/Sanju are you folks already on it?
> >
> > Yes, Sanju is working on the patch.
>

Fix is posted for review https://review.gluster.org/#/c/glusterfs/+/22297/

>
> Thank you!
>
> >
> >>
> >> Shyam
> >
> >
>


-- 
Thanks,
Sanju


Re: [Gluster-Maintainers] [Gluster-devel] Release 5: Branched and further dates

2018-09-28 Thread Sanju Rakonde
On Wed, Sep 26, 2018 at 7:53 PM Shyam Ranganathan 
wrote:

> Hi,
>
> Updates on the release and a shout out for help are as follows,
>
> RC0 Release packages for testing are available see the thread at [1]
>
> These are the activities that we need to complete for calling the release
> GA (with no major regressions, i.e.):
>
> 1. Release notes (Owner: release owner (myself), will send out an
> initial version for review and to solicit inputs today)
>
> 2. Testing dashboard to maintain release health (new, thanks Nigel)
>   - Dashboard at [2]
>   - We already have 3 failures here, as follows, which need attention from
> the appropriate *maintainers*:
> (a)
>
> https://build.gluster.org/job/regression-test-with-multiplex/871/consoleText
> - Failed with core:
> ./tests/basic/afr/gfid-mismatch-resolution-with-cli.t
> (b)
>
> https://build.gluster.org/job/regression-test-with-multiplex/873/consoleText
> - Failed with core: ./tests/bugs/snapshot/bug-1275616.t
> - Also test ./tests/bugs/glusterd/validating-server-quorum.t had to be
> retried
>

The test case ./tests/bugs/glusterd/validating-server-quorum.t had to be
retried because it timed out on the first run. I went through the logs of
the first run, and everything looks fine. Looking at the timestamps, I found
that cluster_brick_up_status took 45 seconds (PROCESS_UP_TIMEOUT) most of
the times it was called. Since we clubbed many of the glusterd test cases
into a single test case, it may need more time to execute. If this test case
times out repeatedly, we will think about what actions need to be taken.

Definition of cluster_brick_up_status for your reference:

function cluster_brick_up_status {
        local vol=$2
        local host=$3
        local brick=$4
        eval \$CLI_$1 volume status $vol $host:$brick --xml | sed -ne 's/.*<status>\([01]\)<\/status>/\1/p'
}
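
For context, a typical use in a .t file polls this helper until the brick
reports online, bounded by PROCESS_UP_TIMEOUT (the exact arguments below are
illustrative, not quoted from the failing test):

EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" cluster_brick_up_status 1 $V0 $H1 $B1/$V0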

> (c)
> https://build.gluster.org/job/regression-test-burn-in/4109/consoleText
> - Failed with core: ./tests/basic/mgmt_v3-locks.t
>
> 3. Upgrade testing
>   - Need *volunteers* to do the upgrade testing as stated in the 4.1
> upgrade guide [3] to note any differences or changes to the same
>   - Explicit call out on *disperse* volumes, as we continue to state
> online upgrade is not possible, is this addressed and can this be tested
> and the documentation improved around the same?
>
> 4. Performance testing/benchmarking
>   - I would be using smallfile and FIO to baseline 3.12 and 4.1 and test
> RC0 for any major regressions
>   - If we already know of any please shout out so that we are aware of
> the problems and upcoming fixes to the same
>
> 5. Major testing areas
>   - Py3 support: Need *volunteers* here to test out the Py3 support
> around changed python files, if there is not enough coverage in the
> regression test suite for the same
>
> Thanks,
> Shyam
>
> [1] Packages for RC0:
> https://lists.gluster.org/pipermail/maintainers/2018-September/005044.html
>
> [2] Release testing health dashboard:
> https://build.gluster.org/job/nightly-release-5/
>
> [3] 4.1 upgrade guide:
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
>
> On 09/13/2018 11:10 AM, Shyam Ranganathan wrote:
> > Hi,
> >
> > Release 5 has been branched today. To backport fixes to the upcoming 5.0
> > release use the tracker bug [1].
> >
> > We intend to roll out RC0 build by end of tomorrow for testing, unless
> > the set of usual cleanup patches (op-version, some messages, gfapi
> > version) land in any form of trouble.
> >
> > RC1 would be around 24th of Sep. with final release tagging around 1st
> > of Oct.
> >
> > I would like to encourage everyone to test out the bits as appropriate
> > and post updates to this thread.
> >
> > Thanks,
> > Shyam
> >
> > [1] 5.0 tracker:
> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-5.0


-- 
Thanks,
Sanju


Re: [Gluster-Maintainers] [Gluster-devel] Failing multiplex regressions!

2018-07-25 Thread Sanju Rakonde
Hi Shyam,

I need to work on this, but I haven't been able to spend much time on it so
far. I will try to spend as much time as I can and get these fixed.
Mohit is also working on this AFAIK.

Thanks,
Sanju

On Wed, Jul 25, 2018 at 12:27 AM, Shyam Ranganathan 
wrote:

> Hi,
>
> Multiplex regression jobs are failing everyday, see [1].
>
> May I know if anyone is looking into this?
>
> It was Mohit the last time around, are you still working on this Mohit?
> What patches are in progress to address this, if you are on it?
>
> Thanks,
> Shyam
>
> [1] regression-test-with-multiplex -
> https://build.gluster.org/job/regression-test-with-multiplex/changes



-- 
Thanks,
Sanju