[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

--- Comment #6 from Yaniv Kaul ---
Pushed - seems to be OK - https://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:remove_strncpy2

--
You are receiving this mail because:
You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=CSwEcwOxql&a=cc_unsubscribe
___
Gluster-infra mailing list
Gluster-infra@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-infra
[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

Nigel Babu changed:
What|Removed|Added
Status|NEW|VERIFIED
CC||nig...@redhat.com

--- Comment #5 from Nigel Babu ---
From gerrit's sshd_log:
[2018-08-22 19:07:52,728 +] 78847570 mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:07,718 +] 78847570 mykaul a/1001977 git-upload-pack./glusterfs 2ms 14372ms 0
[2018-08-22 19:08:08,088 +] 78847570 mykaul a/1001977 LOGOUT
[2018-08-22 19:08:15,459 +] 1873f98b mykaul a/1001977 LOGIN FROM 127.0.0.1
[2018-08-22 19:08:21,644 +] 1873f98b mykaul a/1001977 git-receive-pack./glusterfs 2ms 5568ms 0 git/2.17.1
[2018-08-22 19:08:21,845 +] 1873f98b mykaul a/1001977 LOGOUT

From /var/log/messages:
Aug 22 19:09:05 gerrit-new kernel: git invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
. . . .
Aug 22 19:09:07 gerrit-new kernel: Out of memory: Kill process 11061 (java) score 388 or sacrifice child
Aug 22 19:09:07 gerrit-new kernel: Killed process 11061 (java) total-vm:3814660kB, anon-rss:1503912kB, file-rss:0kB, shmem-rss:0kB

It looks like pushing so many patches in one go triggered Gerrit and git to consume large amounts of memory. This led to Gerrit being OOM killed. It looks like we don't have any swap space on this box. I've just added 1G of swap to reduce the chance this happens next time.

Yaniv, can you try pushing again?
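For anyone who wants to spot this kind of kill quickly, here is a minimal sketch that pulls the "Killed process" entries out of syslog lines. It only assumes the kernel message layout shown in the /var/log/messages excerpt above; the function name `parse_oom_kills` and the dictionary keys are made up for illustration and are not part of any Gluster tooling.

```python
import re

# Matches the kernel's "Killed process ..." line as it appears in the
# /var/log/messages excerpt quoted in comment #5.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per OOM kill found in an iterable of log lines.

    The function name and the dict keys are hypothetical, chosen for this sketch.
    """
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match["pid"]),
                "name": match["name"],
                "total_vm_kb": int(match["total_vm"]),
                "anon_rss_kb": int(match["anon_rss"]),
            })
    return kills
```

Feeding it the lines of /var/log/messages (for example via `open(...)`) would list every process the OOM killer took down, together with its memory footprint at the time.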
[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

--- Comment #4 from Yaniv Kaul ---
(In reply to M. Scherer from comment #3)
> So Nigel did restart Gerrit, and this seems to be working now.

Any RCA? My commits are not there. Should I re-submit? Now?
[Gluster-infra] [Bug 1620377] Coverity scan setup for gluster-block and related projects
https://bugzilla.redhat.com/show_bug.cgi?id=1620377

Bhumika Goyal changed:
What|Removed|Added
Summary|Coverity scan setup for gluster-block and realated projects|Coverity scan setup for gluster-block and related projects
[Gluster-infra] [Bug 1620377] New: Coverity scan setup for gluster-block and realated projects
https://bugzilla.redhat.com/show_bug.cgi?id=1620377

Bug ID: 1620377
Summary: Coverity scan setup for gluster-block and realated projects
Product: GlusterFS
Version: 3.12
Component: project-infrastructure
Assignee: b...@gluster.org
Reporter: bgo...@redhat.com
CC: b...@gluster.org, gluster-infra@gluster.org

Description of problem:
Setting up Coverity scans for gluster-block and tcmu-runner.
Re: [Gluster-infra] Reboot policy for the infra
One more piece that's missing is when we'll restart the physical servers. The proposal doesn't cover that at all. The rest looks good to me, and I'm happy to add an item to the next sprint to automate the node rebooting.

On Tue, Aug 21, 2018 at 9:56 PM Michael Scherer wrote:

> Hi,
>
> so that's kernel reboot time again, this time courtesy of Intel
> (again). I do not consider the issue to be "OMG the sky is falling",
> but it is enough to take time to streamline our reboot process.
>
> Currently, we do not have a policy or anything, and I think the
> negotiation around that is cumbersome:
> - we need to reach people, which takes time and adds latency (that would
>   be bad if it was an urgent issue, and likely adds undue stress while
>   waiting)
> - we need to keep track of what was supposed to be done, which is also
>   cumbersome
>
> While that would not be a problem if I had only gluster to deal with, my
> team of 3 has to deal with a few more projects than 1, and orchestrating
> choices for a dozen groups is time consuming (just think of the last time
> you had to go to a restaurant after a conference to see how hard it is to
> reach agreements).
>
> So I would propose that we simplify that with the following policy:
>
> - Jenkins builders would be rebooted by Jenkins on a regular basis.
>   I do not know how we can do that, but given that we have enough nodes to
>   sustain builds, it shouldn't impact developers in a big way. The only
>   exception is the freebsd builder, since we only have 1 functional at
>   the moment. But once the 2nd is working, it should be treated like the
>   others.
>
> - services in HA (firewall, reverse proxy, internal squid/DNS) would be
>   rebooted during the day without notice. Due to working HA, that's non
>   user impacting. In fact, that's already what I do.
>
> - services not in HA should be pushed for HA (gerrit might get there one
>   day, no way for jenkins :/, need to see for postgres and so
>   fstat/softserve, and maybe try to get something for
>   download.gluster.org)
>
> - services critical and not in HA should be announced in advance.
>   Critical means the services listed here:
>   https://gluster-infra-docs.readthedocs.io/emergency.html
>
> - services not visible to end users (backup servers, ansible deployment
>   etc) can be rebooted at will
>
> Then the only question is what about stuff not in the previous
> categories, like softserve, fstat.
>
> Also, all dependencies are as critical as the most critical service
> that depends on them. So hypervisors hosting gerrit/jenkins are critical
> (until we find a way to avoid outage), the ones for builders are not.
>
> Thoughts, ideas ?
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS

--
nigelb
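To make the proposed categories easier to discuss, the policy in this thread could be written down as a small lookup. This is only a sketch of one possible reading: the names `RebootRule`, `reboot_rule` and `is_critical` are hypothetical, the order in which the checks are applied is an assumption, and the single-FreeBSD-builder exception and the physical servers Nigel mentions are deliberately not modelled.

```python
from enum import Enum


class RebootRule(Enum):
    """Outcomes named in the proposal (the identifiers are hypothetical)."""
    REGULAR_BY_JENKINS = "rebooted by Jenkins on a regular basis"
    DAYTIME_NO_NOTICE = "rebooted during the day without notice"
    ANNOUNCE_IN_ADVANCE = "announced in advance"
    AT_WILL = "rebooted at will"
    UNDECIDED = "open question in the proposal"


def reboot_rule(is_builder=False, in_ha=False, critical=False, user_visible=True):
    """Map a service's properties to a rule; the check order is an assumption."""
    if is_builder:
        return RebootRule.REGULAR_BY_JENKINS
    if in_ha:
        return RebootRule.DAYTIME_NO_NOTICE
    if critical:
        return RebootRule.ANNOUNCE_IN_ADVANCE
    if not user_visible:
        return RebootRule.AT_WILL
    # Services such as softserve or fstat are left open in the proposal.
    return RebootRule.UNDECIDED


def is_critical(host, critical_services, hosted_on):
    """A dependency is as critical as the most critical service depending on it.

    hosted_on maps a host name to the services it carries (hypothetical layout).
    """
    return host in critical_services or any(
        service in critical_services for service in hosted_on.get(host, ())
    )
```

For example, a hypervisor carrying gerrit would be treated as critical through `is_critical`, and `reboot_rule(critical=True)` would then ask for an announcement in advance.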
[Gluster-infra] [Bug 1620358] smoke job failures
https://bugzilla.redhat.com/show_bug.cgi?id=1620358

Nigel Babu changed:
What|Removed|Added
CC||nig...@redhat.com

--- Comment #1 from Nigel Babu ---
Gah, we ran out of space again. I've deleted all archived files older than 15 days across the board. If you did not download logs for a Jenkins job, they're all gone now. Going to look at moving the RPM build artifacts to the http server rather than storing them on Jenkins.
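A cleanup like the one described above could be previewed with something along these lines. This is a hedged sketch, not the command that was actually run on the Jenkins host: `files_older_than` is a hypothetical helper, it only lists candidates by modification time, and the actual deletion is left to the caller.

```python
import os
import time


def files_older_than(root, max_age_days, now=None):
    """List files under root whose mtime is older than max_age_days.

    Dry run only: nothing is deleted here. The name and signature are
    hypothetical, chosen for this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Reviewing the returned list before passing each path to `os.remove` keeps the destructive step explicit, which seems worthwhile given that the logs cannot be recovered afterwards.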
[Gluster-infra] [Bug 1620358] New: smoke job failures
https://bugzilla.redhat.com/show_bug.cgi?id=1620358

Bug ID: 1620358
Summary: smoke job failures
Product: GlusterFS
Version: mainline
Component: project-infrastructure
Assignee: b...@gluster.org
Reporter: atumb...@redhat.com
CC: b...@gluster.org, gluster-infra@gluster.org

Description of problem:
Noticed that a few of the smoke jobs failed this morning. Looks like no space left on device:

04:40:44 ERROR: Step ‘Archive the artifacts’ aborted due to exception:
04:40:44 java.nio.file.FileSystemException: /var/lib/jenkins/jobs/strfmt_errors/builds/12949/archive: No space left on device

(from https://build.gluster.org/job/strfmt_errors/12949/console)

https://build.gluster.org/job/strfmt_errors/12949/ : FAILURE
https://build.gluster.org/job/devrpm-el6/11084/ : FAILURE
https://build.gluster.org/job/devrpm-el7/11106/ : FAILURE
https://build.gluster.org/job/fedora-smoke/1026/ : FAILURE (skipped)
https://build.gluster.org/job/devrpm-fedora/11136/ : FAILURE
https://build.gluster.org/job/gd2-smoke/203/ : FAILURE (skipped)
[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

--- Comment #3 from M. Scherer ---
So Nigel did restart Gerrit, and this seems to be working now.
[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

M. Scherer changed:
What|Removed|Added
CC||msche...@redhat.com

--- Comment #2 from M. Scherer ---
Looking at it.
[Gluster-infra] [Bug 1620243] Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

Vijay Bellur changed:
What|Removed|Added
CC||vbel...@redhat.com

--- Comment #1 from Vijay Bellur ---
Encountering the same problem:

Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
[Gluster-infra] [Bug 1620243] New: Gerrit is non-responsive (503)
https://bugzilla.redhat.com/show_bug.cgi?id=1620243

Bug ID: 1620243
Summary: Gerrit is non-responsive (503)
Product: GlusterFS
Version: mainline
Component: project-infrastructure
Severity: urgent
Assignee: b...@gluster.org
Reporter: yk...@redhat.com
CC: b...@gluster.org, gluster-infra@gluster.org

Description of problem:
I'm getting 503 Service Unavailable from it. Might have been after I've uploaded a rather large patchset:

remote: New Changes:
remote: https://review.gluster.org/#/c/glusterfs/+/20894 glfs-fops.c, glfs.c: strncpy() -> sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20895 {cli-cmd-parser|cli-rpc-ops||cli-xml-output}.c: strncpy()->sprintf(), reduce ...
remote: https://review.gluster.org/#/c/glusterfs/+/20896 {mount-common|fusermount|mount_darwin|umountd}.c: strncpy()->sprintf(), ...
remote: https://review.gluster.org/#/c/glusterfs/+/20897 extras/geo-rep/gsync-sync-gfid.c: move from strlen() to sizeof()
remote: https://review.gluster.org/#/c/glusterfs/+/20898 multiple files: move from strlen() to sizeof()
remote: https://review.gluster.org/#/c/glusterfs/+/20899 multiple files: move from strlen() to sizeof()
remote: https://review.gluster.org/#/c/glusterfs/+/20900 bit-rot xlator: strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20901 changelog xlator: strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20902 changetimerecoder xlator: strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20903 xlators: move from strlen() to sizeof()
remote: https://review.gluster.org/#/c/glusterfs/+/20904 NFS server (mount3.c, nfs-inodes.c): strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20905 multiple xlators: move from strlen() to sizeof()
remote: https://review.gluster.org/#/c/glusterfs/+/20906 multiple xlators: strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20907 multiple xlators (mgmt): strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20908 multiple xlators (storage/posix): strncpy()->sprintf(), reduce strlen()'s
remote: https://review.gluster.org/#/c/glusterfs/+/20909 Various files: strncpy()->sprintf(), reduce strlen()'s
remote:
remote: Pushing to refs/publish/* is deprecated, use refs/for/* instead.
To ssh://review.gluster.org/glusterfs
 * [new branch] HEAD -> refs/publish/master/remove_strncpy2
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #13 from M. Scherer ---
So I pushed a fix, and it should deploy (unless the wifi breaks on my train).

The good news is that we can claim we made so much productivity progress that we hit the limit, so that's positive :p
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #12 from M. Scherer ---
Mhhh:

août 21 20:39:36 gerrit-new.rht.gluster.org xinetd[16437]: FAIL: git per_source_limit from=:::8.43.85.181

I suspect that it might be the cause. 8.43.85.181 is the firewall IP.

Several solutions:
- have a way to use the internal IP
- add an exception for that IP.

The 2nd is easier, the 1st is cleaner. I will start with the 2nd.
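To see how often each source trips the limit, the xinetd failures can be tallied per service and source address. The sketch below assumes only the message layout of the log line quoted in comment #12; `count_per_source_failures` is a hypothetical helper name, and the source address is kept exactly as it appears in the log.

```python
import re
from collections import Counter

# Matches the xinetd failure message quoted in comment #12.
FAIL_RE = re.compile(
    r"FAIL: (?P<service>\S+) per_source_limit from=(?P<source>\S+)"
)


def count_per_source_failures(lines):
    """Count per_source_limit failures keyed by (service, source address).

    The function name is hypothetical; it is not part of xinetd or Gerrit.
    """
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            counts[(match["service"], match["source"])] += 1
    return counts
```

If a single address such as the firewall dominates the counts, that would support the theory that all external builders are being counted as one source.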
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #11 from M. Scherer ---
So, Nigel pointed out that git is served by xinetd, and the log shows nothing except some ipv6 errors. While it might be related, I think it is not, especially since that's only for the rackspace builder, not the internal one.
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #10 from M. Scherer ---
I did, and didn't find anything that seemed relevant. I may have missed something however, and Nigel is also looking. That's a transient issue, so not easy to diagnose.
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #9 from Yaniv Kaul ---
(In reply to Yaniv Kaul from comment #7)
> (In reply to M. Scherer from comment #4)
> > So not wanting to say network is perfect, but this started since the upgrade
> > to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> > much client or something like this ?
>
> I wonder if it happens when I 'flood' it with multiple patches (all in the
> same topic).

Can we look at Gerrit logs?
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #8 from M. Scherer ---
Yeah, I wanted to explore that road too by doing *cough* load testing of the git server on the staging env, but there is no git port there (or I am not awake enough).
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #7 from Yaniv Kaul ---
(In reply to M. Scherer from comment #4)
> So not wanting to say network is perfect, but this started since the upgrade
> to new gerrit, no ? Could it be some issue where gerrit drop if there is too
> much client or something like this ?

I wonder if it happens when I 'flood' it with multiple patches (all in the same topic).
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #6 from M. Scherer ---
And it has been happening for a long time too, so it is unlikely to be the upgrade, if the error in the log is the same as the one reported.
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

--- Comment #5 from M. Scherer ---
[root@gerrit-new logs]# grep 'reset by peer' error_log |wc -l
18

Seems it happens quite often :/
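The same count can be taken from a script, which may be handy if the check ends up in monitoring. This is a small sketch equivalent to the grep and wc pipeline above; `count_lines_containing` is a hypothetical name, and it makes no assumption about the layout of Gerrit's error_log beyond it being line-oriented text.

```python
def count_lines_containing(lines, needle="reset by peer"):
    """Count the lines that contain needle, like `grep needle | wc -l`."""
    return sum(1 for line in lines if needle in line)


# Usage against the real file (the path is relative, as in the shell session):
# with open("error_log", errors="replace") as handle:
#     print(count_lines_containing(handle))
```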
[Gluster-infra] [Bug 1619838] Jenkins connection issues failing tests
https://bugzilla.redhat.com/show_bug.cgi?id=1619838

M. Scherer changed:
What|Removed|Added
CC||msche...@redhat.com

--- Comment #4 from M. Scherer ---
So not wanting to say the network is perfect, but this started with the upgrade to the new gerrit, no? Could it be some issue where gerrit drops connections if there are too many clients, or something like this?
[Gluster-infra] [Bug 1564149] Agree upon a coding standard, and automate check for this in smoke
https://bugzilla.redhat.com/show_bug.cgi?id=1564149

Worker Ant changed:
What|Removed|Added
Status|NEW|POST

--- Comment #35 from Worker Ant ---
REVIEW: https://review.gluster.org/20892 (clang-format: add the config file) posted (#1) for review on master by Amar Tumballi