Re: [Gluster-devel] Jenkins Issues this weekend and how we're solving them

2018-02-18 Thread Atin Mukherjee
On Mon, Feb 19, 2018 at 8:53 AM, Nigel Babu  wrote:

> Hello,
>
> As you all most likely know, we store a tarball of the binaries and the
> core whenever a regression run produces a core. Occasionally we introduce
> a bug in Gluster and this tarball can take up a lot of space. This has
> happened recently with the brick multiplex tests: the build-install
> tarball takes up 25G, causing the machine to run out of space and fail
> continuously.
>

AFAIK, we don't have a .t file in the upstream regression suite that
creates hundreds of volumes. At that scale, with brick multiplexing
enabled, I can understand the core being heavily loaded and consuming this
much space. FWIW, can we first figure out which test caused this crash and
check whether running gcore after certain steps in that test leaves us
with a core file of a similar size? IOW, have we actually seen core files
this large before? If not, whatever changed to make us start seeing them
is something we should investigate.
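
For example, something along the lines of the sketch below could be used
to take a gcore snapshot of each running glusterfsd and report its size
(the output directory and the use of pgrep/gcore are my assumptions, not
part of the regression harness):

    #!/usr/bin/env python
    # Rough sketch: dump a core of every running glusterfsd with gcore and
    # report its size, to compare against the cores left by regression runs.
    # The output directory below is hypothetical.
    import os
    import subprocess

    OUT_DIR = "/var/tmp/gcore-check"

    def glusterfsd_pids():
        """Return PIDs of running glusterfsd (brick) processes via pgrep."""
        out = subprocess.check_output(["pgrep", "glusterfsd"])
        return [int(p) for p in out.split()]

    def dump_and_measure(pid):
        """Run gcore against one PID and return the size of its core file."""
        prefix = os.path.join(OUT_DIR, "core")
        subprocess.check_call(["gcore", "-o", prefix, str(pid)])
        return os.path.getsize("%s.%d" % (prefix, pid))

    if __name__ == "__main__":
        if not os.path.isdir(OUT_DIR):
            os.makedirs(OUT_DIR)
        for pid in glusterfsd_pids():
            size = dump_and_measure(pid)
            print("glusterfsd %d: core is %.2f GB" % (pid, size / float(1024 ** 3)))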


>
> I've made some changes this morning. Right after we create the tarball,
> we'll delete all files in /archive that are greater than 1G. Please be
> aware that this means all large files including the newly created tarball
> will be deleted. You will have to work with the traceback on the Jenkins
> job.
>

We'd really need to first investigate the average core file size we end
up with when a system is running with brick multiplexing and ongoing I/O.
Without that, immediately deleting core files larger than 1G will make it
harder for developers to debug genuine crashes, since the traceback alone
may not be sufficient.


>
>
>
> --
> nigelb
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Jenkins Issues this weekend and how we're solving them

2018-02-18 Thread Nigel Babu
Hello,

As you all most likely know, we store a tarball of the binaries and the
core whenever a regression run produces a core. Occasionally we introduce
a bug in Gluster and this tarball can take up a lot of space. This has
happened recently with the brick multiplex tests: the build-install
tarball takes up 25G, causing the machine to run out of space and fail
continuously.

I've made some changes this morning. Right after we create the tarball,
we'll delete all files in /archive that are greater than 1G. Please be
aware that this means all large files including the newly created tarball
will be deleted. You will have to work with the traceback on the Jenkins
job.
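
For reference, the cleanup amounts to roughly the following (a minimal
sketch using the /archive path and 1G threshold mentioned above; the
actual Jenkins job script may look different):

    #!/usr/bin/env python
    # Minimal sketch of the cleanup described above: right after the tarball
    # is created, remove anything under /archive larger than 1G.  The path
    # and threshold come from this mail; the real job script may differ.
    import os

    ARCHIVE_DIR = "/archive"
    LIMIT = 1024 ** 3  # 1G

    def purge_large_files(root=ARCHIVE_DIR, limit=LIMIT):
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if size > limit:
                    print("removing %s (%d bytes)" % (path, size))
                    os.remove(path)

    if __name__ == "__main__":
        purge_large_files()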




-- 
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Weekly Untriaged Bugs

2018-02-18 Thread jenkins
[...truncated 6 lines...]
https://bugzilla.redhat.com/1536908 / build: gluster-block build as "fatal 
error: api/glfs.h: No such file or directory"
https://bugzilla.redhat.com/1541261 / build: "glustereventsd-SuSE.in" is 
missing in extras/init.d
https://bugzilla.redhat.com/1544851 / build: Redefinitions of IXDR_GET_LONG and 
IXDR_PUT_LONG when libtirpc is used
https://bugzilla.redhat.com/1545048 / core: [brick-mux] process termination 
race while killing glusterfsd on last brick detach
https://bugzilla.redhat.com/1545142 / core: GlusterFS - Memory Leak during 
adding directories
https://bugzilla.redhat.com/1544090 / core: possible memleak in glusterfsd 
process with brick multiplexing on
https://bugzilla.redhat.com/1540882 / disperse: Do lock conflict check 
correctly for wait-list
https://bugzilla.redhat.com/1543585 / fuse: Client Memory Usage Drastically 
Increased from 3.12 to 3.13 for Replicate 3 Volumes
https://bugzilla.redhat.com/1537602 / geo-replication: Georeplication tests 
intermittently fail
https://bugzilla.redhat.com/1539657 / geo-replication: Georeplication tests 
intermittently fail
https://bugzilla.redhat.com/1542979 / geo-replication: glibc fix for 
CVE-2018-101 breaks geo-replication
https://bugzilla.redhat.com/1544461 / glusterd: 3.8 -> 3.10 rolling upgrade 
fails (same for 3.12 or 3.13) on Ubuntu 14
https://bugzilla.redhat.com/1544638 / glusterd: 3.8 -> 3.10 rolling upgrade 
fails (same for 3.12 or 3.13) on Ubuntu 14
https://bugzilla.redhat.com/1540249 / glusterd: Gluster is trying to use a port 
outside documentation and firewalld's glusterfs.xml
https://bugzilla.redhat.com/1540868 / glusterd: Volume start commit failed, 
Commit failed for operation Start on local node
https://bugzilla.redhat.com/1536952 / project-infrastructure: build: add 
libcurl package to regression machines
https://bugzilla.redhat.com/1545003 / project-infrastructure: Create a new 
list: automated-test...@gluster.org
https://bugzilla.redhat.com/1544378 / project-infrastructure: mailman list 
moderation redirects from https to http
https://bugzilla.redhat.com/1546040 / project-infrastructure: Need centos 
machine to validate all test cases while brick mux is on
https://bugzilla.redhat.com/1545891 / project-infrastructure: Provide a 
automated way to update bugzilla status with patch merge.
https://bugzilla.redhat.com/1538900 / protocol: Found a missing unref in 
rpc_clnt_reconnect
https://bugzilla.redhat.com/1540478 / quota: Change quota option of many 
volumes concurrently, some commit operation failed.
https://bugzilla.redhat.com/1539680 / rdma: RDMA transport bricks crash
https://bugzilla.redhat.com/1544961 / rpc: libgfrpc does not export IPv6 RPC 
methods even with --with-ipv6-default
https://bugzilla.redhat.com/1546295 / rpc: Official packages don't default to 
IPv6
https://bugzilla.redhat.com/1538978 / rpc: rpcsvc_request_handler thread should 
be made multithreaded
https://bugzilla.redhat.com/1542934 / rpc: Seeing timer errors in the rebalance 
logs
https://bugzilla.redhat.com/1542072 / scripts: Syntactical errors in hook 
scripts for managing SELinux context on bricks #2 (S10selinux-label-brick.sh + 
S10selinux-del-fcontext.sh)
https://bugzilla.redhat.com/1540759 / tiering: Failure to demote tiered volume 
file that is continuously modified by client during hot tier detachment.
https://bugzilla.redhat.com/1540376 / tiering: Tiered volume performance 
degrades badly after a volume stop/start or system restart.
[...truncated 2 lines...]

build.log
Description: Binary data
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel