Re: [Gluster-devel] How to enable ACL support in Glusterfs volume
On Tue, Apr 26, 2016 at 12:23 PM, Jiffin Tony Thottan wrote:
>
> On 26/04/16 12:18, Jiffin Tony Thottan wrote:
>
>> On 26/04/16 12:11, ABHISHEK PALIWAL wrote:
>>
>>> Hi,
>>>
>>> I want to enable ACL support on a gluster volume using the kernel NFS ACL
>>> support, so I followed the steps below after creating the gluster volume:
>>
>> Is there any specific reason to use knfs instead of the built-in gluster
>> NFS server?
>>
>>> 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
>>>
>>> 2. update the /etc/exports file
>>>    /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
>>>
>>> 3. exportfs -ra
>>>
>>> 4. gluster volume set c_glusterfs nfs.acl off
>>>
>>> 5. gluster volume set c_glusterfs nfs.disable on
>>>
>>> We disabled the last two options because we are using the kernel NFS ACL
>>> support, which is already enabled.
>>>
>>> On the other board we mount it using:
>>>
>>> mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
>>>
>>> setfacl -m u:application:rw /tmp/e/usr
>>> setfacl: /tmp/e/usr: Operation not supported
>>
>> Can you please check the clients for hints?
>
> What I intended to say is: please check the client logs and, if possible,
> take a packet trace from the server machine.

Yes, we can check. Please tell me where I can check it: in /var/log/messages
or dmesg?

>>> and "application" is a system user, like below:
>>>
>>> application:x:102:0::/home/application:/bin/sh
>>>
>>> I don't know why I am getting this failure when I enabled all the ACL
>>> support in each step.
>>>
>>> Please let me know how I can enable this.
>>>
>>> Regards,
>>> Abhishek
>>
>> --
>> Jiffin

--
Regards
Abhishek Paliwal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
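[A small diagnostic sketch, not from this thread: one way to see the same failure outside setfacl is to drive the POSIX ACL API (libacl) directly and run the binary under strace on the NFS mount, so the failing call and its errno are visible. It does roughly what "setfacl -m u:application:rw" does, except that it replaces the whole access ACL instead of merging. The file name acl_test.c, the hard-coded uid 102 (taken from the passwd entry above), and the exact ACL text are assumptions for illustration. Build with "gcc acl_test.c -o acl_test -lacl" and run it against a file under /tmp/e/.]

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/acl.h>

int main (int argc, char **argv)
{
    acl_t acl;

    if (argc != 2) {
        fprintf (stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    /* Access ACL granting rw to uid 102 ("application" in the thread),
     * plus the mandatory owner/group/other/mask entries. */
    acl = acl_from_text ("user::rw-,group::r--,other::r--,user:102:rw-,mask::rw-");
    if (!acl) {
        perror ("acl_from_text");
        return 1;
    }

    /* Same operation setfacl performs; on a mount without ACL support
     * this fails with EOPNOTSUPP ("Operation not supported"). */
    if (acl_set_file (argv[1], ACL_TYPE_ACCESS, acl) == -1)
        fprintf (stderr, "acl_set_file(%s): %s\n", argv[1], strerror (errno));
    else
        printf ("ACL applied to %s\n", argv[1]);

    acl_free (acl);
    return 0;
}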
Re: [Gluster-devel] How to enable ACL support in Glusterfs volume
On 26/04/16 12:18, Jiffin Tony Thottan wrote: On 26/04/16 12:11, ABHISHEK PALIWAL wrote: Hi, I want to enable ACL support on gluster volume using the kernel NFS ACL support so I have followed below steps after creation of gluster volume: Is there any specific reason to knfs instead of in build gluster nfs server ? 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2 2.update the /etc/exports file /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14) 3.exportfs –ra 4.gluster volume set c_glusterfs nfs.acl off 5.gluster volume set c_glusterfs nfs.disable on we have disabled above two options because we are using Kernel NFS ACL support and that is already enabled. on other board mounting it using mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/ setfacl -m u:application:rw /tmp/e/usr setfacl: /tmp/e/usr: Operation not supported Can you please check the clients for the hints ? What I intend to say can please check the client logs and also if possible take the packet trace from server machine. and application is the system user like below application:x:102:0::/home/application:/bin/sh I don't why I am getting this failure when I enabled all the acl support in each steps. Please let me know how can I enable this. Regards, Abhishek -- Jiffin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to enable ACL support in Glusterfs volume
On Tue, Apr 26, 2016 at 12:18 PM, Jiffin Tony Thottan wrote:
> On 26/04/16 12:11, ABHISHEK PALIWAL wrote:
>
>> Hi,
>>
>> I want to enable ACL support on a gluster volume using the kernel NFS ACL
>> support, so I followed the steps below after creating the gluster volume:
>
> Is there any specific reason to use knfs instead of the built-in gluster
> NFS server?

Yes, because we have other NFS-mounted volumes in the system as well.

>> 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
>>
>> 2. update the /etc/exports file
>>    /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
>>
>> 3. exportfs -ra
>>
>> 4. gluster volume set c_glusterfs nfs.acl off
>>
>> 5. gluster volume set c_glusterfs nfs.disable on
>>
>> We disabled the last two options because we are using the kernel NFS ACL
>> support, which is already enabled.
>>
>> On the other board we mount it using:
>>
>> mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
>>
>> setfacl -m u:application:rw /tmp/e/usr
>> setfacl: /tmp/e/usr: Operation not supported
>
> Can you please check the clients for hints?

What do I need to check here?

>> and "application" is a system user, like below:
>>
>> application:x:102:0::/home/application:/bin/sh
>>
>> I don't know why I am getting this failure when I enabled all the ACL
>> support in each step.
>>
>> Please let me know how I can enable this.
>>
>> Regards,
>> Abhishek
>
> --
> Jiffin

--
Regards
Abhishek Paliwal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to enable ACL support in Glusterfs volume
On 26/04/16 12:11, ABHISHEK PALIWAL wrote: Hi, I want to enable ACL support on gluster volume using the kernel NFS ACL support so I have followed below steps after creation of gluster volume: Is there any specific reason to knfs instead of in build gluster nfs server ? 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2 2.update the /etc/exports file /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14) 3.exportfs –ra 4.gluster volume set c_glusterfs nfs.acl off 5.gluster volume set c_glusterfs nfs.disable on we have disabled above two options because we are using Kernel NFS ACL support and that is already enabled. on other board mounting it using mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/ setfacl -m u:application:rw /tmp/e/usr setfacl: /tmp/e/usr: Operation not supported Can you please check the clients for the hints ? and application is the system user like below application:x:102:0::/home/application:/bin/sh I don't why I am getting this failure when I enabled all the acl support in each steps. Please let me know how can I enable this. Regards, Abhishek -- Jiffin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] How to enable ACL support in Glusterfs volume
Hi,

I want to enable ACL support on a gluster volume using the kernel NFS ACL support, so I followed the steps below after creating the gluster volume:

1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2

2. update the /etc/exports file
   /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)

3. exportfs -ra

4. gluster volume set c_glusterfs nfs.acl off

5. gluster volume set c_glusterfs nfs.disable on

We disabled the last two options because we are using the kernel NFS ACL support, which is already enabled.

On the other board we mount it using:

mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/

setfacl -m u:application:rw /tmp/e/usr
setfacl: /tmp/e/usr: Operation not supported

and "application" is a system user, like below:

application:x:102:0::/home/application:/bin/sh

I don't know why I am getting this failure when I enabled all the ACL support in each step.

Please let me know how I can enable this.

Regards,
Abhishek
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] WORM patch review for 3.8
Hi Folks,

Please review the WORM/Retention patch by Karthik, so that we can have it in 3.8:

http://review.gluster.org/#/c/13429/

Regards,
Joe
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] pNFS server for FreeBSD using GlusterFS
- Original Message - > CCing ganesha list > > On 22/04/16 04:18, Rick Macklem wrote: > > Jiffin Tony Thottan wrote: > >> > >> On 21/04/16 04:43, Rick Macklem wrote: > >>> Hi, > >>> > >>> Just to let you know, I did find the email responses to my > >>> queries some months ago helpful and I now have a pNFS server > >>> for FreeBSD using the GlusterFS port at the alpha test stage. > >>> So far I have not made any changes to GlusterFS except the little > >>> poll() patch that was already discussed on this list last December. > >>> > >>> Anyhow, if anyone is interested in taking a look at this, > >>> I have a primitive document at: > >>> http://people.freebsd.org/~rmacklem/pnfs-setup.txt > >>> that will hopefully give you a starting point. > >>> > >>> Thanks to everyone that helped via email a few months ago, rick > >> Hi Rick, > >> > >> Awesome some work man. You have cracked Flexfile layout for gluster > >> volume. > >> > >> I still wondering why you picked knfs instead of nfs-ganesha? > > I don't believe that ganesha will be ported to FreeBSD any time soon. If it > > I believe the support is already there. CCing ganesha list to confirm > the same. > Well, here is a snippet from the v2.1 release notes. It mentions FreeBSD support that is being removed. Later versions (the current is v2.3) have no mention of FreeBSD, so I assume they dropped it as planned. (Maybe I should have said "won't be ported again any time soon" instead of "will be ported ...any time soon".): The primary platform for NFS-Ganesha is Linux. Any kernel later than 2.6.39 is required to fully support the VFS FSAL. This requirement does not apply to configurations using other FSALs. We have not recently tested with kernels older than 3.8 but that should not be a problem for users with currently supported Linux distributions. There are build time options and source code in the codebase that would indicate FreeBSD support. However, the server takes advantage of some advanced capabilities of the threads model in Linux kernels that are not available on FreeBSD. FreeBSD support will probably be dropped as of V2.2 because there is no current active development of equivalents for FreeBSD. rick > > is ported, that would be an alternative for FreeBSD users to consider. > > (I work on the kernel nfsd as a hobby, so I probably wouldn't do this > > myself.) > > > >> There will > >> lot of context switches > >> between kernel space and user space which may effect the metadata > >> performance. > > Yes, I do see a lot of context switches. > > > > rick > > > >> I still remembering the discussion[1] in which I mentioned to use > >> ganesha server as MDS. > >> And usually gluster volume won't export using knfs. > >> > >> -- > >> Jiffin > >> > >>> ___ > >>> Gluster-devel mailing list > >>> Gluster-devel@gluster.org > >>> http://www.gluster.org/mailman/listinfo/gluster-devel > >> > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote:
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/

Nothing new here. I tracked this kind of trouble with the NetBSD slaves and only got frustration as a result.

--
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
On Mon, Apr 25, 2016 at 01:21:17PM +0200, Michael Scherer wrote: > Le lundi 25 avril 2016 à 13:09 +0200, Niels de Vos a écrit : > > On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote: > > > Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit : > > > > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit : > > > > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > > > > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > > > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur > > > > > > > > wrote: > > > > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > > > > > > wrote: > > > > > > > > >> Hi all, > > > > > > > > >> > > > > > > > > >> Noticed our regression machines are reporting back really > > > > > > > > >> slow, > > > > > > > > >> especially CentOs and Smoke > > > > > > > > >> > > > > > > > > >> I found that most of the slaves are marked offline, this > > > > > > > > >> could be the > > > > > > > > >> biggest reasons ? > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > Regression machines are scheduled to be offline if there are > > > > > > > > > no active > > > > > > > > > jobs. I wonder if the slowness is related to LVM or related > > > > > > > > > factors as > > > > > > > > > detailed in a recent thread? > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail > > > > > > > > shortcut) > > > > > > > > > > > > > > > > Hi Vijay, > > > > > > > > > > > > > > > > Honestly I was not aware of this case where the machines move to > > > > > > > > offline state by them self, I was only aware that they just go > > > > > > > > to idle > > > > > > > > state, > > > > > > > > Thanks for sharing that information. But we still need to > > > > > > > > reclaim most > > > > > > > > of machines, Here are the reasons why each of them are offline. > > > > > > > > > > > > > > Well, slaves go into offline, and should be woken up when needed. > > > > > > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > > > > > > > > > > > I've rebooted: > > > > > > > > > > > > > > - slave46 > > > > > > > - slave28 > > > > > > > - slave26 > > > > > > > - slave25 > > > > > > > - slave24 > > > > > > > - slave23 > > > > > > > - slave21 > > > > > > > > > > > > > > These all seem to have come up correctly after clicking the > > > > > > > 'Lauch slave > > > > > > > agent' button on the slave's status page. > > > > > > > > > > > > > > Remember that anyone with a Jankins account can reboot VMs. This > > > > > > > most > > > > > > > often is sufficient to get them working again. Just go to > > > > > > > https://build.gluster.org/job/reboot-vm/ , login and press some > > > > > > > buttons. > > > > > > > > > > > > > > One slave is in a weird status, maybe one of the tests overwrote > > > > > > > the ssh > > > > > > > key? > > > > > > > > > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > > > > > > slave29.cloud.gluster.org:22. > > > > > > > ERROR: Failed to authenticate as jenkins. Wrong password. > > > > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > > > > > > [04/24/16 06:48:04] [SSH] Authentication failed. > > > > > > > hudson.AbortException: Authentication failed. 
> > > > > > > at > > > > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > > > > > > at > > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > > > > > > at > > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > > > > > > at > > > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:262) > > > > > > > at > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > > > at > > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > > > at java.lang.Thread.run(Thread.java:745) > > > > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > > > > > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > > > > > > > > > > > Leaving slave29 as is, maybe one of our admins can have a look > > > > > > > and see > > > > > > > if it needs reprovisioning. > > > > > > > > > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no > > > > > > longer > > > > > > in salt configuration, but I could connect as root. > > > > > > > > > > > > It should work better now, but please tell me if anything is > > > > > > incorrect > > > > > > with it. > > > > > > > > > > Hmm, not really. Launching the Jenkins slave agent in it through the > > > > > webui still fails the same: > > > > > > > > > > https://build.gluster.org/computer/slave29.cloud.gluster.
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
Le lundi 25 avril 2016 à 13:09 +0200, Niels de Vos a écrit : > On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote: > > Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit : > > > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit : > > > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > > > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur > > > > > > > wrote: > > > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > > > > > wrote: > > > > > > > >> Hi all, > > > > > > > >> > > > > > > > >> Noticed our regression machines are reporting back really slow, > > > > > > > >> especially CentOs and Smoke > > > > > > > >> > > > > > > > >> I found that most of the slaves are marked offline, this could > > > > > > > >> be the > > > > > > > >> biggest reasons ? > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > Regression machines are scheduled to be offline if there are no > > > > > > > > active > > > > > > > > jobs. I wonder if the slowness is related to LVM or related > > > > > > > > factors as > > > > > > > > detailed in a recent thread? > > > > > > > > > > > > > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail > > > > > > > shortcut) > > > > > > > > > > > > > > Hi Vijay, > > > > > > > > > > > > > > Honestly I was not aware of this case where the machines move to > > > > > > > offline state by them self, I was only aware that they just go to > > > > > > > idle > > > > > > > state, > > > > > > > Thanks for sharing that information. But we still need to reclaim > > > > > > > most > > > > > > > of machines, Here are the reasons why each of them are offline. > > > > > > > > > > > > Well, slaves go into offline, and should be woken up when needed. > > > > > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > > > > > > > > > I've rebooted: > > > > > > > > > > > > - slave46 > > > > > > - slave28 > > > > > > - slave26 > > > > > > - slave25 > > > > > > - slave24 > > > > > > - slave23 > > > > > > - slave21 > > > > > > > > > > > > These all seem to have come up correctly after clicking the 'Lauch > > > > > > slave > > > > > > agent' button on the slave's status page. > > > > > > > > > > > > Remember that anyone with a Jankins account can reboot VMs. This > > > > > > most > > > > > > often is sufficient to get them working again. Just go to > > > > > > https://build.gluster.org/job/reboot-vm/ , login and press some > > > > > > buttons. > > > > > > > > > > > > One slave is in a weird status, maybe one of the tests overwrote > > > > > > the ssh > > > > > > key? > > > > > > > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > > > > > slave29.cloud.gluster.org:22. > > > > > > ERROR: Failed to authenticate as jenkins. Wrong password. > > > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > > > > > [04/24/16 06:48:04] [SSH] Authentication failed. > > > > > > hudson.AbortException: Authentication failed. 
> > > > > > at > > > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > > > > > at > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > > > > > at > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > > > > > at > > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:262) > > > > > > at > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > > at > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > > at java.lang.Thread.run(Thread.java:745) > > > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > > > > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > > > > > > > > > Leaving slave29 as is, maybe one of our admins can have a look and > > > > > > see > > > > > > if it needs reprovisioning. > > > > > > > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no > > > > > longer > > > > > in salt configuration, but I could connect as root. > > > > > > > > > > It should work better now, but please tell me if anything is incorrect > > > > > with it. > > > > > > > > Hmm, not really. Launching the Jenkins slave agent in it through the > > > > webui still fails the same: > > > > > > > > https://build.gluster.org/computer/slave29.cloud.gluster.org/log > > > > > > > > Maybe the "jenkins" user on the slave has the wrong password? > > > > > > So, it seems first that he had the wrong host key, so I changed that. > > > > > > I am looking at what is wrong, so do not put it offline :) > > > > So the script to update the /etc/hosts file
[Gluster-devel] BUG: libgfapi mem leaks
Hi guys.

I have been testing gluster 3.7.x for some time. There is still a memory leak when using libgfapi (latest 3.7.11). The simplest C code (without even establishing a connection):

/* Header names below are reconstructed; the original include list was
 * stripped by the archive. Only stdio.h and the gfapi header are needed. */
#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>

int main (int argc, char** argv)
{
    glfs_t *fs = NULL;

    fs = glfs_new ("pool");
    if (!fs) {
        fprintf (stderr, "glfs_new: returned NULL\n");
        return 1;
    }

    glfs_fini (fs);

    return 0;
}

produces a memory leak (valgrind output below). I believe this is a serious matter which prevents using gluster in production. Things like libvirt or qemu will grow in process size until a limit is reached and then stop working. Libvirt, for example, checks before launching a domain whether the VM's image is present and accessible via libgfapi, and it grows in size because of this leak. If gluster is to be considered enterprise grade, this has to be resolved.

Adding another glfs_new()/glfs_fini() section to the code above increases the memory leak.

Best regards
Piotr Rybicki

# valgrind --leak-check=full --show-reachable=yes --show-leak-kinds=all ./a.out
==1689== Memcheck, a memory error detector
==1689== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1689== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1689== Command: ./a.out
==1689==
==1689==
==1689== HEAP SUMMARY:
==1689==     in use at exit: 9,076,453 bytes in 57 blocks
==1689==   total heap usage: 141 allocs, 84 frees, 18,089,145 bytes allocated
==1689==
==1689== 8 bytes in 1 blocks are still reachable in loss record 1 of 55
==1689==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==1689==    by 0x5A8D916: __gf_default_calloc (mem-pool.h:118)
==1689==    by 0x5A8D916: __glusterfs_this_location (globals.c:146)
==1689==    by 0x4E3D76C: glfs_new@@GFAPI_3.4.0 (glfs.c:699)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 82 bytes in 1 blocks are definitely lost in loss record 2 of 55
==1689==    at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==1689==    by 0x5A893F9: __gf_calloc (mem-pool.c:117)
==1689==    by 0x5A57254: gf_strdup (mem-pool.h:185)
==1689==    by 0x5A57254: gf_log_init (logging.c:738)
==1689==    by 0x4E3DB25: glfs_set_logging@@GFAPI_3.4.0 (glfs.c:837)
==1689==    by 0x4E3D7BC: glfs_new@@GFAPI_3.4.0 (glfs.c:712)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 89 bytes in 1 blocks are possibly lost in loss record 3 of 55
==1689==    at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==    by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==    by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==    by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==    by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==    by 0x4E3CA47: glusterfs_ctx_defaults_init (glfs.c:133)
==1689==    by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==    by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 89 bytes in 1 blocks are possibly lost in loss record 4 of 55
==1689==    at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==    by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==    by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==    by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==    by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==    by 0x4E3CA93: glusterfs_ctx_defaults_init (glfs.c:142)
==1689==    by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==    by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 92 bytes in 1 blocks are possibly lost in loss record 5 of 55
==1689==    at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==    by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==    by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==    by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==    by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==    by 0x4E3CAB9: glusterfs_ctx_defaults_init (glfs.c:146)
==1689==    by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==    by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 94 bytes in 1 blocks are possibly lost in loss record 6 of 55
==1689==    at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==    by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==    by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==    by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==    by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==    by 0x4E3CA21: glusterfs_ctx_defaults_init (glfs.c:128)
==1689==    by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==    by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==    by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 94 bytes in 1 blocks are possibly lost in loss record 7 of 55
==1689==    at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689
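[A sketch to illustrate the growth mentioned above, not part of the original report: wrapping the same glfs_new()/glfs_fini() pair in a loop makes the process footprint grow with the iteration count, which is what a long-lived consumer such as libvirt would observe. The include path and the loop count of 100 are assumptions; the volume name "pool" matches the reproducer above. Link with -lgfapi.]

#include <stdio.h>
#include <glusterfs/api/glfs.h>   /* include path may differ per install */

int main (void)
{
    int i;
    glfs_t *fs;

    for (i = 0; i < 100; i++) {
        /* Same unconnected instance as in the report above. */
        fs = glfs_new ("pool");
        if (!fs) {
            fprintf (stderr, "glfs_new: returned NULL\n");
            return 1;
        }
        /* Should release everything allocated by glfs_new(), but per the
         * report some memory is leaked on every cycle. */
        glfs_fini (fs);
    }

    /* Compare RSS (ps/top) or valgrind totals against a single iteration. */
    return 0;
}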
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote: > Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit : > > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit : > > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur > > > > > > wrote: > > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > > > > wrote: > > > > > > >> Hi all, > > > > > > >> > > > > > > >> Noticed our regression machines are reporting back really slow, > > > > > > >> especially CentOs and Smoke > > > > > > >> > > > > > > >> I found that most of the slaves are marked offline, this could > > > > > > >> be the > > > > > > >> biggest reasons ? > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > Regression machines are scheduled to be offline if there are no > > > > > > > active > > > > > > > jobs. I wonder if the slowness is related to LVM or related > > > > > > > factors as > > > > > > > detailed in a recent thread? > > > > > > > > > > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail > > > > > > shortcut) > > > > > > > > > > > > Hi Vijay, > > > > > > > > > > > > Honestly I was not aware of this case where the machines move to > > > > > > offline state by them self, I was only aware that they just go to > > > > > > idle > > > > > > state, > > > > > > Thanks for sharing that information. But we still need to reclaim > > > > > > most > > > > > > of machines, Here are the reasons why each of them are offline. > > > > > > > > > > Well, slaves go into offline, and should be woken up when needed. > > > > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > > > > > > > I've rebooted: > > > > > > > > > > - slave46 > > > > > - slave28 > > > > > - slave26 > > > > > - slave25 > > > > > - slave24 > > > > > - slave23 > > > > > - slave21 > > > > > > > > > > These all seem to have come up correctly after clicking the 'Lauch > > > > > slave > > > > > agent' button on the slave's status page. > > > > > > > > > > Remember that anyone with a Jankins account can reboot VMs. This most > > > > > often is sufficient to get them working again. Just go to > > > > > https://build.gluster.org/job/reboot-vm/ , login and press some > > > > > buttons. > > > > > > > > > > One slave is in a weird status, maybe one of the tests overwrote the > > > > > ssh > > > > > key? > > > > > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > > > > slave29.cloud.gluster.org:22. > > > > > ERROR: Failed to authenticate as jenkins. Wrong password. > > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > > > > [04/24/16 06:48:04] [SSH] Authentication failed. > > > > > hudson.AbortException: Authentication failed. 
> > > > > at > > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > > > > at > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > > > > at > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > > > > > at > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > > at > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > > at java.lang.Thread.run(Thread.java:745) > > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > > > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > > > > > > > Leaving slave29 as is, maybe one of our admins can have a look and see > > > > > if it needs reprovisioning. > > > > > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer > > > > in salt configuration, but I could connect as root. > > > > > > > > It should work better now, but please tell me if anything is incorrect > > > > with it. > > > > > > Hmm, not really. Launching the Jenkins slave agent in it through the > > > webui still fails the same: > > > > > > https://build.gluster.org/computer/slave29.cloud.gluster.org/log > > > > > > Maybe the "jenkins" user on the slave has the wrong password? > > > > So, it seems first that he had the wrong host key, so I changed that. > > > > I am looking at what is wrong, so do not put it offline :) > > So the script to update the /etc/hosts file was not run, so it was using > the wrong ip. > > Can we agree on getting ride of it now, since there is no need for it > anymore ? I guess so, DNS should be stable now, right? > (then i will also remove the /etc/rax-reboot file from the various > slaves, and maybe replace with a ansible based system) rax-reboot is only needed on build.gluster.org, none of the other
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit : > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit : > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur > > > > > wrote: > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > > > wrote: > > > > > >> Hi all, > > > > > >> > > > > > >> Noticed our regression machines are reporting back really slow, > > > > > >> especially CentOs and Smoke > > > > > >> > > > > > >> I found that most of the slaves are marked offline, this could be > > > > > >> the > > > > > >> biggest reasons ? > > > > > >> > > > > > >> > > > > > > > > > > > > Regression machines are scheduled to be offline if there are no > > > > > > active > > > > > > jobs. I wonder if the slowness is related to LVM or related factors > > > > > > as > > > > > > detailed in a recent thread? > > > > > > > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail > > > > > shortcut) > > > > > > > > > > Hi Vijay, > > > > > > > > > > Honestly I was not aware of this case where the machines move to > > > > > offline state by them self, I was only aware that they just go to idle > > > > > state, > > > > > Thanks for sharing that information. But we still need to reclaim most > > > > > of machines, Here are the reasons why each of them are offline. > > > > > > > > Well, slaves go into offline, and should be woken up when needed. > > > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > > > > > I've rebooted: > > > > > > > > - slave46 > > > > - slave28 > > > > - slave26 > > > > - slave25 > > > > - slave24 > > > > - slave23 > > > > - slave21 > > > > > > > > These all seem to have come up correctly after clicking the 'Lauch slave > > > > agent' button on the slave's status page. > > > > > > > > Remember that anyone with a Jankins account can reboot VMs. This most > > > > often is sufficient to get them working again. Just go to > > > > https://build.gluster.org/job/reboot-vm/ , login and press some buttons. > > > > > > > > One slave is in a weird status, maybe one of the tests overwrote the ssh > > > > key? > > > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > > > slave29.cloud.gluster.org:22. > > > > ERROR: Failed to authenticate as jenkins. Wrong password. > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > > > [04/24/16 06:48:04] [SSH] Authentication failed. > > > > hudson.AbortException: Authentication failed. > > > > at > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > > > at > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > > > at > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > > > > at > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > > at > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > > at java.lang.Thread.run(Thread.java:745) > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > > > > > Leaving slave29 as is, maybe one of our admins can have a look and see > > > > if it needs reprovisioning. 
> > > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer > > > in salt configuration, but I could connect as root. > > > > > > It should work better now, but please tell me if anything is incorrect > > > with it. > > > > Hmm, not really. Launching the Jenkins slave agent in it through the > > webui still fails the same: > > > > https://build.gluster.org/computer/slave29.cloud.gluster.org/log > > > > Maybe the "jenkins" user on the slave has the wrong password? > > So, it seems first that he had the wrong host key, so I changed that. > > I am looking at what is wrong, so do not put it offline :) So the script to update the /etc/hosts file was not run, so it was using the wrong ip. Can we agree on getting ride of it now, since there is no need for it anymore ? (then i will also remove the /etc/rax-reboot file from the various slaves, and maybe replace with a ansible based system) -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Should it be possible to disable own-thread for encrypted RPC?
On Thu, Apr 21, 2016 at 5:37 PM, Jeff Darcy wrote: >> I've recently become aware of another problem with own-threads. The >> threads launched are not reaped, pthread_joined, after a TLS >> connection disconnects. This is especially problematic with GlusterD >> as it launches a lot of threads to handle generally short lived >> connections (volfile fetch, portmapper). This causes GlusterDs mem >> usage to continually grow, and finally lead to other failures due to >> memory shortage. I've recently seen a setup with GlusterD memory >> usage in 10s of GBs of reserved mem and TBs of virt mem. This is >> easily reproducible as well. I'm still working out a solution for >> this. >> >> While allowing TLS connections with own-threads only will lead to a >> more stable experience, this is a really bad in terms of our memory >> consumption. This will badly affect our chances of having 1000s of >> clients. Making TLS work with epoll would fix this, but I'm not very >> sure of the effort involved. Could we fix this for 3.8? For 4.0, if >> we want to default to TLS, we definitely need to fix this. > > Maybe it's just my not-so-humble opinion, but reaping threads seems > like a pretty easy thing to implement. By contrast, the prospects It is easier to get the threads reaped, and this is what I intend to do for the next 3.7.x release and 3.8. The simplest solution I can think of right now is to have a reaper timer run periodically, which would reap any TLS own-threads that have stopped. The process of reaping would be as follows, - The reaper timer is started when the 1st TLS own-thread is created. It wakes up every X seconds and reaps dead threads. - TLS own-threads need to notify the reaper timer of their demise. This is achieved by pushing their thread-ids to a global queue when they exit. - When the reaper timer is triggered, it reads in the thread-ids from the queue and calls pthread_join on them. This should work well. But I'm not sure if this is the simplest way to do the reaping. What do you think of this? > of making TLS (specifically OpenSSL) work reliably with epoll seem > murky at best. Nothing has been easy with epoll so far, and I don't > see why we'd expect making it work reliably with OpenSSL's horrible > API would be the first exception. Fixing one small issue with > own-thread still seems like the quickest route to a stable TLS > implementation. While TLS will get more robust by fixing the problems with own-thread, I'm still concerned with the memory usage for Gluster-4.0. Particularly because we're aiming to use TLS by default and have brick multiplexing. This could lead to situations with a single process launching 1000s of threads to handle TLS connections, which will lead to large memory footprint for Gluster. This is my reasoning for trying to get TLS work with epoll. I may be overthinking this, and this might not be of any significance at all. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
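[A minimal sketch of the reaper scheme described above; this is not code from the Gluster tree and all names (own_thread_notify_exit, reaper, dead_thread, the wake-up interval) are invented for illustration. Exiting own-threads push their thread-id onto a global list, and a periodic reaper joins them so their stacks are released.]

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct dead_thread {
    pthread_t tid;
    struct dead_thread *next;
};

static pthread_mutex_t reap_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dead_thread *reap_list;

/* Called by a TLS own-thread just before it exits. */
void own_thread_notify_exit (void)
{
    struct dead_thread *d = calloc (1, sizeof (*d));
    if (!d)
        return;
    d->tid = pthread_self ();
    pthread_mutex_lock (&reap_lock);
    d->next = reap_list;
    reap_list = d;
    pthread_mutex_unlock (&reap_lock);
}

/* Body of the reaper timer: wakes up every 'interval' seconds and joins
 * every thread that announced its exit since the last run. */
void *reaper (void *arg)
{
    int interval = *(int *) arg;
    struct dead_thread *list, *next;

    for (;;) {
        sleep (interval);

        pthread_mutex_lock (&reap_lock);
        list = reap_list;
        reap_list = NULL;
        pthread_mutex_unlock (&reap_lock);

        while (list) {
            next = list->next;
            /* Joining releases the exited thread's stack and bookkeeping. */
            pthread_join (list->tid, NULL);
            free (list);
            list = next;
        }
    }
    return NULL;
}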
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit : > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur > > > > wrote: > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > > wrote: > > > > >> Hi all, > > > > >> > > > > >> Noticed our regression machines are reporting back really slow, > > > > >> especially CentOs and Smoke > > > > >> > > > > >> I found that most of the slaves are marked offline, this could be the > > > > >> biggest reasons ? > > > > >> > > > > >> > > > > > > > > > > Regression machines are scheduled to be offline if there are no active > > > > > jobs. I wonder if the slowness is related to LVM or related factors as > > > > > detailed in a recent thread? > > > > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut) > > > > > > > > Hi Vijay, > > > > > > > > Honestly I was not aware of this case where the machines move to > > > > offline state by them self, I was only aware that they just go to idle > > > > state, > > > > Thanks for sharing that information. But we still need to reclaim most > > > > of machines, Here are the reasons why each of them are offline. > > > > > > Well, slaves go into offline, and should be woken up when needed. > > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > > > I've rebooted: > > > > > > - slave46 > > > - slave28 > > > - slave26 > > > - slave25 > > > - slave24 > > > - slave23 > > > - slave21 > > > > > > These all seem to have come up correctly after clicking the 'Lauch slave > > > agent' button on the slave's status page. > > > > > > Remember that anyone with a Jankins account can reboot VMs. This most > > > often is sufficient to get them working again. Just go to > > > https://build.gluster.org/job/reboot-vm/ , login and press some buttons. > > > > > > One slave is in a weird status, maybe one of the tests overwrote the ssh > > > key? > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > > slave29.cloud.gluster.org:22. > > > ERROR: Failed to authenticate as jenkins. Wrong password. > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > > [04/24/16 06:48:04] [SSH] Authentication failed. > > > hudson.AbortException: Authentication failed. > > > at > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > > at > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > > at > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > > at java.lang.Thread.run(Thread.java:745) > > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > > > Leaving slave29 as is, maybe one of our admins can have a look and see > > > if it needs reprovisioning. > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer > > in salt configuration, but I could connect as root. > > > > It should work better now, but please tell me if anything is incorrect > > with it. > > Hmm, not really. 
Launching the Jenkins slave agent in it through the > webui still fails the same: > > https://build.gluster.org/computer/slave29.cloud.gluster.org/log > > Maybe the "jenkins" user on the slave has the wrong password? So, it seems first that he had the wrong host key, so I changed that. I am looking at what is wrong, so do not put it offline :) -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote: > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur wrote: > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > > wrote: > > > >> Hi all, > > > >> > > > >> Noticed our regression machines are reporting back really slow, > > > >> especially CentOs and Smoke > > > >> > > > >> I found that most of the slaves are marked offline, this could be the > > > >> biggest reasons ? > > > >> > > > >> > > > > > > > > Regression machines are scheduled to be offline if there are no active > > > > jobs. I wonder if the slowness is related to LVM or related factors as > > > > detailed in a recent thread? > > > > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut) > > > > > > Hi Vijay, > > > > > > Honestly I was not aware of this case where the machines move to > > > offline state by them self, I was only aware that they just go to idle > > > state, > > > Thanks for sharing that information. But we still need to reclaim most > > > of machines, Here are the reasons why each of them are offline. > > > > Well, slaves go into offline, and should be woken up when needed. > > However it seems that Jenkins fails to connect to many slaves :-/ > > > > I've rebooted: > > > > - slave46 > > - slave28 > > - slave26 > > - slave25 > > - slave24 > > - slave23 > > - slave21 > > > > These all seem to have come up correctly after clicking the 'Lauch slave > > agent' button on the slave's status page. > > > > Remember that anyone with a Jankins account can reboot VMs. This most > > often is sufficient to get them working again. Just go to > > https://build.gluster.org/job/reboot-vm/ , login and press some buttons. > > > > One slave is in a weird status, maybe one of the tests overwrote the ssh > > key? > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > > slave29.cloud.gluster.org:22. > > ERROR: Failed to authenticate as jenkins. Wrong password. > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > > [04/24/16 06:48:04] [SSH] Authentication failed. > > hudson.AbortException: Authentication failed. > > at > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > > at > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > > at > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > > [04/24/16 06:48:04] Launch failed - cleaning up connection > > [04/24/16 06:48:05] [SSH] Connection closed. > > > > Leaving slave29 as is, maybe one of our admins can have a look and see > > if it needs reprovisioning. > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer > in salt configuration, but I could connect as root. > > It should work better now, but please tell me if anything is incorrect > with it. Hmm, not really. Launching the Jenkins slave agent in it through the webui still fails the same: https://build.gluster.org/computer/slave29.cloud.gluster.org/log Maybe the "jenkins" user on the slave has the wrong password? 
Thanks, Niels signature.asc Description: PGP signature ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit : > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote: > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur wrote: > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever > > > wrote: > > >> Hi all, > > >> > > >> Noticed our regression machines are reporting back really slow, > > >> especially CentOs and Smoke > > >> > > >> I found that most of the slaves are marked offline, this could be the > > >> biggest reasons ? > > >> > > >> > > > > > > Regression machines are scheduled to be offline if there are no active > > > jobs. I wonder if the slowness is related to LVM or related factors as > > > detailed in a recent thread? > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut) > > > > Hi Vijay, > > > > Honestly I was not aware of this case where the machines move to > > offline state by them self, I was only aware that they just go to idle > > state, > > Thanks for sharing that information. But we still need to reclaim most > > of machines, Here are the reasons why each of them are offline. > > Well, slaves go into offline, and should be woken up when needed. > However it seems that Jenkins fails to connect to many slaves :-/ > > I've rebooted: > > - slave46 > - slave28 > - slave26 > - slave25 > - slave24 > - slave23 > - slave21 > > These all seem to have come up correctly after clicking the 'Lauch slave > agent' button on the slave's status page. > > Remember that anyone with a Jankins account can reboot VMs. This most > often is sufficient to get them working again. Just go to > https://build.gluster.org/job/reboot-vm/ , login and press some buttons. > > One slave is in a weird status, maybe one of the tests overwrote the ssh > key? > > [04/24/16 06:48:02] [SSH] Opening SSH connection to > slave29.cloud.gluster.org:22. > ERROR: Failed to authenticate as jenkins. Wrong password. > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password) > [04/24/16 06:48:04] [SSH] Authentication failed. > hudson.AbortException: Authentication failed. > at > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217) > at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711) > at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > [04/24/16 06:48:04] Launch failed - cleaning up connection > [04/24/16 06:48:05] [SSH] Connection closed. > > Leaving slave29 as is, maybe one of our admins can have a look and see > if it needs reprovisioning. Seems slave29 was reinstalled and/or slightly damaged, it was no longer in salt configuration, but I could connect as root. It should work better now, but please tell me if anything is incorrect with it. -- Michael Scherer Sysadmin, Community Infrastructure and Platform, OSAS signature.asc Description: This is a digitally signed message part ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] pNFS server for FreeBSD using GlusterFS
CCing ganesha list

On 22/04/16 04:18, Rick Macklem wrote:
> Jiffin Tony Thottan wrote:
>> On 21/04/16 04:43, Rick Macklem wrote:
>>> Hi,
>>>
>>> Just to let you know, I did find the email responses to my
>>> queries some months ago helpful and I now have a pNFS server
>>> for FreeBSD using the GlusterFS port at the alpha test stage.
>>> So far I have not made any changes to GlusterFS except the little
>>> poll() patch that was already discussed on this list last December.
>>>
>>> Anyhow, if anyone is interested in taking a look at this,
>>> I have a primitive document at:
>>> http://people.freebsd.org/~rmacklem/pnfs-setup.txt
>>> that will hopefully give you a starting point.
>>>
>>> Thanks to everyone that helped via email a few months ago, rick
>>
>> Hi Rick,
>>
>> Awesome work, man. You have cracked the Flexfile layout for a gluster
>> volume.
>>
>> I am still wondering why you picked knfs instead of nfs-ganesha?
>
> I don't believe that ganesha will be ported to FreeBSD any time soon.

I believe the support is already there. CCing the ganesha list to confirm
the same.

> If it is ported, that would be an alternative for FreeBSD users to
> consider. (I work on the kernel nfsd as a hobby, so I probably wouldn't
> do this myself.)
>
>> There will be a lot of context switches between kernel space and user
>> space, which may affect the metadata performance.
>
> Yes, I do see a lot of context switches.
>
> rick
>
>> I still remember the discussion[1] in which I mentioned using the
>> ganesha server as MDS. And usually a gluster volume won't be exported
>> using knfs.
>>
>> --
>> Jiffin

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel