Re: [Gluster-devel] How to enable ACL support in Glusterfs volume

2016-04-25 Thread ABHISHEK PALIWAL
On Tue, Apr 26, 2016 at 12:23 PM, Jiffin Tony Thottan wrote:

>
>
> On 26/04/16 12:18, Jiffin Tony Thottan wrote:
>
> On 26/04/16 12:11, ABHISHEK PALIWAL wrote:
>
> Hi,
>
> I want to enable ACL support on a gluster volume using the kernel NFS ACL
> support, so I have followed the steps below after creating the gluster volume:
>
>
> Is there any specific reason to use knfs instead of the in-built gluster NFS
> server?
>
>
> 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
>
> 2. update the /etc/exports file:
> /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
>
> 3. exportfs -ra
>
> 4. gluster volume set c_glusterfs nfs.acl off
>
> 5. gluster volume set c_glusterfs nfs.disable on
>
> We have disabled the above two options because we are using the kernel NFS ACL
> support, which is already enabled.
>
>
>
> On the other board we are mounting it using:
>
> mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
>
> setfacl -m u:application:rw /tmp/e/usr
> setfacl: /tmp/e/usr: Operation not supported
>
>
>
> Can you please check the clients for hints?
>
>
> What I intend to say is: can you please check the client logs and, if possible,
> take a packet trace from the server machine?
>

Yes, we can check. Please tell me where I can check it: in /var/log/messages
or in dmesg?


>
>
> and 'application' is a system user, as shown below:
>
> application:x:102:0::/home/application:/bin/sh
>
> I don't know why I am getting this failure when I have enabled all the ACL
> support in each step.
>
> Please let me know how I can enable this.
>
> Regards,
> Abhishek
>
>
> --
> Jiffin
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>


-- 




Regards
Abhishek Paliwal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to enable ACL support in Glusterfs volume

2016-04-25 Thread Jiffin Tony Thottan



On 26/04/16 12:18, Jiffin Tony Thottan wrote:

On 26/04/16 12:11, ABHISHEK PALIWAL wrote:

Hi,
I want to enable ACL support on a gluster volume using the kernel NFS
ACL support, so I have followed the steps below after creating the gluster
volume:


Is there any specific reason to use knfs instead of the in-built gluster NFS
server?



1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
2. update the /etc/exports file:
/tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
3. exportfs -ra
4. gluster volume set c_glusterfs nfs.acl off
5. gluster volume set c_glusterfs nfs.disable on
We have disabled the above two options because we are using the kernel NFS
ACL support, which is already enabled.

On the other board we are mounting it using:
mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
setfacl -m u:application:rw /tmp/e/usr
setfacl: /tmp/e/usr: Operation not supported


Can you please check the clients for hints?


What I intend to say is: can you please check the client logs and, if
possible, take a packet trace from the server machine?





and 'application' is a system user, as shown below:
application:x:102:0::/home/application:/bin/sh

I don't know why I am getting this failure when I have enabled all the ACL
support in each step.


Please let me know how I can enable this.

Regards,
Abhishek



--
Jiffin



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to enable ACL support in Glusterfs volume

2016-04-25 Thread ABHISHEK PALIWAL
On Tue, Apr 26, 2016 at 12:18 PM, Jiffin Tony Thottan wrote:

> On 26/04/16 12:11, ABHISHEK PALIWAL wrote:
>
> Hi,
>
> I want to enable ACL support on a gluster volume using the kernel NFS ACL
> support, so I have followed the steps below after creating the gluster volume:
>
>
> Is there any specific reason to use knfs instead of the in-built gluster NFS
> server?
>
Yes, because we have other NFS-mounted volumes in the system as well.

>
>
> 1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
>
> 2. update the /etc/exports file:
> /tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
>
> 3. exportfs -ra
>
> 4. gluster volume set c_glusterfs nfs.acl off
>
> 5. gluster volume set c_glusterfs nfs.disable on
>
> We have disabled the above two options because we are using the kernel NFS ACL
> support, which is already enabled.
>
>
>
> On the other board we are mounting it using:
>
> mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
>
> setfacl -m u:application:rw /tmp/e/usr
> setfacl: /tmp/e/usr: Operation not supported
>
>
>
> Can you please check the clients for hints?
>
What do I need to check here?

>
> and 'application' is a system user, as shown below:
>
> application:x:102:0::/home/application:/bin/sh
>
> I don't know why I am getting this failure when I have enabled all the ACL
> support in each step.
>
> Please let me know how I can enable this.
>
> Regards,
> Abhishek
>
>
> --
> Jiffin
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>


-- 




Regards
Abhishek Paliwal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to enable ACL support in Glusterfs volume

2016-04-25 Thread Jiffin Tony Thottan

On 26/04/16 12:11, ABHISHEK PALIWAL wrote:

Hi,
I want to enable ACL support on a gluster volume using the kernel NFS
ACL support, so I have followed the steps below after creating the gluster
volume:


Is there any specific reason to use knfs instead of the in-built gluster NFS
server?



1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2
2. update the /etc/exports file:
/tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)
3. exportfs -ra
4. gluster volume set c_glusterfs nfs.acl off
5. gluster volume set c_glusterfs nfs.disable on
We have disabled the above two options because we are using the kernel NFS ACL
support, which is already enabled.

On the other board we are mounting it using:
mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/
setfacl -m u:application:rw /tmp/e/usr
setfacl: /tmp/e/usr: Operation not supported


Can you please check the clients for hints?


and 'application' is a system user, as shown below:
application:x:102:0::/home/application:/bin/sh

I don't know why I am getting this failure when I have enabled all the ACL
support in each step.


Please let me know how I can enable this.

Regards,
Abhishek



--
Jiffin



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] How to enable ACL support in Glusterfs volume

2016-04-25 Thread ABHISHEK PALIWAL
 Hi,

I want to enable ACL support on a gluster volume using the kernel NFS ACL
support, so I have followed the steps below after creating the gluster volume:

1. mount -t glusterfs -o acl 10.32.0.48:/c_glusterfs /tmp/a2

2. update the /etc/exports file:
/tmp/a2 10.32.*(rw,acl,sync,no_subtree_check,no_root_squash,fsid=14)

3. exportfs -ra

4. gluster volume set c_glusterfs nfs.acl off

5. gluster volume set c_glusterfs nfs.disable on

We have disabled the above two options because we are using the kernel NFS ACL
support, which is already enabled.



On the other board we are mounting it using:

mount -t nfs -o acl,vers=3 10.32.0.48:/tmp/a2 /tmp/e/

setfacl -m u:application:rw /tmp/e/usr
setfacl: /tmp/e/usr: Operation not supported
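
(A minimal probe of what this error means, assuming a Linux client: setfacl stores
POSIX ACLs in the "system.posix_acl_access" extended attribute, so querying that
attribute shows directly whether a given mount supports ACLs. This is an
illustrative sketch, not part of the original report; the default path /tmp/e/usr
is taken from the commands above, and it can be pointed at the FUSE mount, the
knfs mount or the brick directory for comparison.)

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Check whether a path supports POSIX ACLs by querying the xattr that
 * setfacl/getfacl operate on. ENOTSUP corresponds to the
 * "Operation not supported" error above. */
int main (int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/tmp/e/usr";
    ssize_t ret = getxattr (path, "system.posix_acl_access", NULL, 0);

    if (ret >= 0)
        printf ("%s: ACLs supported (an ACL is already set)\n", path);
    else if (errno == ENODATA)
        printf ("%s: ACLs supported (no ACL set yet)\n", path);
    else if (errno == ENOTSUP)
        printf ("%s: ACLs not supported on this mount\n", path);
    else
        printf ("%s: getxattr failed: %s\n", path, strerror (errno));
    return 0;
}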

and 'application' is a system user, as shown below:

application:x:102:0::/home/application:/bin/sh

I don't know why I am getting this failure when I have enabled all the ACL
support in each step.

Please let me know how I can enable this.

Regards,
Abhishek
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] WORM patch review for 3.8

2016-04-25 Thread Joseph Fernandes
Hi Folks,

Please review the WORM/Retention patch by Karthik, so that we can have it in 3.8

http://review.gluster.org/#/c/13429/

Regards,
Joe
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] pNFS server for FreeBSD using GlusterFS

2016-04-25 Thread Rick Macklem


- Original Message -
> CCing ganesha list
> 
> On 22/04/16 04:18, Rick Macklem wrote:
> > Jiffin Tony Thottan wrote:
> >>
> >> On 21/04/16 04:43, Rick Macklem wrote:
> >>> Hi,
> >>>
> >>> Just to let you know, I did find the email responses to my
> >>> queries some months ago helpful and I now have a pNFS server
> >>> for FreeBSD using the GlusterFS port at the alpha test stage.
> >>> So far I have not made any changes to GlusterFS except the little
> >>> poll() patch that was already discussed on this list last December.
> >>>
> >>> Anyhow, if anyone is interested in taking a look at this,
> >>> I have a primitive document at:
> >>> http://people.freebsd.org/~rmacklem/pnfs-setup.txt
> >>> that will hopefully give you a starting point.
> >>>
> >>> Thanks to everyone that helped via email a few months ago, rick
> >> Hi Rick,
> >>
> >> Awesome work, man. You have cracked the Flexfile layout for a gluster
> >> volume.
> >>
> >> I am still wondering why you picked knfs instead of nfs-ganesha.
> > I don't believe that ganesha will be ported to FreeBSD any time soon. If it
> 
> I believe the support is already there. CCing ganesha list to confirm
> the same.
> 
Well, here is a snippet from the v2.1 release notes. It mentions that FreeBSD
support is being removed. Later versions (the current is v2.3) have no mention of
FreeBSD, so I assume they dropped it as planned. (Maybe I should have said
"won't be ported again any time soon" instead of "will be ported ... any time
soon".):

  The primary platform for NFS-Ganesha is Linux. Any kernel later than 2.6.39
  is required to fully support the VFS FSAL. This requirement does not apply
  to configurations using other FSALs. We have not recently tested with
  kernels older than 3.8 but that should not be a problem for users with
  currently supported Linux distributions.

  There are build time options and source code in the codebase that would
  indicate FreeBSD support. However, the server takes advantage of some
  advanced capabilities of the threads model in Linux kernels that are not
  available on FreeBSD. FreeBSD support will probably be dropped as of V2.2
  because there is no current active development of equivalents for FreeBSD.

rick

> > is ported, that would be an alternative for FreeBSD users to consider.
> > (I work on the kernel nfsd as a hobby, so I probably wouldn't do this
> > myself.)
> >
> >> There will be a lot of context switches
> >> between kernel space and user space, which may affect the metadata
> >> performance.
> > Yes, I do see a lot of context switches.
> >
> > rick
> >
> >> I still remember the discussion [1] in which I mentioned using the
> >> ganesha server as the MDS.
> >> And usually a gluster volume won't be exported using knfs.
> >>
> >> --
> >> Jiffin
> >>
> >>> ___
> >>> Gluster-devel mailing list
> >>> Gluster-devel@gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Emmanuel Dreyfus
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote:
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/

Nothing new here. I tracked this kind of trouble with NetBSD slaves
and only got frustration as a result.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Niels de Vos
On Mon, Apr 25, 2016 at 01:21:17PM +0200, Michael Scherer wrote:
> Le lundi 25 avril 2016 à 13:09 +0200, Niels de Vos a écrit :
> > On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote:
> > > Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit :
> > > > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit :
> > > > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> > > > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > > > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur 
> > > > > > > >  wrote:
> > > > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever 
> > > > > > > > >  wrote:
> > > > > > > > >> Hi all,
> > > > > > > > >>
> > > > > > > > >> Noticed our regression machines are reporting back really 
> > > > > > > > >> slow,
> > > > > > > > >> especially CentOs and Smoke
> > > > > > > > >>
> > > > > > > > >> I found that most of the slaves are marked offline, this 
> > > > > > > > >> could be the
> > > > > > > > >> biggest reasons ?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > Regression machines are scheduled to be offline if there are 
> > > > > > > > > no active
> > > > > > > > > jobs. I wonder if the slowness is related to LVM or related 
> > > > > > > > > factors as
> > > > > > > > > detailed in a recent thread?
> > > > > > > > >
> > > > > > > > 
> > > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail 
> > > > > > > > shortcut)
> > > > > > > > 
> > > > > > > > Hi Vijay,
> > > > > > > > 
> > > > > > > > Honestly I was not aware of this case where the machines move to
> > > > > > > > offline state by themselves; I was only aware that they just go to
> > > > > > > > idle state,
> > > > > > > > Thanks for sharing that information. But we still need to reclaim
> > > > > > > > most of the machines. Here are the reasons why each of them is offline.
> > > > > > > 
> > > > > > > Well, slaves go into offline, and should be woken up when needed.
> > > > > > > However it seems that Jenkins fails to connect to many slaves :-/
> > > > > > > 
> > > > > > > I've rebooted:
> > > > > > > 
> > > > > > >  - slave46
> > > > > > >  - slave28
> > > > > > >  - slave26
> > > > > > >  - slave25
> > > > > > >  - slave24
> > > > > > >  - slave23
> > > > > > >  - slave21
> > > > > > > 
> > > > > > > These all seem to have come up correctly after clicking the
> > > > > > > 'Launch slave agent' button on the slave's status page.
> > > > > > > 
> > > > > > > Remember that anyone with a Jenkins account can reboot VMs. This
> > > > > > > most
> > > > > > > often is sufficient to get them working again. Just go to
> > > > > > > https://build.gluster.org/job/reboot-vm/ , login and press some 
> > > > > > > buttons.
> > > > > > > 
> > > > > > > One slave is in a weird status, maybe one of the tests overwrote 
> > > > > > > the ssh
> > > > > > > key?
> > > > > > > 
> > > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > > > > > > slave29.cloud.gluster.org:22.
> > > > > > > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > > > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > > > > > > [04/24/16 06:48:04] [SSH] Authentication failed.
> > > > > > > hudson.AbortException: Authentication failed.
> > > > > > >   at 
> > > > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > > > > > >   at 
> > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > > > > > >   at 
> > > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > > > > > >   at 
> > > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > > > >   at 
> > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > >   at 
> > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > >   at java.lang.Thread.run(Thread.java:745)
> > > > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > > > > > > [04/24/16 06:48:05] [SSH] Connection closed.
> > > > > > > 
> > > > > > > Leaving slave29 as is, maybe one of our admins can have a look 
> > > > > > > and see
> > > > > > > if it needs reprovisioning.
> > > > > > 
> > > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no 
> > > > > > longer
> > > > > > in salt configuration, but I could connect as root. 
> > > > > > 
> > > > > > It should work better now, but please tell me if anything is 
> > > > > > incorrect
> > > > > > with it.
> > > > > 
> > > > > Hmm, not really. Launching the Jenkins slave agent in it through the
> > > > > webui still fails the same:
> > > > > 
> > > > >   https://build.gluster.org/computer/slave29.cloud.gluster.

Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Michael Scherer
On Monday, 25 April 2016 at 13:09 +0200, Niels de Vos wrote:
> On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote:
> > Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit :
> > > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit :
> > > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> > > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur 
> > > > > > >  wrote:
> > > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever 
> > > > > > > >  wrote:
> > > > > > > >> Hi all,
> > > > > > > >>
> > > > > > > >> Noticed our regression machines are reporting back really slow,
> > > > > > > >> especially CentOs and Smoke
> > > > > > > >>
> > > > > > > >> I found that most of the slaves are marked offline, this could 
> > > > > > > >> be the
> > > > > > > >> biggest reasons ?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > > > Regression machines are scheduled to be offline if there are no 
> > > > > > > > active
> > > > > > > > jobs. I wonder if the slowness is related to LVM or related 
> > > > > > > > factors as
> > > > > > > > detailed in a recent thread?
> > > > > > > >
> > > > > > > 
> > > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail 
> > > > > > > shortcut)
> > > > > > > 
> > > > > > > Hi Vijay,
> > > > > > > 
> > > > > > > Honestly I was not aware of this case where the machines move to
> > > > > > > offline state by themselves; I was only aware that they just go to
> > > > > > > idle state,
> > > > > > > Thanks for sharing that information. But we still need to reclaim
> > > > > > > most of the machines. Here are the reasons why each of them is offline.
> > > > > > 
> > > > > > Well, slaves go into offline, and should be woken up when needed.
> > > > > > However it seems that Jenkins fails to connect to many slaves :-/
> > > > > > 
> > > > > > I've rebooted:
> > > > > > 
> > > > > >  - slave46
> > > > > >  - slave28
> > > > > >  - slave26
> > > > > >  - slave25
> > > > > >  - slave24
> > > > > >  - slave23
> > > > > >  - slave21
> > > > > > 
> > > > > > These all seem to have come up correctly after clicking the 'Launch
> > > > > > slave agent' button on the slave's status page.
> > > > > > 
> > > > > > Remember that anyone with a Jenkins account can reboot VMs. This
> > > > > > most
> > > > > > often is sufficient to get them working again. Just go to
> > > > > > https://build.gluster.org/job/reboot-vm/ , login and press some 
> > > > > > buttons.
> > > > > > 
> > > > > > One slave is in a weird status, maybe one of the tests overwrote 
> > > > > > the ssh
> > > > > > key?
> > > > > > 
> > > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > > > > > slave29.cloud.gluster.org:22.
> > > > > > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > > > > > [04/24/16 06:48:04] [SSH] Authentication failed.
> > > > > > hudson.AbortException: Authentication failed.
> > > > > > at 
> > > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > > > > > at 
> > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > > > > > at 
> > > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > > > > > at 
> > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > > > at 
> > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > at 
> > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > > > > > [04/24/16 06:48:05] [SSH] Connection closed.
> > > > > > 
> > > > > > Leaving slave29 as is, maybe one of our admins can have a look and 
> > > > > > see
> > > > > > if it needs reprovisioning.
> > > > > 
> > > > > Seems slave29 was reinstalled and/or slightly damaged, it was no 
> > > > > longer
> > > > > in salt configuration, but I could connect as root. 
> > > > > 
> > > > > It should work better now, but please tell me if anything is incorrect
> > > > > with it.
> > > > 
> > > > Hmm, not really. Launching the Jenkins slave agent in it through the
> > > > webui still fails the same:
> > > > 
> > > >   https://build.gluster.org/computer/slave29.cloud.gluster.org/log
> > > > 
> > > > Maybe the "jenkins" user on the slave has the wrong password?
> > > 
> > > So, it seems first that he had the wrong host key, so I changed that. 
> > > 
> > > I am looking at what is wrong, so do not put it offline :)
> > 
> > So the script to update the /etc/hosts file 

[Gluster-devel] BUG: libgfapi mem leaks

2016-04-25 Thread Piotr Rybicki

Hi guys.

I have been testing gluster 3.7.X for some time.

Still, there is a memory leak using libgfapi (latest 3.7.11).

The simplest C code (without establishing a connection):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <glusterfs/api/glfs.h>

int main (int argc, char** argv) {
glfs_t *fs = NULL;

fs = glfs_new ("pool");
if (!fs) {
  fprintf (stderr, "glfs_new: returned NULL\n");
  return 1;
}

glfs_fini (fs);
return 0;
}

Produces memory leak (valgrind output below).

I believe it is a serious matter, which prevents using gluster in
production. Things like libvirt or qemu will grow in process size until a
limit is reached and then stop working. Libvirt, for example, checks before
launching a domain whether the VM's image is present and accessible via
libgfapi, and so libvirt grows in size because of this leak.


If gluster is to be considered enterprise grade, this has to be resolved.

Adding another glfs_new()/glfs_fini() section to the code above increases
the memory leak.
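
(For illustration only, a variant of the test above that repeats the
create/destroy cycle, which makes the per-iteration growth easy to observe
under valgrind or by watching RSS; it assumes the same libgfapi header and
-lgfapi linkage, and the iteration count is arbitrary.)

#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>

int main (int argc, char** argv) {
    /* number of create/destroy cycles; 100 is an arbitrary default */
    int i, n = (argc > 1) ? atoi (argv[1]) : 100;

    for (i = 0; i < n; i++) {
        glfs_t *fs = glfs_new ("pool");  /* allocate a fresh gfapi context */
        if (!fs) {
            fprintf (stderr, "glfs_new: returned NULL\n");
            return 1;
        }
        glfs_fini (fs);                  /* tear it down; any leak accumulates per cycle */
    }
    return 0;
}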


Best regards
Piotr Rybicki

#  valgrind --leak-check=full --show-reachable=yes --show-leak-kinds=all 
./a.out

==1689== Memcheck, a memory error detector
==1689== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1689== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1689== Command: ./a.out
==1689==
==1689==
==1689== HEAP SUMMARY:
==1689== in use at exit: 9,076,453 bytes in 57 blocks
==1689==   total heap usage: 141 allocs, 84 frees, 18,089,145 bytes 
allocated

==1689==
==1689== 8 bytes in 1 blocks are still reachable in loss record 1 of 55
==1689==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==1689==by 0x5A8D916: __gf_default_calloc (mem-pool.h:118)
==1689==by 0x5A8D916: __glusterfs_this_location (globals.c:146)
==1689==by 0x4E3D76C: glfs_new@@GFAPI_3.4.0 (glfs.c:699)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 82 bytes in 1 blocks are definitely lost in loss record 2 of 55
==1689==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==1689==by 0x5A893F9: __gf_calloc (mem-pool.c:117)
==1689==by 0x5A57254: gf_strdup (mem-pool.h:185)
==1689==by 0x5A57254: gf_log_init (logging.c:738)
==1689==by 0x4E3DB25: glfs_set_logging@@GFAPI_3.4.0 (glfs.c:837)
==1689==by 0x4E3D7BC: glfs_new@@GFAPI_3.4.0 (glfs.c:712)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 89 bytes in 1 blocks are possibly lost in loss record 3 of 55
==1689==at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==by 0x4E3CA47: glusterfs_ctx_defaults_init (glfs.c:133)
==1689==by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 89 bytes in 1 blocks are possibly lost in loss record 4 of 55
==1689==at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==by 0x4E3CA93: glusterfs_ctx_defaults_init (glfs.c:142)
==1689==by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 92 bytes in 1 blocks are possibly lost in loss record 5 of 55
==1689==at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==by 0x4E3CAB9: glusterfs_ctx_defaults_init (glfs.c:146)
==1689==by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 94 bytes in 1 blocks are possibly lost in loss record 6 of 55
==1689==at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689==by 0x5A8952D: __gf_malloc (mem-pool.c:142)
==1689==by 0x5A89851: gf_vasprintf (mem-pool.c:221)
==1689==by 0x5A89943: gf_asprintf (mem-pool.c:239)
==1689==by 0x5A89B4F: mem_pool_new_fn (mem-pool.c:364)
==1689==by 0x4E3CA21: glusterfs_ctx_defaults_init (glfs.c:128)
==1689==by 0x4E3D886: glfs_init_global_ctx (glfs.c:655)
==1689==by 0x4E3D886: glfs_new@@GFAPI_3.4.0 (glfs.c:700)
==1689==by 0x4007E6: main (in /root/gf-test2/a.out)
==1689==
==1689== 94 bytes in 1 blocks are possibly lost in loss record 7 of 55
==1689==at 0x4C29FE0: malloc (vg_replace_malloc.c:299)
==1689

Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Niels de Vos
On Mon, Apr 25, 2016 at 11:58:56AM +0200, Michael Scherer wrote:
> Le lundi 25 avril 2016 à 11:26 +0200, Michael Scherer a écrit :
> > Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit :
> > > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> > > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur  
> > > > > > wrote:
> > > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever 
> > > > > > >  wrote:
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> Noticed our regression machines are reporting back really slow,
> > > > > > >> especially CentOs and Smoke
> > > > > > >>
> > > > > > >> I found that most of the slaves are marked offline, this could 
> > > > > > >> be the
> > > > > > >> biggest reasons ?
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > > > Regression machines are scheduled to be offline if there are no 
> > > > > > > active
> > > > > > > jobs. I wonder if the slowness is related to LVM or related 
> > > > > > > factors as
> > > > > > > detailed in a recent thread?
> > > > > > >
> > > > > > 
> > > > > > Sorry, the previous mail was sent incomplete (blame some Gmail 
> > > > > > shortcut)
> > > > > > 
> > > > > > Hi Vijay,
> > > > > > 
> > > > > > Honestly I was not aware of this case where the machines move to
> > > > > > offline state by themselves; I was only aware that they just go to
> > > > > > idle state,
> > > > > > Thanks for sharing that information. But we still need to reclaim
> > > > > > most of the machines. Here are the reasons why each of them is offline.
> > > > > 
> > > > > Well, slaves go into offline, and should be woken up when needed.
> > > > > However it seems that Jenkins fails to connect to many slaves :-/
> > > > > 
> > > > > I've rebooted:
> > > > > 
> > > > >  - slave46
> > > > >  - slave28
> > > > >  - slave26
> > > > >  - slave25
> > > > >  - slave24
> > > > >  - slave23
> > > > >  - slave21
> > > > > 
> > > > > These all seem to have come up correctly after clicking the 'Launch
> > > > > slave agent' button on the slave's status page.
> > > > > 
> > > > > Remember that anyone with a Jenkins account can reboot VMs. This most
> > > > > often is sufficient to get them working again. Just go to
> > > > > https://build.gluster.org/job/reboot-vm/ , login and press some 
> > > > > buttons.
> > > > > 
> > > > > One slave is in a weird status, maybe one of the tests overwrote the 
> > > > > ssh
> > > > > key?
> > > > > 
> > > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > > > > slave29.cloud.gluster.org:22.
> > > > > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > > > > [04/24/16 06:48:04] [SSH] Authentication failed.
> > > > > hudson.AbortException: Authentication failed.
> > > > >   at 
> > > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > > > >   at 
> > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > > > >   at 
> > > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > > > >   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > >   at 
> > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > >   at 
> > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > >   at java.lang.Thread.run(Thread.java:745)
> > > > > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > > > > [04/24/16 06:48:05] [SSH] Connection closed.
> > > > > 
> > > > > Leaving slave29 as is, maybe one of our admins can have a look and see
> > > > > if it needs reprovisioning.
> > > > 
> > > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer
> > > > in salt configuration, but I could connect as root. 
> > > > 
> > > > It should work better now, but please tell me if anything is incorrect
> > > > with it.
> > > 
> > > Hmm, not really. Launching the Jenkins slave agent in it through the
> > > webui still fails the same:
> > > 
> > >   https://build.gluster.org/computer/slave29.cloud.gluster.org/log
> > > 
> > > Maybe the "jenkins" user on the slave has the wrong password?
> > 
> > So, it seems first that he had the wrong host key, so I changed that. 
> > 
> > I am looking at what is wrong, so do not put it offline :)
> 
> So the script to update the /etc/hosts file was not run, so it was using
> the wrong ip.
> 
> Can we agree on getting rid of it now, since there is no need for it
> anymore?

I guess so, DNS should be stable now, right?

> (then i will also remove the /etc/rax-reboot file from the various
> slaves, and maybe replace with a ansible based system)

rax-reboot is only needed on build.gluster.org, none of the other

Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Michael Scherer
On Monday, 25 April 2016 at 11:26 +0200, Michael Scherer wrote:
> Le lundi 25 avril 2016 à 11:12 +0200, Niels de Vos a écrit :
> > On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> > > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur  
> > > > > wrote:
> > > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever 
> > > > > >  wrote:
> > > > > >> Hi all,
> > > > > >>
> > > > > >> Noticed our regression machines are reporting back really slow,
> > > > > >> especially CentOs and Smoke
> > > > > >>
> > > > > >> I found that most of the slaves are marked offline, this could be 
> > > > > >> the
> > > > > >> biggest reasons ?
> > > > > >>
> > > > > >>
> > > > > >
> > > > > > Regression machines are scheduled to be offline if there are no 
> > > > > > active
> > > > > > jobs. I wonder if the slowness is related to LVM or related factors 
> > > > > > as
> > > > > > detailed in a recent thread?
> > > > > >
> > > > > 
> > > > > Sorry, the previous mail was sent incomplete (blame some Gmail 
> > > > > shortcut)
> > > > > 
> > > > > Hi Vijay,
> > > > > 
> > > > > Honestly I was not aware of this case where the machines move to
> > > > > offline state by themselves; I was only aware that they just go to idle
> > > > > state,
> > > > > Thanks for sharing that information. But we still need to reclaim most
> > > > > of the machines. Here are the reasons why each of them is offline.
> > > > 
> > > > Well, slaves go into offline, and should be woken up when needed.
> > > > However it seems that Jenkins fails to connect to many slaves :-/
> > > > 
> > > > I've rebooted:
> > > > 
> > > >  - slave46
> > > >  - slave28
> > > >  - slave26
> > > >  - slave25
> > > >  - slave24
> > > >  - slave23
> > > >  - slave21
> > > > 
> > > > These all seem to have come up correctly after clicking the 'Launch slave
> > > > agent' button on the slave's status page.
> > > > 
> > > > Remember that anyone with a Jenkins account can reboot VMs. This most
> > > > often is sufficient to get them working again. Just go to
> > > > https://build.gluster.org/job/reboot-vm/ , login and press some buttons.
> > > > 
> > > > One slave is in a weird status, maybe one of the tests overwrote the ssh
> > > > key?
> > > > 
> > > > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > > > slave29.cloud.gluster.org:22.
> > > > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > > > [04/24/16 06:48:04] [SSH] Authentication failed.
> > > > hudson.AbortException: Authentication failed.
> > > > at 
> > > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > > > at 
> > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > > > at 
> > > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > at 
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > at 
> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > at java.lang.Thread.run(Thread.java:745)
> > > > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > > > [04/24/16 06:48:05] [SSH] Connection closed.
> > > > 
> > > > Leaving slave29 as is, maybe one of our admins can have a look and see
> > > > if it needs reprovisioning.
> > > 
> > > Seems slave29 was reinstalled and/or slightly damaged, it was no longer
> > > in salt configuration, but I could connect as root. 
> > > 
> > > It should work better now, but please tell me if anything is incorrect
> > > with it.
> > 
> > Hmm, not really. Launching the Jenkins slave agent in it through the
> > webui still fails the same:
> > 
> >   https://build.gluster.org/computer/slave29.cloud.gluster.org/log
> > 
> > Maybe the "jenkins" user on the slave has the wrong password?
> 
> So, it seems first that he had the wrong host key, so I changed that. 
> 
> I am looking at what is wrong, so do not put it offline :)

So the script to update the /etc/hosts file was not run, so it was using
the wrong ip.

Can we agree on getting rid of it now, since there is no need for it
anymore?

(then I will also remove the /etc/rax-reboot file from the various
slaves, and maybe replace it with an ansible-based system)
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Should it be possible to disable own-thread for encrypted RPC?

2016-04-25 Thread Kaushal M
On Thu, Apr 21, 2016 at 5:37 PM, Jeff Darcy  wrote:
>> I've recently become aware of another problem with own-threads. The
>> threads launched are not reaped, pthread_joined, after a TLS
>> connection disconnects.  This is especially problematic with GlusterD
>> as it launches a lot of threads to handle generally short lived
>> connections (volfile fetch, portmapper).  This causes GlusterDs mem
>> usage to continually grow, and finally lead to other failures due to
>> memory shortage.  I've recently seen a setup with GlusterD memory
>> usage in 10s of GBs of reserved mem and TBs of virt mem. This is
>> easily reproducible as well.  I'm still working out a solution for
>> this.
>>
>> While allowing TLS connections with own-threads only will lead to a
>> more stable experience, this is a really bad in terms of our memory
>> consumption.  This will badly affect our chances of having 1000s of
>> clients. Making TLS work with epoll would fix this, but I'm not very
>> sure of the effort involved.  Could we fix this for 3.8? For 4.0, if
>> we want to default to TLS, we definitely need to fix this.
>
> Maybe it's just my not-so-humble opinion, but reaping threads seems
> like a pretty easy thing to implement.  By contrast, the prospects

It is easier to get the threads reaped, and this is what I intend to
do for the next 3.7.x release and 3.8.

The simplest solution I can think of right now is to have a reaper
timer run periodically, which would reap any TLS own-threads that have
stopped.
The process of reaping would be as follows:

- The reaper timer is started when the 1st TLS own-thread is created.
It wakes up every X seconds and reaps dead threads.
- TLS own-threads need to notify the reaper timer of their demise.
This is achieved by pushing their thread-ids to a global queue when
they exit.
- When the reaper timer is triggered, it reads in the thread-ids from
the queue and calls pthread_join on them.

This should work well. But I'm not sure if this is the simplest way to
do the reaping. What do you think of this?
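
(A minimal sketch of the scheme described above, compiled with -pthread; this is
illustration only, not the glusterfs implementation, the tls_* names are made up,
and the timer is simplified to a thread that wakes every few seconds.)

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Global queue of thread-ids of exited TLS own-threads, protected by a lock.
 * Simplified to a fixed-size array instead of a real list. */
#define MAX_DEAD 1024
static pthread_t dead_threads[MAX_DEAD];
static int dead_count = 0;
static pthread_mutex_t dead_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by an own-thread just before it exits: publish its id for reaping. */
static void tls_reaper_notify (void)
{
    pthread_mutex_lock (&dead_lock);
    if (dead_count < MAX_DEAD)
        dead_threads[dead_count++] = pthread_self ();
    pthread_mutex_unlock (&dead_lock);
}

/* Body of a TLS own-thread: handle the connection, then notify the reaper. */
static void *tls_own_thread (void *arg)
{
    /* ... per-connection work would go here ... */
    tls_reaper_notify ();
    return NULL;
}

/* Reaper: wakes up every X seconds and pthread_join()s the dead threads,
 * releasing their stacks (the memory that currently accumulates). */
static void *tls_reaper (void *arg)
{
    for (;;) {
        sleep (5);                       /* X = 5 seconds, arbitrary */
        pthread_mutex_lock (&dead_lock);
        while (dead_count > 0) {
            pthread_t tid = dead_threads[--dead_count];
            pthread_mutex_unlock (&dead_lock);
            pthread_join (tid, NULL);    /* reclaim the exited thread */
            pthread_mutex_lock (&dead_lock);
        }
        pthread_mutex_unlock (&dead_lock);
    }
    return NULL;
}

int main (void)
{
    pthread_t reaper, worker;
    int i;

    pthread_create (&reaper, NULL, tls_reaper, NULL);  /* started with the 1st own-thread */
    for (i = 0; i < 10; i++)
        pthread_create (&worker, NULL, tls_own_thread, NULL);

    sleep (12);                          /* let the reaper run a couple of times */
    pthread_mutex_lock (&dead_lock);
    printf ("thread-ids still queued for reaping: %d\n", dead_count);
    pthread_mutex_unlock (&dead_lock);
    return 0;
}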

> of making TLS (specifically OpenSSL) work reliably with epoll seem
> murky at best.  Nothing has been easy with epoll so far, and I don't
> see why we'd expect making it work reliably with OpenSSL's horrible
> API would be the first exception.  Fixing one small issue with
> own-thread still seems like the quickest route to a stable TLS
> implementation.

While TLS will get more robust by fixing the problems with own-thread,
I'm still concerned with the memory usage for Gluster-4.0.
Particularly because we're aiming to use TLS by default and have brick
multiplexing.
This could lead to situations with a single process launching 1000s of
threads to handle TLS connections,
which will lead to a large memory footprint for Gluster. This is my
reasoning for trying to get TLS to work with epoll.
I may be overthinking this, and this might not be of any significance at all.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Michael Scherer
On Monday, 25 April 2016 at 11:12 +0200, Niels de Vos wrote:
> On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> > Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur  
> > > > wrote:
> > > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever 
> > > > >  wrote:
> > > > >> Hi all,
> > > > >>
> > > > >> Noticed our regression machines are reporting back really slow,
> > > > >> especially CentOs and Smoke
> > > > >>
> > > > >> I found that most of the slaves are marked offline, this could be the
> > > > >> biggest reasons ?
> > > > >>
> > > > >>
> > > > >
> > > > > Regression machines are scheduled to be offline if there are no active
> > > > > jobs. I wonder if the slowness is related to LVM or related factors as
> > > > > detailed in a recent thread?
> > > > >
> > > > 
> > > > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut)
> > > > 
> > > > Hi Vijay,
> > > > 
> > > > Honestly I was not aware of this case where the machines move to
> > > > offline state by themselves; I was only aware that they just go to idle
> > > > state,
> > > > Thanks for sharing that information. But we still need to reclaim most
> > > > of the machines. Here are the reasons why each of them is offline.
> > > 
> > > Well, slaves go into offline, and should be woken up when needed.
> > > However it seems that Jenkins fails to connect to many slaves :-/
> > > 
> > > I've rebooted:
> > > 
> > >  - slave46
> > >  - slave28
> > >  - slave26
> > >  - slave25
> > >  - slave24
> > >  - slave23
> > >  - slave21
> > > 
> > > These all seem to have come up correctly after clicking the 'Launch slave
> > > agent' button on the slave's status page.
> > > 
> > > Remember that anyone with a Jenkins account can reboot VMs. This most
> > > often is sufficient to get them working again. Just go to
> > > https://build.gluster.org/job/reboot-vm/ , login and press some buttons.
> > > 
> > > One slave is in a weird status, maybe one of the tests overwrote the ssh
> > > key?
> > > 
> > > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > > slave29.cloud.gluster.org:22.
> > > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > > [04/24/16 06:48:04] [SSH] Authentication failed.
> > > hudson.AbortException: Authentication failed.
> > >   at 
> > > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > >   at 
> > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > >   at 
> > > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > >   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > >   at 
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >   at 
> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >   at java.lang.Thread.run(Thread.java:745)
> > > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > > [04/24/16 06:48:05] [SSH] Connection closed.
> > > 
> > > Leaving slave29 as is, maybe one of our admins can have a look and see
> > > if it needs reprovisioning.
> > 
> > Seems slave29 was reinstalled and/or slightly damaged, it was no longer
> > in salt configuration, but I could connect as root. 
> > 
> > It should work better now, but please tell me if anything is incorrect
> > with it.
> 
> Hmm, not really. Launching the Jenkins slave agent in it through the
> webui still fails the same:
> 
>   https://build.gluster.org/computer/slave29.cloud.gluster.org/log
> 
> Maybe the "jenkins" user on the slave has the wrong password?

So, it seems first that it had the wrong host key, so I changed that.

I am looking at what is wrong, so do not put it offline :)

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Niels de Vos
On Mon, Apr 25, 2016 at 10:43:13AM +0200, Michael Scherer wrote:
> Le dimanche 24 avril 2016 à 15:59 +0200, Niels de Vos a écrit :
> > On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur  wrote:
> > > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever  
> > > > wrote:
> > > >> Hi all,
> > > >>
> > > >> Noticed our regression machines are reporting back really slow,
> > > >> especially CentOs and Smoke
> > > >>
> > > >> I found that most of the slaves are marked offline, this could be the
> > > >> biggest reasons ?
> > > >>
> > > >>
> > > >
> > > > Regression machines are scheduled to be offline if there are no active
> > > > jobs. I wonder if the slowness is related to LVM or related factors as
> > > > detailed in a recent thread?
> > > >
> > > 
> > > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut)
> > > 
> > > Hi Vijay,
> > > 
> > > Honestly I was not aware of this case where the machines move to
> > > offline state by themselves; I was only aware that they just go to idle
> > > state,
> > > Thanks for sharing that information. But we still need to reclaim most
> > > of the machines. Here are the reasons why each of them is offline.
> > 
> > Well, slaves go into offline, and should be woken up when needed.
> > However it seems that Jenkins fails to connect to many slaves :-/
> > 
> > I've rebooted:
> > 
> >  - slave46
> >  - slave28
> >  - slave26
> >  - slave25
> >  - slave24
> >  - slave23
> >  - slave21
> > 
> > These all seem to have come up correctly after clicking the 'Launch slave
> > agent' button on the slave's status page.
> > 
> > Remember that anyone with a Jenkins account can reboot VMs. This most
> > often is sufficient to get them working again. Just go to
> > https://build.gluster.org/job/reboot-vm/ , login and press some buttons.
> > 
> > One slave is in a weird status, maybe one of the tests overwrote the ssh
> > key?
> > 
> > [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> > slave29.cloud.gluster.org:22.
> > ERROR: Failed to authenticate as jenkins. Wrong password. 
> > (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> > [04/24/16 06:48:04] [SSH] Authentication failed.
> > hudson.AbortException: Authentication failed.
> > at 
> > hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
> > at 
> > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
> > at 
> > hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > [04/24/16 06:48:04] Launch failed - cleaning up connection
> > [04/24/16 06:48:05] [SSH] Connection closed.
> > 
> > Leaving slave29 as is, maybe one of our admins can have a look and see
> > if it needs reprovisioning.
> 
> Seems slave29 was reinstalled and/or slightly damaged, it was no longer
> in salt configuration, but I could connect as root. 
> 
> It should work better now, but please tell me if anything is incorrect
> with it.

Hmm, not really. Launching the Jenkins slave agent in it through the
webui still fails the same:

  https://build.gluster.org/computer/slave29.cloud.gluster.org/log

Maybe the "jenkins" user on the slave has the wrong password?

Thanks,
Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Michael Scherer
On Sunday, 24 April 2016 at 15:59 +0200, Niels de Vos wrote:
> On Sun, Apr 24, 2016 at 04:22:55PM +0530, Prasanna Kalever wrote:
> > On Sun, Apr 24, 2016 at 7:11 AM, Vijay Bellur  wrote:
> > > On Sat, Apr 23, 2016 at 9:30 AM, Prasanna Kalever  
> > > wrote:
> > >> Hi all,
> > >>
> > >> Noticed our regression machines are reporting back really slow,
> > >> especially CentOs and Smoke
> > >>
> > >> I found that most of the slaves are marked offline, this could be the
> > >> biggest reasons ?
> > >>
> > >>
> > >
> > > Regression machines are scheduled to be offline if there are no active
> > > jobs. I wonder if the slowness is related to LVM or related factors as
> > > detailed in a recent thread?
> > >
> > 
> > Sorry, the previous mail was sent incomplete (blame some Gmail shortcut)
> > 
> > Hi Vijay,
> > 
> > Honestly I was not aware of this case where the machines move to
> > offline state by themselves; I was only aware that they just go to idle
> > state,
> > Thanks for sharing that information. But we still need to reclaim most
> > of the machines. Here are the reasons why each of them is offline.
> 
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/
> 
> I've rebooted:
> 
>  - slave46
>  - slave28
>  - slave26
>  - slave25
>  - slave24
>  - slave23
>  - slave21
> 
> These all seem to have come up correctly after clicking the 'Launch slave
> agent' button on the slave's status page.
> 
> Remember that anyone with a Jenkins account can reboot VMs. This most
> often is sufficient to get them working again. Just go to
> https://build.gluster.org/job/reboot-vm/ , login and press some buttons.
> 
> One slave is in a weird status, maybe one of the tests overwrote the ssh
> key?
> 
> [04/24/16 06:48:02] [SSH] Opening SSH connection to 
> slave29.cloud.gluster.org:22.
> ERROR: Failed to authenticate as jenkins. Wrong password. 
> (credentialId:c31bff89-36c0-4f41-aed8-7c87ba53621e/method:password)
> [04/24/16 06:48:04] [SSH] Authentication failed.
> hudson.AbortException: Authentication failed.
>   at 
> hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1217)
>   at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
>   at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> [04/24/16 06:48:04] Launch failed - cleaning up connection
> [04/24/16 06:48:05] [SSH] Connection closed.
> 
> Leaving slave29 as is, maybe one of our admins can have a look and see
> if it needs reprovisioning.

Seems slave29 was reinstalled and/or slightly damaged; it was no longer
in the salt configuration, but I could connect as root.

It should work better now, but please tell me if anything is incorrect
with it.
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] pNFS server for FreeBSD using GlusterFS

2016-04-25 Thread Jiffin Tony Thottan

CCing ganesha list

On 22/04/16 04:18, Rick Macklem wrote:

Jiffin Tony Thottan wrote:


On 21/04/16 04:43, Rick Macklem wrote:

Hi,

Just to let you know, I did find the email responses to my
queries some months ago helpful and I now have a pNFS server
for FreeBSD using the GlusterFS port at the alpha test stage.
So far I have not made any changes to GlusterFS except the little
poll() patch that was already discussed on this list last December.

Anyhow, if anyone is interested in taking a look at this,
I have a primitive document at:
http://people.freebsd.org/~rmacklem/pnfs-setup.txt
that will hopefully give you a starting point.

Thanks to everyone that helped via email a few months ago, rick

Hi Rick,

Awesome work, man. You have cracked the Flexfile layout for a gluster volume.

I am still wondering why you picked knfs instead of nfs-ganesha.

I don't believe that ganesha will be ported to FreeBSD any time soon. If it


I believe the support is already there. CCing ganesha list to confirm 
the same.



is ported, that would be an alternative for FreeBSD users to consider.
(I work on the kernel nfsd as a hobby, so I probably wouldn't do this myself.)


There will be a lot of context switches
between kernel space and user space, which may affect the metadata
performance.

Yes, I do see a lot of context switches.

rick


I still remember the discussion [1] in which I mentioned using the
ganesha server as the MDS.
And usually a gluster volume won't be exported using knfs.

--
Jiffin


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel