Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've done some more investigation, and the problem seems worse.

It seems that NFS sends a huge number of requests without waiting for 
answers (I've had more than 1400 requests ongoing). There are probably 
many factors that influence the load this causes, and one of them could 
be ec, but it's not related exclusively to ec. I've repeated the test 
using replica 3 and replica 2 volumes and the problem still happens.


The test basically writes a file to an NFS mount using 'dd'. The file 
has a size of 1GB. With a smaller file, the test passes successfully.
Using an NFS client and the gluster NFS server on the same machine with 
big-file dd operations is known to cause hangs. anon-fd-quota.t used to 
give similar problems, so we changed the test to not involve NFS mounts.


Pranith


One important thing to note is that I'm not using powerful servers (a 
dual core Intel Atom), but this problem shouldn't happen anyway. It 
can even happen on more powerful servers if they are busy doing other 
things (maybe this is what's happening on jenkins' slaves).


I think this causes some NFS requests to time out. This can be 
seen in /var/log/messages (there are many of these messages):


Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not 
responding, timed out


nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 
0-nfs-nfsv3: Reply submission failed


Additionally, this causes disconnections from NFS that are not 
correctly handled, causing a thread to get stuck in an infinite loop 
(I haven't analyzed this problem deeply, but it seems like an attempt 
to use an already disconnected socket). After a while, I get this 
error in the nfs log:


[2015-02-12 14:20:19.545429] C 
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: 
server 192.168.200.61:49152 has not responded in the last 42 seconds, 
disconnecting.


The console executing the test shows this (nfs.t is creating a replica 
3 instead of a dispersed volume):


# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing 
‘/mnt/nfs/0/test’: Input/output error

[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr  0.02 sys +  1.29 
cusr  3.45 csys =  4.89 CPU)

Result: FAIL
Failed tests  ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without 
having received an answer?
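(For reference, the knobs that are normally supposed to bound the number of 
in-flight requests, as far as I know, are the rpc throttle on the gluster 
side and the RPC slot table on the Linux NFS client. A sketch, using the 
test volume name from the logs; treat the option names and values as 
assumptions to verify:)

#... Server side: cap how many RPC requests gluster keeps outstanding per connection
#... (the nfs.* variant should apply to the gluster NFS server)
gluster volume set patchy server.outstanding-rpc-limit 16
gluster volume set patchy nfs.outstanding-rpc-limit 16

#... Client side: limit the kernel NFS client's concurrent RPC slots on TCP
#... (takes effect for mounts made after the change)
sysctl -w sunrpc.tcp_slot_table_entries=16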


Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.


This has an interesting history of failures: on the regression runs for
the MT epoll patch, this test (i.e. ec/nfs.t) did not fail (there were
other failures, but not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above as it may help narrow down the changes in EC
(maybe) that could have caused it.

Also, in the latter commit there was an error configuring the number of
threads, so all regression runs would have run with a single epoll thread
(the MT epoll patch had this hard-coded, so that would have run with 2
threads, but did not show the issue; patch:
http://review.gluster.org/#/c/3842/).

Again I state the above, as this should not be exposing a
race/bug/problem due to the multi threaded nature of epoll, but of
course needs investigation.



I'm not sure if this patch is the cause or it has revealed some bug in
ec or any other xlator.


I guess we can reproduce this issue? If so I would try setting
client.event-threads on master branch to 1, restarting the volume and
then running the test (as a part of the test itself maybe) to eliminate
the possibility that MT epoll is causing it.
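As a manual approximation outside the test framework, I mean something like
this (a sketch; 'patchy' is the test volume name, and the event-threads
option names come from the configuration patch, so verify them):

#... Pin both sides to a single epoll thread, restart, then re-run the test
gluster volume set patchy client.event-threads 1
gluster volume set patchy server.event-threads 1
gluster --mode=script volume stop patchy
gluster volume start patchy
./run-tests.sh tests/basic/ec/nfs.t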

My belief that MT epoll is causing it is in doubt, as the runs failed on
http://review.gluster.org/#/c/9488/ (the configuration patch), which had
the thread count as 1 due to a bug in that code.



I can try to identify it (any help will be appreciated), but it may take
some time. Would it be better to remove the test in the meantime?


I am checking if this is reproducible on my machine, so that I can

Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've done some more investigation, and the problem seems worse.

It seems that NFS sends a huge number of requests without waiting for 
answers (I've had more than 1400 requests ongoing). There are probably 
many factors that influence the load this causes, and one of them could 
be ec, but it's not related exclusively to ec. I've repeated the test 
using replica 3 and replica 2 volumes and the problem still happens.


The test basically writes a file to an NFS mount using 'dd'. The file 
has a size of 1GB. With a smaller file, the test passes successfully.
Using an NFS client and the gluster NFS server on the same machine with 
big-file dd operations is known to cause hangs. anon-fd-quota.t used to 
give similar problems, so we changed the test to not involve NFS mounts.
I don't recollect the exact scenario. Avati found the memory-allocation 
deadlock back in 2010, when I had just joined gluster. Raghavendra Bhat 
raised this bug then. I've CCed him on the thread as well, in case he 
knows the exact scenario.


Pranith


Pranith


One important thing to note is that I'm not using powerful servers (a 
dual core Intel Atom), but this problem shouldn't happen anyway. It 
can even happen on more powerful servers if they are busy doing other 
things (maybe this is what's happening on jenkins' slaves).


I think this causes some NFS requests to time out. This can be 
seen in /var/log/messages (there are many of these messages):


Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not 
responding, timed out


nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 
0-nfs-nfsv3: Reply submission failed


Additionally, this causes disconnections from NFS that are not 
correctly handled, causing a thread to get stuck in an infinite 
loop (I haven't analyzed this problem deeply, but it seems like an 
attempt to use an already disconnected socket). After a while, I get 
this error in the nfs log:


[2015-02-12 14:20:19.545429] C 
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: 
server 192.168.200.61:49152 has not responded in the last 42 seconds, 
disconnecting.


The console executing the test shows this (nfs.t is creating a 
replica 3 instead of a dispersed volume):


# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing 
‘/mnt/nfs/0/test’: Input/output error

[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
Failed test: 9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 
3.45 csys = 4.89 CPU)

Result: FAIL
Failed tests ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without 
having received an answer?


Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.


This has an interesting history of failures: on the regression runs for
the MT epoll patch, this test (i.e. ec/nfs.t) did not fail (there were
other failures, but not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above, as it may help narrowing down the changes in EC
(maybe) that could have caused it.

Also, in the latter commit there was an error configuring the number of
threads, so all regression runs would have run with a single epoll thread
(the MT epoll patch had this hard-coded, so that would have run with 2
threads, but did not show the issue; patch:
http://review.gluster.org/#/c/3842/).

Again I state the above, as this should not be exposing a
race/bug/problem due to the multi threaded nature of epoll, but of
course needs investigation.



I'm not sure if this patch is the cause or it has revealed some bug in
ec or any other xlator.


I guess we can reproduce this issue? If so I would try setting
client.event-threads on master branch to 1, restarting the volume and
then running the test (as a part of the test itself maybe) to eliminate
the possibility that MT epoll is causing it.

My belief on MT epoll causing it is in doubt as the runs failed on the
http://review.gluster.org/#/c/9488/ 

Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Shyam



On 02/12/2015 01:27 PM, Xavier Hernandez wrote:

On 12.02.2015 19:09, Pranith Kumar Karampuri wrote:


On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:

On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've done some more investigation, and the problem seems worse. It
seems that NFS sends a huge number of requests without waiting for
answers (I've had more than 1400 requests ongoing). There are probably
many factors that influence the load this causes, and one of them
could be ec, but it's not related exclusively to ec. I've repeated
the test using replica 3 and replica 2 volumes and the problem still
happens. The test basically writes a file to an NFS mount using 'dd'.
The file has a size of 1GB. With a smaller file, the test passes
successfully.

Using an NFS client and the gluster NFS server on the same machine with
big-file dd operations is known to cause hangs. anon-fd-quota.t used to
give similar problems, so we changed the test to not involve NFS mounts.

I don't recollect the exact scenario. Avati found the memory-allocation
deadlock back in 2010, when I had just joined gluster. Raghavendra Bhat
raised this bug then. I've CCed him on the thread as well, in case he
knows the exact scenario.


I've been doing some tests with Shyam and it seems that the root cause is the 
edge-triggered epoll introduced in the multi-threaded epoll patch. It has a 
side effect that makes the outstanding-rpc-limit option nearly useless, so 
gluster gets flooded with requests, causing timeouts and disconnections on 
slow/busy machines.


Elaborating on this: the MT epoll patch makes the epoll edge-triggered (ET), 
so on a poll-in event we attempt to read as much as we can. If the 
client is able to supply 'n' RPCs before our read gets an EAGAIN or 
EWOULDBLOCK, we will read them all and not honor the server-side throttle.


In the previous case, we read RPC by RPC and the epoll was not ET, so 
when we reached the throttle limit we stopped reading from the socket. The 
network pipes would fill up when this happened, so the client would not 
be able to write more RPCs either; hence the outstanding (or ongoing) 
RPCs were bounded.


With the ET case in epoll we are breaking this.
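One way to see the pile-up on the server side while the dd is running is a
statedump of the gNFS process (a sketch; the dump directory and the
call-pool counter name are from memory, so treat them as assumptions):

#... Dump the state of the gluster NFS server while the test is writing
gluster volume statedump patchy nfs
#... The number of frames/requests in flight shows up in the call-pool section
grep -H callpool.cnt /var/run/gluster/*.dump.*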

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Xavier Hernandez
 

On 12.02.2015 19:09, Pranith Kumar Karampuri wrote:

On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:

On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've done some more investigation, and the problem seems worse. It seems
that NFS sends a huge number of requests without waiting for answers
(I've had more than 1400 requests ongoing). There are probably many
factors that influence the load this causes, and one of them could be
ec, but it's not related exclusively to ec. I've repeated the test using
replica 3 and replica 2 volumes and the problem still happens. The test
basically writes a file to an NFS mount using 'dd'. The file has a size
of 1GB. With a smaller file, the test passes successfully.

Using an NFS client and the gluster NFS server on the same machine with
big-file dd operations is known to cause hangs. anon-fd-quota.t used to
give similar problems, so we changed the test to not involve NFS mounts.

I don't recollect the exact scenario. Avati found the memory-allocation
deadlock back in 2010, when I had just joined gluster. Raghavendra Bhat
raised this bug then. I've CCed him on the thread as well, in case he
knows the exact scenario.

I've been doing some tests with Shyam and it seems that
the root cause is the edge-triggered epoll introduced in the
multi-threaded epoll patch. It has a side effect that makes the
outstanding-rpc-limit option nearly useless, so gluster gets flooded
with requests, causing timeouts and disconnections on slow/busy
machines.

I've opened bug #1192114 for this problem.

Xavi
 ___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GlusterFS 4.0 Meeting Time Survey

2015-02-12 Thread Kaushal M
+1 for Thursday (anytime).
-1 for Friday.

On Wed, Feb 11, 2015 at 7:22 PM, Niels de Vos nde...@redhat.com wrote:

 On Mon, Feb 09, 2015 at 05:34:04PM -0500, Jeff Darcy wrote:
  The inaugural GlusterFS 4.0 meeting on Friday was a great success.
  Thanks to all who attended.  Minutes are here:
 
 
 http://meetbot.fedoraproject.org/gluster-meeting/2015-02-06/glusterfs_4.0.2015-02-06-12.05.html
 
  One action item was to figure out when IRC meetings (#gluster-meeting on
  Freenode) should occur.  Meetings are likely to be every other week,
  starting February 26 or 27.  Obviously, it would be good to pick a time
  that gets us maximum participation.  If you would be interested in
  attending, please let me know:
 
  * Which day - Thursday or Friday
 
  * What time - 11:00, 12:00, or 13:00 UTC

 Thursday at 13:00 would work, other times not so much.
 Any of the proposed times on Friday are OK.

 Thanks,
 Niels

 
  Please try to keep in mind that our two main constituencies are in IST
  (+0530) and EST (-0500).  With at least one developer in Europe and none
  so far on the US west coast, this is as close to a sweet spot as we
  can get.  If we do pick up developers in time zones where this isn't
  sufficient, we could rotate or I would be willing to run two meetings
  per cycle on opposite schedules; let's cross that bridge when we get
  there.
  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:
My base filesystem has 40 TB and the tar takes 19 minutes. I copied 
over 10 TB and it took the tar extraction from 1 minute to 7 minutes.


My suspicion is that it is related to number of files and not 
necessarily file size. Shyam is looking into reproducing this 
behavior on a redhat system.
I am able to reproduce the issue on a similar setup internally (at 
least at the surface it seems to be similar to what David is facing).


I will continue the investigation for the root cause.
Here is the initial analysis of my investigation (thanks for 
providing me with the setup, Shyam; please keep the setup, we may need it for 
further analysis):


On bad volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 937104 FORGET
  0.00   0.00 us   0.00 us   0.00 us 872478 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 23668 RELEASEDIR
  0.00  41.86 us  23.00 us  86.00 us 92 STAT
  0.01  39.40 us  24.00 us 104.00 us 218 STATFS
  0.28  55.99 us  43.00 us1152.00 us 4065 SETXATTR
  0.58  56.89 us  25.00 us4505.00 us 8236 OPENDIR
  0.73  26.80 us  11.00 us 257.00 us 22238 FLUSH
  0.77 152.83 us  92.00 us8819.00 us 4065 RMDIR
  2.57  62.00 us  21.00 us 409.00 us 33643 WRITE
  5.46 199.16 us 108.00 us  469938.00 us 22238 UNLINK
  6.70  69.83 us  43.00 us.00 us 77809 LOOKUP
  6.97 447.60 us  21.00 us   54875.00 us 12631 READDIRP
  7.73  79.42 us  33.00 us1535.00 us 78909 SETATTR
 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR
 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE

On good volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 58870 FORGET
  0.00   0.00 us   0.00 us   0.00 us 66016 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 16480 RELEASEDIR
  0.00  61.50 us  58.00 us  65.00 us 2OPEN
  0.01  39.56 us  16.00 us 112.00 us 71 STAT
  0.02  41.29 us  27.00 us  79.00 us 163 STATFS
  0.03  36.06 us  17.00 us  98.00 us 301 FSTAT
  0.79  62.38 us  39.00 us 269.00 us 4065 SETXATTR
  1.14 242.99 us  25.00 us   28636.00 us 1497 READ
  1.54  59.76 us  25.00 us6325.00 us 8236 OPENDIR
  1.70 133.75 us  89.00 us 374.00 us 4065 RMDIR
  2.25  32.65 us  15.00 us 265.00 us 22006 FLUSH
  3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR
  7.14  68.34 us  21.00 us   21902.00 us 33357 WRITE
 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK
 13.82 200.54 us 133.00 us   21762.00 us 22003 CREATE
 17.85 448.85 us  22.00 us   54046.00 us 12697 READDIRP
 18.37  76.12 us  45.00 us 294.00 us 77044 LOOKUP
 20.95  85.54 us  35.00 us1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE counts are far higher on the brick of 
the full volume compared to the brick of the empty volume. This seems to 
suggest that the inode table on the volume with lots of data is carrying 
too many passive inodes, which need to be displaced to create new ones. 
I need to check whether they come in the fop path, and to continue my 
investigation further; I will let you know.
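For reference, this is roughly how a profile like the one above is gathered,
plus the knob I'd experiment with for the passive-inode theory (a sketch;
the volume name is taken from the mount output, and the inode-lru-limit
value is an arbitrary example, not a recommendation):

#... Collect the per-fop latency profile
gluster volume profile testvol start
#... run the tar/rm workload on the mount, then:
gluster volume profile testvol info

#... Knob for the size of the server-side inode table (LRU of passive inodes)
gluster volume set testvol network.inode-lru-limit 65536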
Just to increase confidence, I performed one more test: stopped the volumes 
and re-started them. Now, on both volumes, the numbers are almost the same:


[root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf 
boost_1_57_0.tar.gz


real1m15.074s
user0m0.550s
sys 0m4.656s

real2m46.866s
user0m5.347s
sys 0m16.047s

[root@gqac031 gluster-mount]# cd /gluster-emptyvol/
[root@gqac031 gluster-emptyvol]# ls
boost_1_57_0.tar.gz
[root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz

real2m31.467s
user0m5.475s
sys 0m15.471s

gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)


Pranith


Pranith

Thanks Shyam. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: 

Re: [Gluster-devel] missing files

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:

My base filesystem has 40 TB and the tar takes 19 minutes. I copied over 10 TB 
and it took the tar extraction from 1 minute to 7 minutes.

My suspicion is that it is related to number of files and not necessarily file 
size. Shyam is looking into reproducing this behavior on a redhat system.

I am able to reproduce the issue on a similar setup internally (at least at the 
surface it seems to be similar to what David is facing).

I will continue the investigation for the root cause.
Here is the initial analysis of my investigation (thanks for providing 
me with the setup, Shyam; please keep the setup, we may need it for further analysis):


On bad volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 937104  FORGET
  0.00   0.00 us   0.00 us   0.00 us 872478 RELEASE
  0.00   0.00 us   0.00 us   0.00 us  23668 RELEASEDIR
  0.00  41.86 us  23.00 us  86.00 us 92STAT
  0.01  39.40 us  24.00 us 104.00 us 218  STATFS
  0.28  55.99 us  43.00 us1152.00 us 4065SETXATTR
  0.58  56.89 us  25.00 us4505.00 us 8236 OPENDIR
  0.73  26.80 us  11.00 us 257.00 us 22238   FLUSH
  0.77 152.83 us  92.00 us8819.00 us 4065   RMDIR
  2.57  62.00 us  21.00 us 409.00 us 33643   WRITE
  5.46 199.16 us 108.00 us  469938.00 us 22238  UNLINK
  6.70  69.83 us  43.00 us.00 us 77809  LOOKUP
  6.97 447.60 us  21.00 us   54875.00 us 12631READDIRP
  7.73  79.42 us  33.00 us1535.00 us 78909 SETATTR
 14.112815.00 us 176.00 us 2106305.00 us 4065   MKDIR
 54.091972.62 us 138.00 us 1520773.00 us 22238  CREATE

On good volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 58870  FORGET
  0.00   0.00 us   0.00 us   0.00 us 66016 RELEASE
  0.00   0.00 us   0.00 us   0.00 us  16480 RELEASEDIR
  0.00  61.50 us  58.00 us  65.00 us 2OPEN
  0.01  39.56 us  16.00 us 112.00 us 71STAT
  0.02  41.29 us  27.00 us  79.00 us 163  STATFS
  0.03  36.06 us  17.00 us  98.00 us 301   FSTAT
  0.79  62.38 us  39.00 us 269.00 us 4065SETXATTR
  1.14 242.99 us  25.00 us   28636.00 us 1497READ
  1.54  59.76 us  25.00 us6325.00 us 8236 OPENDIR
  1.70 133.75 us  89.00 us 374.00 us 4065   RMDIR
  2.25  32.65 us  15.00 us 265.00 us 22006   FLUSH
  3.37 265.05 us 172.00 us2349.00 us 4065   MKDIR
  7.14  68.34 us  21.00 us   21902.00 us 33357   WRITE
 11.00 159.68 us 107.00 us2567.00 us 22003  UNLINK
 13.82 200.54 us 133.00 us   21762.00 us 22003  CREATE
 17.85 448.85 us  22.00 us   54046.00 us 12697READDIRP
 18.37  76.12 us  45.00 us 294.00 us 77044  LOOKUP
 20.95  85.54 us  35.00 us1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE counts are far higher on the brick of the 
full volume compared to the brick of the empty volume. This seems to suggest 
that the inode table on the volume with lots of data is carrying too many 
passive inodes, which need to be displaced to create new ones. I need to 
check whether they come in the fop path, and to continue my investigation 
further; I will let you know.


Pranith

Thanks Shyam. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/13/2015 12:07 AM, Niels de Vos wrote:

On Thu, Feb 12, 2015 at 11:39:51PM +0530, Pranith Kumar Karampuri wrote:

On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:

On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've done some more investigation, and the problem seems worse.

It seems that NFS sends a huge number of requests without waiting for
answers (I've had more than 1400 requests ongoing). There are probably
many factors that influence the load this causes, and one of them could
be ec, but it's not related exclusively to ec. I've repeated the test
using replica 3 and replica 2 volumes and the problem still happens.

The test basically writes a file to an NFS mount using 'dd'. The file
has a size of 1GB. With a smaller file, the test passes successfully.

Using an NFS client and the gluster NFS server on the same machine with big-file
dd operations is known to cause hangs. anon-fd-quota.t used to give similar
problems, so we changed the test to not involve NFS mounts.

I don't recollect the exact scenario. Avati found the memory-allocation
deadlock back in 2010, when I had just joined gluster. Raghavendra Bhat raised
this bug then. I've CCed him on the thread as well, in case he knows the exact
scenario.

This is a well known issue. When a system is under memory pressure, it
will try to flush dirty pages from the VFS. The NFS client will send the
dirty pages over the network to the NFS server. Unfortunately, the
NFS server needs to allocate memory for handling the WRITE
procedures. This causes a loop and will most often get the system into a
hang.
Yes. This was it :-). Seems like Xavi and Shyam found the reason for the 
failure though, which is not this.


Pranith


Mounting with -o sync, or flushing outstanding I/O from the client
side should normally be sufficient to prevent these issues.
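Concretely, something along these lines (a sketch; the server, volume, and
mount point names are taken from the logs in this thread, and oflag=sync is
the standard dd flag for per-write flushing):

#... Mount the gluster NFS export synchronously so dirty pages cannot pile up
mount -t nfs -o vers=3,sync gf01.datalab.es:/patchy /mnt/nfs/0
#... Or keep the async mount but make dd flush as it writes
dd if=/dev/zero of=/mnt/nfs/0/test bs=1M count=1024 oflag=sync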

Nice, didn't know about this.

Pranith


Niels


Pranith

Pranith

One important thing to note is that I'm not using powerful servers (a
dual core Intel Atom), but this problem shouldn't happen anyway. It can
even happen on more powerful servers if they are busy doing other things
(maybe this is what's happening on jenkins' slaves).

I think this causes some NFS requests to time out. This can be seen
in /var/log/messages (there are many of these messages):

Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not
responding, timed out

nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply]
0-nfs-nfsv3: Reply submission failed

Additionally, this causes disconnections from NFS that are not correctly
handled, causing a thread to get stuck in an infinite loop (I haven't
analyzed this problem deeply, but it seems like an attempt to use an
already disconnected socket). After a while, I get this error in the nfs
log:

[2015-02-12 14:20:19.545429] C
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0:
server 192.168.200.61:49152 has not responded in the last 42 seconds,
disconnecting.

The console executing the test shows this (nfs.t is creating a replica 3
instead of a dispersed volume):

# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing
‘/mnt/nfs/0/test’: Input/output error
[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
Failed test: 9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr
3.45 csys = 4.89 CPU)
Result: FAIL
Failed tests ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without
having received an answer?

Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.

This has an interesting history of failures: on the regression runs for
the MT epoll patch, this test (i.e. ec/nfs.t) did not fail (there were
other failures, but not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above, as it may help narrowing down the changes in EC
(maybe) that could have caused it.

Also in the latter commit, there was an error configuring the number of
threads so all regression runs would have run with a single epoll
thread
(the MT epoll patch had this hard coded, so 

[Gluster-devel] coredumps in regression runs

2015-02-12 Thread Vijay Bellur
Several coredumps are being generated in regression runs now [1]. Anyone 
had a chance to look into this?


-Vijay

[1] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/4233/consoleFull

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread David F. Robinson

Shyam,

You asked me to stop/start the slow volume to see if it fixed the timing 
issue.  I stopped/started homegfs_backup (the production volume with 40+ 
TB) and it didn't make it faster.  I didn't stop/start the fast volume 
to see if it made it slower.  I just did that and sent out an email.  I 
saw a similar result to Pranith's.


However, I tried the test below and saw no issues.  So, I don't know 
why restarting the older volume test3brick slowed it down, but the test 
below shows no slowdown.



#... Create 2-new bricks
gluster volume create test4brick 
gfsib01bkp.corvidtec.com:/data/brick01bkp/test4brick 
gfsib01bkp.corvidtec.com:/data/brick02bkp/test4brick
gluster volume create test5brick 
gfsib01bkp.corvidtec.com:/data/brick01bkp/test5brick 
gfsib01bkp.corvidtec.com:/data/brick02bkp/test5brick

gluster volume start test4brick
gluster volume start test5brick

mount /test4brick
mount /test5brick

cp /root/boost_1_57_0.tar /test4brick
cp /root/boost_1_57_0.tar /test5brick

#... Stop/start test4brick to see if this causes a timing issue
umount /test4brick
gluster volume stop test4brick
gluster volume start test4brick
mount /test4brick


#... Run test on both new bricks
cd /test4brick
time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0

real1m29.712s
user0m0.415s
sys 0m2.772s

real0m18.866s
user0m0.087s
sys 0m0.556s

cd /test5brick
time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0

real 1m28.243s
user 0m0.366s
sys 0m2.502s

real 0m18.193s
user 0m0.075s
sys 0m0.543s

#... Repeat again after stop/start of test4brick
umount /test4brick
gluster volume stop test4brick
gluster volume start test4brick
mount /test4brick
cd /test4brick
time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0

real1m25.277s
user0m0.466s
sys 0m3.107s

real0m16.575s
user0m0.084s
sys 0m0.577s

-- Original Message --
From: Shyam srang...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift 
jus...@gluster.org
Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson 
david.robin...@corvidtec.com

Sent: 2/12/2015 10:46:14 AM
Subject: Re: [Gluster-devel] missing files


On 02/12/2015 06:22 AM, Pranith Kumar Karampuri wrote:


On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:
Just to increase confidence, I performed one more test: stopped the volumes
and re-started them. Now, on both volumes, the numbers are almost the same:

[root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf
boost_1_57_0.tar.gz

real 1m15.074s
user 0m0.550s
sys 0m4.656s

real 2m46.866s
user 0m5.347s
sys 0m16.047s

[root@gqac031 gluster-mount]# cd /gluster-emptyvol/
[root@gqac031 gluster-emptyvol]# ls
boost_1_57_0.tar.gz
[root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz

real 2m31.467s
user 0m5.475s
sys 0m15.471s

gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)


If I remember right, we performed a similar test on David's setup, but 
I believe there was no significant performance gain there. David could 
you clarify?


Just so we know where we are headed :)

Shyam


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread Shyam

On 02/12/2015 11:18 AM, David F. Robinson wrote:

Shyam,

You asked me to stop/start the slow volume to see if it fixed the timing
issue.  I stopped/started homegfs_backup (the production volume with 40+
TB) and it didn't make it faster.  I didn't stop/start the fast volume
to see if it made it slower.  I just did  that and sent out an email.  I
saw a similar result as Pranith.


Just to be clear even after restart of the slow volume, we see ~19 
minutes for the tar to complete, correct?


Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, 
irrespective of start, fresh create, etc. correct?


Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread Justin Clift
On 12 Feb 2015, at 11:22, Pranith Kumar Karampuri pkara...@redhat.com wrote:
snip
 Just to increase confidence, I performed one more test: stopped the volumes and 
 re-started them. Now, on both volumes, the numbers are almost the same:

Oh.  So it's a problem that turns up after a certain amount of
activity has happened on a volume?

eg a lot of intensive activity would show up quickly, but a
   less intense amount of activity would take longer to show
   the effect

Kaleb's long running cluster might be useful to catch this
kind of thing in future, depending on the workload running on
it, and the kind of pre/post tests we run. (eg to catch performance
regressions)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread David F. Robinson
That is very interesting.  I tried this test and received a similar 
result.  Starting/stopping the volume causes a timing issue on the blank 
volume.  It seems like there is some parameter that gets set when you 
create a volume and gets reset when you start/stop a volume.  Or, 
something gets set during the start/stop operation that causes the 
problem.  Is there a way to list all parameters that are set for a 
volume?  'gluster volume info' only shows the ones that the user has 
changed from the defaults.
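(For listing everything, newer releases have a 'volume get' CLI that shows
all options with their effective values; a sketch, since I'm not sure which
version introduced it:)

#... Show every option for the volume, including values still at their defaults
gluster volume get test3brick all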


[root@gfs01bkp ~]# gluster volume stop test3brick
Stopping volume will make its data inaccessible. Do you want to 
continue? (y/n) y

volume stop: test3brick: success
[root@gfs01bkp ~]# gluster volume start test3brick
volume start: test3brick: success
[root@gfs01bkp ~]# mount /test3brick
[root@gfs01bkp ~]# cd /test3brick/
[root@gfs01bkp test3brick]# date; time tar -xPf boost_1_57_0.tar ; time 
rm -rf boost_1_57_0

Thu Feb 12 10:42:43 EST 2015

real3m46.002s
user0m0.421s
sys 0m2.812s

real0m15.406s
user0m0.092s
sys 0m0.549s


-- Original Message --
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Justin Clift jus...@gluster.org; Shyam srang...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson 
david.robin...@corvidtec.com

Sent: 2/12/2015 6:22:23 AM
Subject: Re: [Gluster-devel] missing files



On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:
My base filesystem has 40 TB and the tar takes 19 minutes. I copied 
over 10 TB and it took the tar extraction from 1 minute to 
7 minutes.


My suspicion is that it is related to number of files and not 
necessarily file size. Shyam is looking into reproducing this 
behavior on a redhat system.
I am able to reproduce the issue on a similar setup internally (at 
least at the surface it seems to be similar to what David is 
facing).


I will continue the investigation for the root cause.
Here is the initial analysis of my investigation (thanks for 
providing me with the setup, Shyam; please keep the setup, we may need it for 
further analysis):


On bad volume:
 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
 - --- --- ---  
  0.00 0.00 us 0.00 us 0.00 us 937104 FORGET
  0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE
  0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR
  0.00 41.86 us 23.00 us 86.00 us 92 STAT
  0.01 39.40 us 24.00 us 104.00 us 218 STATFS
  0.28 55.99 us 43.00 us 1152.00 us 4065 SETXATTR
  0.58 56.89 us 25.00 us 4505.00 us 8236 OPENDIR
  0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH
  0.77 152.83 us 92.00 us 8819.00 us 4065 RMDIR
  2.57 62.00 us 21.00 us 409.00 us 33643 WRITE
  5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK
  6.70 69.83 us 43.00 us .00 us 77809 LOOKUP
  6.97 447.60 us 21.00 us 54875.00 us 12631 READDIRP
  7.73 79.42 us 33.00 us 1535.00 us 78909 SETATTR
 14.11 2815.00 us 176.00 us 2106305.00 us 4065 MKDIR
 54.09 1972.62 us 138.00 us 1520773.00 us 22238 CREATE

On good volume:
 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
 - --- --- ---  
  0.00 0.00 us 0.00 us 0.00 us 58870 FORGET
  0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE
  0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR
  0.00 61.50 us 58.00 us 65.00 us 2 OPEN
  0.01 39.56 us 16.00 us 112.00 us 71 STAT
  0.02 41.29 us 27.00 us 79.00 us 163 STATFS
  0.03 36.06 us 17.00 us 98.00 us 301 FSTAT
  0.79 62.38 us 39.00 us 269.00 us 4065 SETXATTR
  1.14 242.99 us 25.00 us 28636.00 us 1497 READ
  1.54 59.76 us 25.00 us 6325.00 us 8236 OPENDIR
  1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR
  2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH
  3.37 265.05 us 172.00 us 2349.00 us 4065 MKDIR
  7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE
 11.00 159.68 us 107.00 us 2567.00 us 22003 UNLINK
 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE
 17.85 448.85 us 22.00 us 54046.00 us 12697 READDIRP
 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP
 20.95 85.54 us 35.00 us 1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE counts are far higher on the brick of the 
full volume compared to the brick of the empty volume. This seems to suggest 
that the inode table on the volume with lots of data is carrying too 
many passive inodes, which need to be displaced to create 
new ones. I need to check whether they come in the fop path, and to continue 
my investigation further; I will let you know.
Just to increase confidence, I performed one more test: stopped the 
volumes and re-started them. Now, on both volumes, the numbers are almost 
the same:


[root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf 
boost_1_57_0.tar.gz


real 1m15.074s

Re: [Gluster-devel] missing files

2015-02-12 Thread David F. Robinson



-- Original Message --
From: Shyam srang...@redhat.com
To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar 
Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org

Cc: Gluster Devel gluster-devel@gluster.org
Sent: 2/12/2015 11:26:51 AM
Subject: Re: [Gluster-devel] missing files


On 02/12/2015 11:18 AM, David F. Robinson wrote:

Shyam,

You asked me to stop/start the slow volume to see if it fixed the 
timing
issue. I stopped/started homegfs_backup (the production volume with 
40+

TB) and it didn't make it faster. I didn't stop/start the fast volume
to see if it made it slower. I just did that and sent out an email. I
saw a similar result as Pranith.


Just to be clear even after restart of the slow volume, we see ~19 
minutes for the tar to complete, correct?

Correct



Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, 
irrespective of start, fresh create, etc. correct?

Correct



Shyam


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread David F. Robinson
FWIW, starting/stopping a volume that is fast doesn't consistently make 
it slow.  I just tried it again on an older volume... It doesn't make it 
slow.  I also went back and re-ran the test on test3brick and it isn't 
slow any longer.  Maybe there is a time lag after stopping/starting a 
volume before it becomes fast.


Either way, stopping/starting a fast volume only makes it slow for 
some period of time, and it doesn't consistently make it slow.  I don't 
think this is the issue; it's a red herring.


[root@gfs01bkp /]# gluster volume stop test2brick
Stopping volume will make its data inaccessible. Do you want to 
continue? (y/n) y

[root@gfs01bkp /]# gluster volume start test2brick
volume start: test2brick: success
[root@gfs01bkp /]# mount /test2brick
[root@gfs01bkp /]# cd /test2brick
[root@gfs01bkp test2brick]# time tar -xPf boost_1_57_0.tar; time rm -rf 
boost_1_57_0


real1m1.124s
user0m0.432s
sys 0m3.136s

real0m16.630s
user0m0.083s
sys 0m0.570s


#... Retest on test3brick after it has been up after a volume restart 
for 20-minutes... Compare this to running the test immediately after a 
restart which gave a time of 3.5-minutes.
[root@gfs01bkp test3brick]#  time tar -xPf boost_1_57_0.tar; time rm -rf 
boost_1_57_0


real1m17.786s
user0m0.502s
sys 0m3.278s

real0m18.103s
user0m0.101s
sys 0m0.684s



-- Original Message --
From: Shyam srang...@redhat.com
To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar 
Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org

Cc: Gluster Devel gluster-devel@gluster.org
Sent: 2/12/2015 11:26:51 AM
Subject: Re: [Gluster-devel] missing files


On 02/12/2015 11:18 AM, David F. Robinson wrote:

Shyam,

You asked me to stop/start the slow volume to see if it fixed the 
timing
issue. I stopped/started homegfs_backup (the production volume with 
40+

TB) and it didn't make it faster. I didn't stop/start the fast volume
to see if it made it slower. I just did that and sent out an email. I
saw a similar result as Pranith.


Just to be clear even after restart of the slow volume, we see ~19 
minutes for the tar to complete, correct?


Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, 
irrespective of start, fresh create, etc. correct?


Shyam


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel