Re: [Gluster-devel] tests/bugs/quota/bug-1035576.t & tests/basic/quota-nfs.t spurious failures?

2015-09-01 Thread Vijaikumar M
The below patch has been submitted upstream; it fixes the testcase
'./tests/basic/quota-nfs.t':


http://review.gluster.org/#/c/12075/

Thanks,
Vijay



On Tuesday 01 September 2015 11:38 AM, Vijaikumar M wrote:

We will look into this issue.

Thanks,
Vijay

On Tuesday 01 September 2015 11:03 AM, Atin Mukherjee wrote:

One more instance -
https://build.gluster.org/job/rackspace-regression-2GB-triggered/13899/consoleFull 



Can you please put these tests in bad_tests()?

On 08/31/2015 09:23 AM, Atin Mukherjee wrote:

For tests/bugs/quota/bug-1035576.t refer [1]
For tests/basic/quota-nfs.t refer [2]

Please note I've not added these tests to the spurious failures list [3] yet.

[1]
https://build.gluster.org/job/rackspace-regression-2GB-triggered/13829/consoleFull 


[2]
https://build.gluster.org/job/rackspace-regression-2GB-triggered/13839/consoleFull 


[3] https://public.pad.fsfe.org/p/gluster-spurious-failures

Thanks,
Atin
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Netbsd build failure

2015-08-20 Thread Vijaikumar M



On Friday 21 August 2015 10:21 AM, Avra Sengupta wrote:

+ Adding Vijaikumar

On 08/20/2015 04:19 PM, Niels de Vos wrote:

On Thu, Aug 20, 2015 at 03:05:56AM -0400, Susant Palai wrote:

Hi,
   I tried running the NetBSD regression twice on a patch, and twice it failed at
the same point. Here is the error:

snip
Build GlusterFS
***

+ '/opt/qa/build.sh'
   File "/usr/pkg/lib/python2.7/site.py", line 601
 [2015-08-19 05:45:06.N]:++ 
G_LOG:./tests/basic/quota-anon-fd-nfs.t: TEST: 85 ! fd_write 3 content 
++
This particular test is currently in the bad tests list and I believe Vijaikumar
is looking into it. Could you please check whether there is any other
failure (apart from this) that is failing the regression runs?

^
SyntaxError: invalid token
We have marked the test './tests/basic/quota-anon-fd-nfs.t' as a bad test; I
am not sure about the 'SyntaxError'. I think there is some parsing
error in the shell script; we need to root-cause the issue.




+ RET=1
+ '[' 1 '!=' 0 ']'
+ exit 1
Build step 'Exécuter un script shell' marked build as failure
Finished: FAILURE
/snip

  Requesting you to take a look into it.

Which Jenkins slave was this? Got a link to the job that failed?

This looks again like a NetBSD slave where logs from regression tests
are overwriting random files. The /usr/pkg/lib/python2.7/site.py file
should be valid Python, and not contain these logs...

Does anyone have any idea why this happens?

Thanks,
Niels
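
A quick way to confirm whether site.py was clobbered by test logs on a
suspect slave (a hypothetical check, not something run in this thread):

    # A clobbered site.py fails to byte-compile with the same SyntaxError
    # seen in the build log above.
    python -m py_compile /usr/pkg/lib/python2.7/site.py && echo "site.py OK"

    # Look for regression-test log lines that overwrote the file.
    grep -n 'G_LOG' /usr/pkg/lib/python2.7/site.py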


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel






Re: [Gluster-devel] NetBSD regression failures

2015-08-17 Thread Vijaikumar M



On Monday 17 August 2015 12:22 PM, Avra Sengupta wrote:

Hi,

The NetBSD regression tests are continuously failing with errors in 
the following tests:


./tests/basic/mount-nfs-auth.t
./tests/basic/quota-anon-fd-nfs.t
quota-anon-fd-nfs.t has known issues with NFS client caching, so it is
marked as a bad test; the final result will be marked as success even if this
test fails.






Is there any recent change that is triggering this behaviour? Also,
currently only one machine is running NetBSD tests. Can someone with access
to Jenkins bring up a few more slaves to run NetBSD regressions in
parallel?


Regards,
Avra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] 3.7 spurious failures

2015-07-13 Thread Vijaikumar M



On Monday 13 July 2015 11:14 PM, Joseph Fernandes wrote:

Hi All,

These are some of the recent hit spurious failures on 3.7 branch

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12356/consoleFull
./tests/bugs/snapshot/bug-1109889.t
is blocking
http://review.gluster.org/11649 merge


http://build.gluster.org/job/rackspace-regression-2GB-triggered/12357/consoleFull
./tests/bugs/fuse/bug-1126048.t
is blocking
http://review.gluster.org/11608 merge


Net bsd:

http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8257/consoleFull
./tests/basic/quota-nfs.t
is blocking
http://review.gluster.org/11649 merge.
This issue is fixed in master; we have submitted a backport patch to release-3.7.
It should be merged by end of day today.


Thanks,
Vijay



Appropriate owners please take a look.

Thanks & Regards,
Joe
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Spurious failure in 3.7.2: ./tests/bugs/quota/afr-quota-xattr-mdata-heal.t

2015-07-10 Thread Vijaikumar M
Patch submitted upstream which fixes this issue: 
http://review.gluster.org/#/c/11583/

Will submit the fix for 3.7 as well.

Thanks,
Vijay


On Friday 10 July 2015 01:19 PM, Joseph Fernandes wrote:

http://build.gluster.org/job/rackspace-regression-2GB-triggered/12204/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression tests not Initializing...

2015-07-10 Thread Vijaikumar M

NetBSD tests are failing again:

http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8123/console

Triggered by Gerrit: http://review.gluster.org/11616 in silent mode.
Building remotely on nbslave74.cloud.gluster.org
(http://build.gluster.org/computer/nbslave74.cloud.gluster.org)
(netbsd7_regression) in workspace
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered
  git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
  git config remote.origin.url http://review.gluster.org/glusterfs.git # timeout=10
Fetching upstream changes from http://review.gluster.org/glusterfs.git
  git --version # timeout=10
  git -c core.askpass=true fetch --tags --progress http://review.gluster.org/glusterfs.git refs/changes/16/11616/1
ERROR: Error fetching remote repo 'origin'
Finished: FAILURE

Thanks,
Vijay




On Tuesday 07 July 2015 07:13 PM, Kaushal M wrote:

I've taken this slave and one other offline and am rebooting it.

On Tue, Jul 7, 2015 at 6:44 PM, Kotresh Hiremath Ravishankar
khire...@redhat.com wrote:

Hi Emmanuel,

We are seeing these issues again on nbslave7h.cloud.gluster.org
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7974/console

Thanks and Regards,
Kotresh H R

- Original Message -

From: Emmanuel Dreyfus m...@netbsd.org
To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Sunday, July 5, 2015 12:52:23 AM
Subject: Re: [Gluster-devel] NetBSD regression tests not Initializing...

Kotresh Hiremath Ravishankar khire...@redhat.com wrote:


Any help is appreciated.

nbslave72 was sick indeed: it refused SSH connections. I rebooted it and
retriggered your change, but it went to another machine.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Vijaikumar M



On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:

I've been hitting spurious failures in Linux regression runs for my change [1].

The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]

I will look into this issue.

./tests/bugs/quota/bug-1235182.t [5]

I have submitted two patches to fix failures from 'bug-1235182.t'
http://review.gluster.org/#/c/11561/
http://review.gluster.org/#/c/11510/


./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures again

2015-07-08 Thread Vijaikumar M



On Wednesday 08 July 2015 03:53 PM, Vijaikumar M wrote:



On Wednesday 08 July 2015 03:42 PM, Kaushal M wrote:
I've been hitting spurious failures in Linux regression runs for my 
change [1].


The following tests failed,
./tests/basic/afr/replace-brick-self-heal.t [2]
./tests/bugs/replicate/bug-1238508-self-heal.t [3]
./tests/bugs/quota/afr-quota-xattr-mdata-heal.t [4]

I will look into this issue.

Patch submitted: http://review.gluster.org/#/c/11583/




./tests/bugs/quota/bug-1235182.t [5]

I have submitted two patches to fix failures from 'bug-1235182.t'
http://review.gluster.org/#/c/11561/
http://review.gluster.org/#/c/11510/


./tests/bugs/replicate/bug-977797.t [6]

Can AFR and quota owners look into this?

Thanks.

Kaushal

[1] https://review.gluster.org/11559
[2] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12023/consoleFull
[3] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12029/consoleFull
[4] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12044/consoleFull
[5] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12060/consoleFull
[6] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12071/consoleFull




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Huge memory consumption with quota-marker

2015-07-02 Thread Vijaikumar M



On Thursday 02 July 2015 11:27 AM, Krishnan Parthasarathi wrote:

Yes. The PROC_MAX is the maximum no. of 'worker' threads that would be
spawned for a given syncenv.

- Original Message -


- Original Message -

From: Krishnan Parthasarathi kpart...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Vijay Bellur vbel...@redhat.com, Vijaikumar M
vmall...@redhat.com, Gluster Devel
gluster-devel@gluster.org, Raghavendra Gowdappa rgowd...@redhat.com,
Nagaprasad Sathyanarayana
nsath...@redhat.com
Sent: Thursday, July 2, 2015 10:54:44 AM
Subject: Re: Huge memory consumption with quota-marker

Yes, we could take synctask size as an argument for synctask_create.
The increase in synctask threads is not really a problem, it can't
grow more than 16 (SYNCENV_PROC_MAX).

That is, it cannot grow more than PROC_MAX in a _single_ syncenv, I suppose.


- Original Message -


On 07/02/2015 10:40 AM, Krishnan Parthasarathi wrote:

- Original Message -

On Wednesday 01 July 2015 08:41 AM, Vijaikumar M wrote:

Hi,

The new marker xlator uses the syncop framework to update quota-size in
the background; it uses one synctask per write FOP.
If there are 100 parallel writes with all different inodes but on the
same directory '/dir', there will be ~100 transactions waiting in queue to
acquire a lock on its parent, i.e. '/dir'.
Each of these transactions uses a synctask, and each synctask allocates a
stack of 2M (the default size), for a total of 200M. This usage can increase
depending on the load.

I am thinking of reducing the synctask stack size to 256k; will this
memory be sufficient, as we perform very limited operations within a
synctask during marker updates?


Seems like a good idea to me. Do we need a 256k stacksize or can we live
with something even smaller?

It was 16K when synctask was introduced. This is a property of syncenv. We
could create a separate syncenv for marker transactions which has smaller
stacks. env-stacksize (and SYNCTASK_DEFAULT_STACKSIZE) was increased to 2MB
to support pump-xlator-based data migration for replace-brick. For the no. of
stack frames a marker transaction could use at any given time, we could use
much less, say 16K.
Does that make sense?
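
For a sense of scale, the arithmetic from the numbers quoted in this thread
(100 queued marker transactions, one synctask each):

    # 2MB default stacks vs. the proposed 16K stacks:
    echo "2MB stacks: $((100 * 2048)) KB total"   # ~200 MB
    echo "16K stacks: $((100 * 16)) KB total"     # ~1.6 MB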


What information do we store in this memory? Is it only the
frames, or are we also storing the functions' stack data?


Thanks,
Vijay

Creating one more syncenv will lead to extra sync-threads; maybe we can
take stacksize as an argument.

Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Vijaikumar M

We will look into this issue.

Thanks,
Vijay

On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote:

Hi,

I see quota.t regression failure for the following. The changes are related to
example programs in libgfchangelog.

http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull

Could someone from quota team, take a look at it.

Thanks and Regards,
Kotresh H R



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Huge memory consumption with quota-marker

2015-07-01 Thread Vijaikumar M

Hi,

The new marker xlator uses the syncop framework to update quota-size in the
background; it uses one synctask per write FOP.
If there are 100 parallel writes with all different inodes but on the
same directory '/dir', there will be ~100 transactions waiting in queue to
acquire a lock on its parent, i.e. '/dir'.
Each of these transactions uses a synctask, and each synctask allocates a
stack of 2M (the default size), for a total of 200M. This usage can increase
depending on the load.


I am thinking of reducing the synctask stack size to 256k; will this memory
be sufficient, as we perform very limited operations within a synctask
during marker updates?


Please provide suggestions on solving this problem.


Thanks,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Three Issues Confused me recently

2015-06-26 Thread Vijaikumar M



On Friday 26 June 2015 12:59 PM, Susant Palai wrote:

Comment inline.

- Original Message -

From: christ1...@sina.com
To: gluster-devel gluster-devel@gluster.org
Sent: Thursday, 25 June, 2015 7:56:45 PM
Subject: [Gluster-devel] Three Issues Confused me recently



Hi, everyone!




There are three issues that have been confusing me recently while using
GlusterFS to store huge amounts of data:
1) Is there any reason for reserving 10% free space on each brick in the
volume? And can I avoid reserving that 10% free space? I will use GlusterFS
to save huge surveillance videos, so each brick will be given a large disk.
If each brick reserves 10% free space, it leads to low disk utilization and
wastes a lot of disk space.


10% is the default and it can be modified by the cluster.min-free-disk option.

e.g. gluster v set _VOL_NAME_ min-free-disk 8GB


On the question of what this cluster.min-free-disk value should be:

cluster.min-free-disk establishes a data threshold for each brick in a
volume. Its primary intention is to ensure that there is adequate space to
perform self-heal and rebalance operations, both of which require disk
overhead. The min-free-disk value is taken into account when it is already
exceeded before a file is written. When that is the case, the DHT algorithm
will choose to write the file to another brick where min-free-disk is not
exceeded, and will write a 0-byte link-to file on the brick where
min-free-disk is exceeded and where the file was originally hashed. This
link-to file contains metadata pointing the client to the brick where the
data was actually written. Because min-free-disk is only considered after it
has been exceeded, and because the DHT algorithm makes no other consideration
of available space on a brick, it is possible to write a large file that
exceeds the space on the brick it is hashed to even while another brick has
enough space to hold the file. This would result in an I/O error to the
client.

So if you know you routinely write files up to n GB in size, min-free-disk
can be set to a value a little larger than n. For example, if 5GB is at the
high end of the file sizes you will be writing, you might consider setting
min-free-disk to 8GB. Doing this ensures that the file will go to a brick
with enough available space (assuming one exists).
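
Incidentally, the link-to files mentioned above can be recognized on a brick
as follows (a sketch; the brick path is only an example):

    # Link-to files are zero-byte, carry the sticky bit (mode ends in 'T'),
    # and have an xattr naming the subvolume that holds the real data.
    ls -l /bricks/brick1/dir/bigfile    # ---------T ... 0 bytes
    getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/brick1/dir/bigfile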



2) Will there be any exceptions when the underlying filesystem (e.g. xfs or
ext4) becomes completely full?


As I already mentioned above, new file creation will be redirected to a
different brick with adequate space once min-free-disk is exceeded.


3) Is very high CPU usage natural when directory quota is enabled?
(glusterfs 3.6.2)


What is the testcase that causes the high CPU usage?




CCing quota team for this.


And is there any solution to avoid it?


I would really appreciate your help; thanks very much.







Best regards.







Louis

2015/6/25

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel






[Gluster-devel] spurious failure with test-case ./tests/basic/tier/tier.t

2015-06-26 Thread Vijaikumar M

Hi

Upstream regression failure with test-case ./tests/basic/tier/tier.t

My patch #11315 failed regression twice with
test-case ./tests/basic/tier/tier.t. Is anyone seeing this issue with other
patches?


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11396/consoleFull
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11456/consoleFull


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is consistently failing

2015-06-25 Thread Vijaikumar M

Hi Niels,

Patch #11022 is not available downstream in 3.1.
Patch #11361 is a blocker for 3.1 and depends on the ref-count functions; is
it possible to backport patch #11022 downstream?


Thanks,
Vijay


On Tuesday 23 June 2015 06:26 PM, Niels de Vos wrote:

On Tue, Jun 23, 2015 at 05:30:39PM +0530, Vijaikumar M wrote:


On Tuesday 23 June 2015 04:28 PM, Niels de Vos wrote:

On Tue, Jun 23, 2015 at 03:45:43PM +0530, Vijaikumar M wrote:

I have submitted the below patch, which fixes this issue. I am handling
memory clean-up with a reference-count mechanism.

http://review.gluster.org/#/c/11361

Is there a reason you cannot use the (new) refcounting functions that
were introduced with http://review.gluster.org/11022?

I was not aware that the ref-counting patch had been merged. Sure, we will
use these functions and re-submit my patch.

Ok, thanks!
Niels


Thanks,
Vijay



It would be nicer to standardize all refcounting mechanisms on one
implementation. I hope we can replace existing refcounting with this one
too. Introducing more refcounting ways is not going to be helpful.

Thanks,
Niels


Thanks,
Vijay




On Tuesday 23 June 2015 12:58 PM, Raghavendra G wrote:

Multiple replies to same query. Pick one ;).

On Tue, Jun 23, 2015 at 12:55 PM, Venky Shankar yknev.shan...@gmail.com
wrote:

OK. Two reverts of the same patch ;)

Pick one.

On Tue, Jun 23, 2015 at 12:51 PM, Raghavendra Gowdappa
rgowd...@redhat.com wrote:
 Seems like it's a memory corruption caused by:
 http://review.gluster.org/11311

 I've reverted the patch at:
 http://review.gluster.org/11360

 - Original Message -
 From: Xavier Hernandez xhernan...@datalab.es
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Tuesday, June 23, 2015 12:44:47 PM
 Subject: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is
consistently failing

 Hi,

 the quota test bug-1153964.t is failing consistently for a totally
 unrelated patch. Is this a known issue ?



http://build.gluster.org/job/rackspace-regression-2GB-triggered/11142/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11165/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11172/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11191/consoleFull

 Xavi




--
Raghavendra G




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is consistently failing

2015-06-24 Thread Vijaikumar M

Hi All,

I request you to rebase your patches that failed regression with
test-case bug-1153964.t.


Thanks,
Vijay


On Wednesday 24 June 2015 11:42 AM, Raghavendra Gowdappa wrote:

http://review.gluster.org/#/c/11362/ has been merged.

- Original Message -

From: Atin Mukherjee amukh...@redhat.com
To: Raghavendra Gowdappa rgowd...@redhat.com
Cc: Niels de Vos nde...@redhat.com, Vijaikumar M vmall...@redhat.com, 
Raghavendra G
raghaven...@gluster.com, Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, June 24, 2015 10:55:02 AM
Subject: Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is consistently 
failing



On 06/24/2015 10:53 AM, Raghavendra Gowdappa wrote:


- Original Message -

From: Atin Mukherjee amukh...@redhat.com
To: Niels de Vos nde...@redhat.com, Vijaikumar M
vmall...@redhat.com
Cc: Raghavendra G raghaven...@gluster.com, Gluster Devel
gluster-devel@gluster.org
Sent: Wednesday, June 24, 2015 10:15:12 AM
Subject: Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is
consistently failing

When is this patch getting merged? This is blocking other patches from
getting in.

The revert of http://review.gluster.org/11311 is waiting for regression runs
to pass. There are three patches (duplicates of each other); if any one of
them passes both regression runs, I'll merge it. As far as the refcounting
mechanism goes, it'll take some time to review and merge that patch.

Once the revert patch is merged, we are good to go. Please let us know
once it's merged, as after that all patches will need a rebase.

~Atin

~Atin

On 06/23/2015 06:26 PM, Niels de Vos wrote:

On Tue, Jun 23, 2015 at 05:30:39PM +0530, Vijaikumar M wrote:


On Tuesday 23 June 2015 04:28 PM, Niels de Vos wrote:

On Tue, Jun 23, 2015 at 03:45:43PM +0530, Vijaikumar M wrote:

I have submitted the below patch, which fixes this issue. I am handling
memory clean-up with a reference-count mechanism.

http://review.gluster.org/#/c/11361

Is there a reason you cannot use the (new) refcounting functions that
were introduced with http://review.gluster.org/11022?

I was not aware that the ref-counting patch had been merged. Sure, we will
use these functions and re-submit my patch.

Ok, thanks!
Niels


Thanks,
Vijay



It would be nicer to standardize all refcounting mechanisms on one
implementation. I hope we can replace existing refcounting with this
one
too. Introducing more refcounting ways is not going to be helpful.

Thanks,
Niels


Thanks,
Vijay




On Tuesday 23 June 2015 12:58 PM, Raghavendra G wrote:

Multiple replies to same query. Pick one ;).

On Tue, Jun 23, 2015 at 12:55 PM, Venky Shankar
yknev.shan...@gmail.com
wrote:

OK. Two reverts of the same patch ;)

Pick one.

On Tue, Jun 23, 2015 at 12:51 PM, Raghavendra Gowdappa
rgowd...@redhat.com wrote:
 Seems like it's a memory corruption caused by:
 http://review.gluster.org/11311

 I've reverted the patch at:
 http://review.gluster.org/11360

 - Original Message -
 From: Xavier Hernandez xhernan...@datalab.es
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Tuesday, June 23, 2015 12:44:47 PM
 Subject: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is
consistently failing

 Hi,

 the quota test bug-1153964.t is failing consistently for a
 totally
 unrelated patch. Is this a known issue ?



http://build.gluster.org/job/rackspace-regression-2GB-triggered/11142/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11165/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11172/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11191/consoleFull

 Xavi




--
Raghavendra G


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



--
~Atin

Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is consistently failing

2015-06-23 Thread Vijaikumar M



On Tuesday 23 June 2015 04:28 PM, Niels de Vos wrote:

On Tue, Jun 23, 2015 at 03:45:43PM +0530, Vijaikumar M wrote:

I have submitted the below patch, which fixes this issue. I am handling memory
clean-up with a reference-count mechanism.

http://review.gluster.org/#/c/11361

Is there a reason you cannot use the (new) refcounting functions that
were introduced with http://review.gluster.org/11022?


I was not aware that the ref-counting patch had been merged. Sure, we will use
these functions and re-submit my patch.


Thanks,
Vijay



It would be nicer to standardize all refcounting mechanisms on one
implementation. I hope we can replace existing refcounting with this one
too. Introducing more refcounting ways is not going to be helpful.

Thanks,
Niels


Thanks,
Vijay




On Tuesday 23 June 2015 12:58 PM, Raghavendra G wrote:

Multiple replies to same query. Pick one ;).

On Tue, Jun 23, 2015 at 12:55 PM, Venky Shankar yknev.shan...@gmail.com
wrote:

OK. Two reverts of the same patch ;)

Pick one.

On Tue, Jun 23, 2015 at 12:51 PM, Raghavendra Gowdappa
rgowd...@redhat.com wrote:
 Seems like it's a memory corruption caused by:
 http://review.gluster.org/11311

 I've reverted the patch at:
 http://review.gluster.org/11360

 - Original Message -
 From: Xavier Hernandez xhernan...@datalab.es
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Tuesday, June 23, 2015 12:44:47 PM
 Subject: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is
consistently failing

 Hi,

 the quota test bug-1153964.t is failing consistently for a totally
 unrelated patch. Is this a known issue ?



http://build.gluster.org/job/rackspace-regression-2GB-triggered/11142/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11165/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11172/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11191/consoleFull

 Xavi




--
Raghavendra G




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is consistently failing

2015-06-23 Thread Vijaikumar M
I have submitted the below patch, which fixes this issue. I am handling
memory clean-up with a reference-count mechanism.


http://review.gluster.org/#/c/11361

Thanks,
Vijay




On Tuesday 23 June 2015 12:58 PM, Raghavendra G wrote:

Multiple replies to same query. Pick one ;).

On Tue, Jun 23, 2015 at 12:55 PM, Venky Shankar 
yknev.shan...@gmail.com wrote:


OK. Two reverts of the same patch ;)

Pick one.

On Tue, Jun 23, 2015 at 12:51 PM, Raghavendra Gowdappa
rgowd...@redhat.com wrote:
 Seems like it's a memory corruption caused by:
 http://review.gluster.org/11311

 I've reverted the patch at:
 http://review.gluster.org/11360

 - Original Message -
 From: Xavier Hernandez xhernan...@datalab.es
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Tuesday, June 23, 2015 12:44:47 PM
 Subject: [Gluster-devel] /tests/bugs/quota/bug-1153964.t is
consistently failing

 Hi,

 the quota test bug-1153964.t is failing consistently for a totally
 unrelated patch. Is this a known issue ?



http://build.gluster.org/job/rackspace-regression-2GB-triggered/11142/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11165/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11172/consoleFull


http://build.gluster.org/job/rackspace-regression-2GB-triggered/11191/consoleFull

 Xavi




--
Raghavendra G


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-26 Thread Vijaikumar M

Here is the status on quota test-case spurious failure:

There were 3 issues
1) Quota exceeding the limit because of parallel writes - merged
upstream; patch submitted to release-3.7 (#10910)

./tests/bugs/quota/bug-1038598.t
./tests/bugs/distribute/bug-1161156.t
2) Quota accounting going wrong - patch submitted (#10918)
./tests/basic/ec/quota.t
./tests/basic/quota-nfs.t
3) Quota with anonymous FDs on NetBSD:
   This is an NFS client caching issue on NetBSD. Sachin and I are
working on this issue.

./tests/basic/quota-anon-fd-nfs.t


Thanks,
Vijay


On Friday 22 May 2015 11:45 PM, Vijay Bellur wrote:

On 05/21/2015 12:07 AM, Vijay Bellur wrote:

On 05/19/2015 11:56 PM, Vijay Bellur wrote:

On 05/18/2015 08:03 PM, Vijay Bellur wrote:

On 05/16/2015 03:34 PM, Vijay Bellur wrote:



I will send daily status updates from Monday (05/18) about this so that
we are clear about where we are and what needs to be done to remove this
moratorium. Appreciate your help in having a clean set of regression
tests going forward!



We have made some progress since Saturday. The problem with glupy.t has
been fixed - thanks to Niels! All but the following tests have developers
looking into them:

 ./tests/basic/afr/entry-self-heal.t

 ./tests/bugs/replicate/bug-976800.t

 ./tests/bugs/replicate/bug-1015990.t

 ./tests/bugs/quota/bug-1038598.t

 ./tests/basic/ec/quota.t

 ./tests/basic/quota-nfs.t

 ./tests/bugs/glusterd/bug-974007.t

Can submitters of these test cases or current feature owners pick these
up and start looking into the failures please? Do update the spurious
failures etherpad [1] once you pick up a particular test.


[1] https://public.pad.fsfe.org/p/gluster-spurious-failures



Update for today - all tests that are known to fail have owners. Thanks
everyone for chipping in! I think we should be able to lift this
moratorium and resume normal patch acceptance shortly.



Today's update - Pranith fixed a bunch of failures in erasure coding and
Avra removed a test that was not relevant anymore - thanks for that!

Quota, afr, snapshot & tiering tests are being looked into. Will provide
an update on where we are with these tomorrow.



A few tests have not been readily reproducible. Of the remaining 
tests, all but the following have either been root caused or we have 
patches in review:


./tests/basic/mount-nfs-auth.t
./tests/performance/open-behind.t
./tests/basic/ec/ec-5-2.t
./tests/basic/quota-nfs.t

With some reviews and investigations of failing tests happening over 
the weekend, I am optimistic about being able to accept patches as 
usual from early next week.


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-21 Thread Vijaikumar M



On Tuesday 19 May 2015 09:50 PM, Shyam wrote:

On 05/19/2015 11:23 AM, Vijaikumar M wrote:



On Tuesday 19 May 2015 08:36 PM, Shyam wrote:

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following
criteria:

* use dd always with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application. Turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).
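
For the second option, the knobs involved would be along these lines (the
volume options appear later in this thread; the client mount line is an
illustrative assumption):

    # Disable write-behind/flush-behind so writes are serialized and errors
    # reach the application instead of being absorbed by caching.
    gluster volume set $V0 performance.write-behind off
    gluster volume set $V0 performance.nfs.write-behind off
    gluster volume set $V0 performance.flush-behind off
    gluster volume set $V0 performance.nfs.flush-behind off

    # On the NFS client, 'noac' forces a COMMIT after each request,
    # reducing client-side write parallelism.
    mount -t nfs -o vers=3,noac $H0:/$V0 $N0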


Ummm... I thought we acknowledged that quota checks are done during the
WIND and updated during UNWIND, and we have io threads doing in flight
IOs (as well as possible IOs in io threads queue) and we have 256K
writes in the case mentioned. Put together, in my head this forms a
good RCA that we write more than needed due to the in flight IOs on
the brick. We need to control the in flight IOs as a resolution for
this from the application.

In terms of actual proof, we would need to instrument the code and
check. When you say it does not fail for you, does the file stop once
quota is reached or is a random size greater than quota? Which itself
may explain or point to the RCA.

The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use
oflag=sync we can achieve the same, if we use higher block sizes we
cannot

Test results:
1) noac:
  - NFS sends a COMMIT (internally translates to a flush) post each IO
request (NFS WRITES are still with the UNSTABLE flag)
  - Ensures prior IO is complete before next IO request is sent (due
to waiting on the COMMIT)
  - Fails if IO size is large, i.e in the test case being discussed I
changed the dd line that was failing as TEST ! dd if=/dev/zero
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync and this
fails at times, as the writes here are sent as 256k chunks to the
server and we still see the same behavior
  - noac + performance.nfs.flush-behind: off +
performance.flush-behind: off + performance.nfs.strict-write-ordering:
on + performance.strict-write-ordering: on +
performance.nfs.write-behind: off + performance.write-behind: off
- Still see similar failures, i.e at times 10MB file is created
successfully in the modified dd command above

Overall, the switch works, but not always. If we are to use this
variant then we need to announce that all quota tests using dd not try
to go beyond the quota limit set in a single IO from dd.

2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all (and possibly the kitchen sink) to the test case, as
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using
3M per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs
from NFS client
  - The test would work if we reduce the block size per IO (reliably
is a border condition here, and we need specific rules like block size
and how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check
a separate dd instance as being able to *not* exceed the quota. Which
is why I put up that patch.

What next?


Hi Shyam,

I tried running the test with dd option 'oflag=append' and didn't see
the issue. Can you please try this option and see if it works?


Did that (in the attached script that I sent) and it still failed.

Please note:
- This dd command passes (or fails with EDQUOT)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240 
oflag=append oflag=sync conv=fdatasync
  - We can even drop append and fdatasync, as sync sends a commit per 
block written which is better for the test and quota enforcement, 
whereas fdatasync does one in the end and sometimes fails (with larger 
block sizes, say 1M)

  - We can change bs to [512 - 256k]

- This dd command fails (or writes all the data)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append 
oflag=sync conv=fdatasync


The reasoning is that when we write a larger block size, NFS sends in 
multiple 256k chunks to write and then sends the commit before the 
next block. As a result if we exceed quota in the *last block* that we 
are writing, we *may* fail. If we exceed quota in the last but one 
block we will pass.


Hope this shorter version explains it better.

(VijayM is educating me on quota (over IM), and it looks like the 
quota update happens as a synctask in the background, so post the 
flush (NFS commit) we

Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-21 Thread Vijaikumar M



On Thursday 21 May 2015 06:48 PM, Shyam wrote:

On 05/21/2015 04:04 AM, Vijaikumar M wrote:

On Tuesday 19 May 2015 09:50 PM, Shyam wrote:

On 05/19/2015 11:23 AM, Vijaikumar M wrote:

Did that (in the attached script that I sent) and it still failed.

Please note:
- This dd command passes (or fails with EDQUOT)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240
oflag=append oflag=sync conv=fdatasync
  - We can even drop append and fdatasync, as sync sends a commit per
block written which is better for the test and quota enforcement,
whereas fdatasync does one in the end and sometimes fails (with larger
block sizes, say 1M)
  - We can change bs to [512 - 256k]

- This dd command fails (or writes all the data)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append
oflag=sync conv=fdatasync

The reasoning is that when we write a larger block size, NFS sends in
multiple 256k chunks to write and then sends the commit before the
next block. As a result if we exceed quota in the *last block* that we
are writing, we *may* fail. If we exceed quota in the last but one
block we will pass.

Hope this shorter version explains it better.

(VijayM is educating me on quota (over IM), and it looks like the
quota update happens as a synctask in the background, so post the
flush (NFS commit) we may still have a race)

Post-education solution:
- Quota updates the on-disk xattr as a synctask; as a result, if we
exceeded quota in the (n-1)th block there is no guarantee that the nth
block would fail, as the synctask may not have completed

So I think we need to do the following for the quota based tests
(expanding on the provided patch, 
http://review.gluster.org/#/c/10811/ )

- First dd that exceeds quota (with either oflag=sync or
conv=fdatasync so that we do not see any flush behind or write behind
effects) to be done without checks
- Next check in an EXPECT_WITHIN that quota is exceeded (maybe add
checks on the just created/appended file w.r.t its minimum size that
would make it exceed the quota)
- Then do a further dd to a new file or append to an existing file to
get the EDQUOT error
- Proceed with whatever the test case needs to do next

Suggestions?
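
A minimal sketch of that structure in .t form (the hard_limit_used helper
and the 30-second timeout are assumptions, not part of the patch):

    # 1. Exceed the 20MB quota without asserting on dd's exit status;
    #    in-flight IOs may let this write go past the limit.
    dd if=/dev/zero of=$N0/$mydir/newfile_1 bs=256k count=100 oflag=sync

    # 2. Wait until marker's background accounting reports the hard limit
    #    as crossed (hard_limit_used is a hypothetical helper).
    EXPECT_WITHIN 30 "20.0MB" hard_limit_used "/$mydir"

    # 3. Only now must a fresh write fail with EDQUOT.
    TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=1 oflag=sync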



Here is my analysis of the spurious failure with testcase
tests/bugs/distribute/bug-1161156.t:
In release-3.7, marker was refactored to use a synctask to do background
accounting.
I ran the tests below with different combinations and found that
parallel writes are causing the spurious failure.
I have filed bug# 1223658 to track the parallel-write issue with quota.


Agreed with the observations, tallies with mine. Just one addition, 
when we write 256k or less, the writes become serial as NFS writes in 
256k chunks, and due to oflag=sync it follows up with a flush, correct?



Yes

Test (2) is interesting, even with marker foreground updates (which is 
still in the UNWIND path), we observe failures. Do we know why? My 
analysis/understanding of the same is that we have more in flight IOs 
that passed quota enforcement (due to accounting on the UNWIND path), 
does this bear any merit post your tests?


Yes, my understanding is the same: it could be because of more in-flight
IOs, and there is not much impact from the marker doing background updates.






1) Parallel writes and Marker background update (Test always fails)
 TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2
conv=fdatasync oflag=sync oflag=append

 NFS client breaks 3M writes into multiple 256k chunks and does
parallel writes

2) Parallel writes and Marker foreground update (Test always fails)
 TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2
conv=fdatasync oflag=sync oflag=append

 Made a marker code change to account quota in foreground (without
synctask)

3) Serial writes and Marker background update (Test passed 100/100 
times)

 TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=24
conv=fdatasync oflag=sync oflag=append

 Using smaller block size (256k), so that NFS client reduces
parallel writes

4) Serial writes and Marker foreground update (Test passed 100/100 
times)

 TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=256k count=24
conv=fdatasync oflag=sync oflag=append

 Using smaller block size (256k), so that NFS client reduces
parallel writes
 Made a marker code change to account quota in foreground (without
synctask)

5) Parallel writes on release-3.6 (Test always fails)
 TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2
conv=fdatasync oflag=sync oflag=append
 Moved marker xlator above IO-Threads in the graph.
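
The 256k chunking assumed in these combinations can be confirmed on the NFS
client (a generic check, not part of the runs above):

    # wsize is the largest WRITE RPC the client sends; writes bigger than
    # this are split into parallel wsize-sized chunks.
    nfsstat -m | grep -o 'wsize=[0-9]*'
    grep ' nfs ' /proc/mounts    # the wsize= option is visible here too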

Thanks,
Vijay


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-20 Thread Vijaikumar M



On Tuesday 19 May 2015 09:50 PM, Shyam wrote:

On 05/19/2015 11:23 AM, Vijaikumar M wrote:



On Tuesday 19 May 2015 08:36 PM, Shyam wrote:

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following
criteria:

* use dd always with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application. Turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).


Ummm... I thought we acknowledged that quota checks are done during the
WIND and updated during UNWIND, and we have io threads doing in flight
IOs (as well as possible IOs in io threads queue) and we have 256K
writes in the case mentioned. Put together, in my head this forms a
good RCA that we write more than needed due to the in flight IOs on
the brick. We need to control the in flight IOs as a resolution for
this from the application.

In terms of actual proof, we would need to instrument the code and
check. When you say it does not fail for you, does the file stop once
quota is reached or is a random size greater than quota? Which itself
may explain or point to the RCA.

The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use
oflag=sync we can achieve the same, if we use higher block sizes we
cannot

Test results:
1) noac:
  - NFS sends a COMMIT (internally translates to a flush) post each IO
request (NFS WRITES are still with the UNSTABLE flag)
  - Ensures prior IO is complete before next IO request is sent (due
to waiting on the COMMIT)
  - Fails if IO size is large, i.e in the test case being discussed I
changed the dd line that was failing as TEST ! dd if=/dev/zero
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync and this
fails at times, as the writes here are sent as 256k chunks to the
server and we still see the same behavior
  - noac + performance.nfs.flush-behind: off +
performance.flush-behind: off + performance.nfs.strict-write-ordering:
on + performance.strict-write-ordering: on +
performance.nfs.write-behind: off + performance.write-behind: off
- Still see similar failures, i.e at times 10MB file is created
successfully in the modified dd command above

Overall, the switch works, but not always. If we are to use this
variant then we need to announce that all quota tests using dd not try
to go beyond the quota limit set in a single IO from dd.

2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all (and possibly the kitchen sink) to the test case, as
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using
3M per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs
from NFS client
  - The test would work if we reduce the block size per IO (reliably
is a border condition here, and we need specific rules like block size
and how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check
a separate dd instance as being able to *not* exceed the quota. Which
is why I put up that patch.

What next?


Hi Shyam,

I tried running the test with dd option 'oflag=append' and didn't see
the issue. Can you please try this option and see if it works?


Did that (in the attached script that I sent) and it still failed.

Please note:
- This dd command passes (or fails with EDQUOT)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240 
oflag=append oflag=sync conv=fdatasync
  - We can even drop append and fdatasync, as sync sends a commit per 
block written which is better for the test and quota enforcement, 
whereas fdatasync does one in the end and sometimes fails (with larger 
block sizes, say 1M)

  - We can change bs to [512 - 256k]

Here you are trying to write 5M of data, which always gets written, so the
test will fail.




- This dd command fails (or writes all the data)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append 
oflag=sync conv=fdatasync


Here you are trying to write 6M of data (exceeding the quota limit by only
1M), so the test can fail. With count=3, the test passes.




The reasoning is that when we write a larger block size, NFS sends in 
multiple 256k chunks to write and then sends the commit before the 
next block. As a result if we exceed quota in the *last block* that we 
are writing, we *may* fail. If we exceed quota in the last but one 
block we

Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijaikumar M



On Tuesday 19 May 2015 06:13 AM, Shyam wrote:

On 05/18/2015 07:05 PM, Shyam wrote:

On 05/18/2015 03:49 PM, Shyam wrote:

On 05/18/2015 10:33 AM, Vijay Bellur wrote:

The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
which did not have an owner, so I took a stab at it; below are
the results.

I also think failure in ./tests/bugs/quota/bug-1038598.t is the same as
the observation below.

NOTE: Anyone with better knowledge of Quota can possibly chip in as to
what should we expect in this case and how to correct the expectation
from these test cases.

(Details of ./tests/bugs/distribute/bug-1161156.t)
1) Failure is in TEST #20
Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k
count=10240 conv=fdatasync

2) The above line is expected to fail (i.e. dd is expected to fail), as
the set quota is 20MB and we are attempting to exceed it by another 5MB
at this point in the test case.

3) The failure is easily reproducible on my laptop, 2/10 times

4) On debugging, I see that when the above dd succeeds (or the test
fails, which means dd succeeded in writing more than the set quota),
there are no write errors from the bricks or any errors on the final
COMMIT RPC call to NFS.

As a result the expectation of this test fails.

NOTE: Sometimes there is a write failure from one of the bricks (the
above test uses AFR as well), but AFR self healing kicks in and fixes
the problem, as expected, as the write succeeded on one of the 
replicas.

I add this observation, as the failed regression run logs, has some
EDQUOT errors reported in the client xlator, but only from one of the
client bricks, and there are further AFR self heal logs noted in the
logs.

5) When the test case succeeds the writes fail with EDQUOT as expected.
There are times when the quota is exceeded by say 1MB - 4.8MB, but the
test case still passes. Which means that, if we were to try to exceed
the quota by 1MB (instead of the 5MB as in the test case), this test
case may fail always.


Here is why I think this passes the quota sometimes and not others, making
this and the other test case mentioned below spurious.
- Each write is 256K from the client (that is what is sent over the 
wire)

- If more IO was queued by io-threads after passing quota checks, which
in this 5MB case requires 20 IOs to be queued (16 IOs could be active
in io-threads itself), we could end up writing more than the quota 
amount


So, if quota checks whether a write is violating the quota, lets it
through, and updates the space used on the UNWIND for future checks,
we could have more IO outstanding than what the quota allows, and as a
result allow such a larger write to pass through, considering IO threads
queue and active IOs as well. Would this be a fair assumption of how
quota works?

I believe this is what is happening in this case. I am checking a fix on my
machine, and will post it if it proves to help the situation.


Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/

There are arguably other ways to fix/overcome this; this approach seemed
apt for this test case though.






6) Note on dd with conv=fdatasync
As one of the fixes attempts to overcome this issue with the addition of
conv=fdatasync, I wanted to cover that behavior here.

What the above parameter does is to send an NFS_COMMIT (which 
internally

becomes a flush FOP) at the end of writing the blocks to the NFS share.
This commit as a result triggers any pending writes for this file and
sends the flush to the brick, all of which succeeds at times, resulting
in the failure of the test case.

NOTE: In the TC ./tests/bugs/quota/bug-1038598.t the failed line is
pretty much in the same context (LINE 26: TEST ! dd if=/dev/zero
of=$M0/test_dir/file1.txt bs=1024k count=15 (expecting hard limit to be
exceeded and there are no write failures in the logs (which should be
expected with EDQUOT (122))).


Currently we are not accounting for in-progress writes (it is a bit
complicated to account for in-progress writes).
When a write succeeds, the accounting for it is done by marker
asynchronously. We can get other writes before the marker completes
accounting the previously written size.
So there is a small window where we can exceed the quota limit. In the
testcase we are attempting to write 5MB more; we may need to change this
to write a few more MBs.


Thanks,
Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijaikumar M



On Tuesday 19 May 2015 08:36 PM, Shyam wrote:

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following
criteria:


* use dd always with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application. Turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).


Ummm... I thought we acknowledged that quota checks are done during the
WIND and updated during UNWIND, and we have io threads doing in flight 
IOs (as well as possible IOs in io threads queue) and we have 256K 
writes in the case mentioned. Put together, in my head this forms a 
good RCA that we write more than needed due to the in flight IOs on 
the brick. We need to control the in flight IOs as a resolution for 
this from the application.


In terms of actual proof, we would need to instrument the code and 
check. When you say it does not fail for you, does the file stop once 
quota is reached or is a random size greater than quota? Which itself 
may explain or point to the RCA.


The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the 
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use 
oflag=sync we can achieve the same, if we use higher block sizes we 
cannot


Test results:
1) noac:
  - NFS sends a COMMIT (internally translated to a flush) after each IO
request (NFS WRITEs are still sent with the UNSTABLE flag)
  - Ensures the prior IO is complete before the next IO request is sent
(due to waiting on the COMMIT)
  - Fails if the IO size is large, i.e. in the test case being discussed
I changed the failing dd line to TEST ! dd if=/dev/zero
of=$N0/$mydir/newfile_2 bs=10M count=1 conv=fdatasync, and this
fails at times, as the writes are still sent as 256k chunks to the
server and we see the same behavior
  - noac + performance.nfs.flush-behind: off +
performance.flush-behind: off + performance.nfs.strict-write-ordering:
on + performance.strict-write-ordering: on +
performance.nfs.write-behind: off + performance.write-behind: off
    - Still see similar failures, i.e. at times the 10MB file is created
successfully with the modified dd command above


Overall, the switch works, but not always. If we are to use this
variant then we need to announce that quota tests using dd must not try
to go beyond the quota limit in a single IO from dd.


2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all of the above (and possibly the kitchen sink) to the test
case, as attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using 3M
per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs 
from NFS client
  - The test would work if we reduce the block size per IO (reliability
is a border condition here, and we need specific rules like the block
size and how many blocks before we state the quota is exceeded, etc.)
  - The test would work if we just go beyond the quota, and then check
that a separate dd instance is *not* able to exceed the quota. Which is
why I put up that patch.


What next?


Hi Shyam,

I tried running the test with the dd option 'oflag=append' and didn't
see the issue. Can you please try this option and see if it works?


Thanks,
Vijay



regards,
Raghavendra.

On Tue, May 19, 2015 at 5:27 PM, Raghavendra G raghaven...@gluster.com
mailto:raghaven...@gluster.com wrote:



On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com
mailto:jda...@redhat.com wrote:

 No, my suggestion was aimed at not having parallel writes. In this
 case quota won't even fail the writes with EDQUOT because of reasons
 explained above. Yes, we need to disable flush-behind along with this
 so that errors are delivered to the application.

Would conv=sync help here?  That should prevent any kind of
write parallelism.


An strace of dd shows that

* fdatasync is issued only once, at the end of all writes, when
conv=fdatasync
* for some strange reason no fsync or fdatasync is issued at all
when conv=sync

So, using conv=fdatasync in the test cannot prevent
write-parallelism induced by write-behind. Parallelism would've been
prevented only if dd had issued fdatasync after each write or opened
the file with O_SYNC.
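
A quick way to reproduce this observation (a sketch; any scratch file
works as the target):

    strace -f -e trace=write,fsync,fdatasync \
        dd if=/dev/zero of=/tmp/ddtest bs=1M count=4 conv=fdatasync
    # conv=fdatasync: one fdatasync() appears after the last write;
    # conv=sync: no fsync()/fdatasync() call shows up at all.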

If it doesn't, I'd say that's a true test failure somewhere in
our stack.  A
similar possibility would be to 

Re: [Gluster-devel] NetBSD regression in quota-nfs.t

2015-03-18 Thread Vijaikumar M

Hi Emmanuel,

I have submitted another patch, http://review.gluster.org/#/c/9478/, to
address the spurious failure with quota-nfs.t


Thanks,
Vijay


On Wednesday 18 March 2015 07:40 PM, Emmanuel Dreyfus wrote:

On Wed, Mar 18, 2015 at 10:28:37AM +, Emmanuel Dreyfus wrote:

Indeed, the test passes with this patch:

And when submitting it I noticed that the change had already been done.



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] uss.t in master doing bad things to our regression test VM's

2015-02-19 Thread Vijaikumar M

Hi Justin,

I have submitted the patch http://review.gluster.org/#/c/9703/, which
uses a different approach to generate a random string.


Thanks,
Vijay



On Thursday 19 February 2015 05:21 PM, Vijaikumar M wrote:


On Wednesday 18 February 2015 10:42 PM, Justin Clift wrote:

Hi Vijaikumar,

As part of investigating what is going wrong with our VM's in Rackspace,
I created several new VM's (11 of them) and started a full regression
test run on them.

They're all hitting a major problem with uss.t.  Part of it does a cat
on /dev/urandom... which is taking several hours at 100% of a cpu. :(

Here is output from ps -ef f on one of them:

root  12094  1287  0 13:23 ? S   0:00  \_ /bin/bash 
/opt/qa/regression.sh

root  12101 12094  0 13:23 ? S   0:00  \_ /bin/bash ./run-tests.sh
root  12116 12101  0 13:23 ? S   0:01  \_ /usr/bin/perl 
/usr/bin/prove -rf --timer ./tests
root382 12116  0 14:13 ? S   0:00  \_ /bin/bash 
./tests/basic/uss.t
root   1713   382  0 14:14 ? S   0:00  \_ /bin/bash 
./tests/basic/uss.t
root   1714  1713 96 14:14 ? R 166:31  \_ cat 
/dev/urandom
root   1715  1713  2 14:14 ? S   5:04  \_ tr -dc 
a-zA-Z

root   1716  1713  9 14:14 ? S  16:31  \_ fold -w 8

And from top:

top - 17:09:19 up  3:50,  1 user,  load average: 1.04, 1.03, 1.00
Tasks: 240 total,   3 running, 237 sleeping,   0 stopped,   0 zombie
Cpu0  :  4.3%us, 95.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 
0.0%si,  0.0%st
Cpu1  :  8.1%us, 15.9%sy,  0.0%ni, 76.0%id,  0.0%wa,  0.0%hi, 
0.0%si,  0.0%st

Mem:   1916672k total,  1119544k used,   797128k free,   114976k buffers
Swap:0k total,0k used,0k free,   427032k cached

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+ COMMAND
  1714 root  20   0 98.6m  620  504 R 96.0  0.0 169:00.94 cat
   137 root  20   0 36100 1396 1140 S 15.9  0.1  37:01.55 plymouthd
  1716 root  20   0 98.6m  712  616 S 10.0  0.0  16:46.55 fold
  1715 root  20   0 98.6m  636  540 S  2.7  0.0   5:08.95 tr
 9 root  20   0 000 S  0.3  0.0   0:00.59 
ksoftirqd/1

 1 root  20   0 19232 1128  860 S  0.0  0.1   0:00.93 init
 2 root  20   0 000 S  0.0  0.0   0:00.00 kthreadd

Your name is on the commit which added the code, but that was months 
ago.


No idea why it's suddenly being a problem.  Do you have any idea?

I am going to shut down all of these new test VM's except one, which 
I can
give you (or anyone) access to, if that would help find and fix the 
problem.

I am not sure why this is suddenly causing a problem.
I can remove the 'cat /dev/urandom' and use a different approach to test
this particular case.
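
One possible replacement (a hedged sketch only; the actual fix went in
via the patch mentioned at the top of this thread) is to read a bounded
amount from /dev/urandom instead of cat-ing it indefinitely:

    # Reading a fixed 4KB chunk makes the pipeline always terminate;
    # on average that yields far more than the 8 letters we keep.
    head -c 4096 /dev/urandom | tr -dc 'a-zA-Z' | head -c 8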


Thanks,
Vijay



Btw, this is pretty important. ;)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Quota with hard-links

2015-01-27 Thread Vijaikumar M

Hi All,

This is regarding quota accounting for hard-links. Currently, accounting
is done only once for links created within the same directory, but
separately when a link is created in a different directory.
With this approach the accounting may go wrong when a rename is performed
on hard-linked files across directories.
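
A sketch of the sequence that can skew the accounting today (paths are
illustrative; $M0 is the usual test mount):

    mkdir $M0/dir1 $M0/dir2
    dd if=/dev/zero of=$M0/dir1/f bs=1M count=1
    ln $M0/dir1/f $M0/dir1/f.link    # same directory: counted once
    ln $M0/dir1/f $M0/dir2/f.link    # other directory: counted again
    mv $M0/dir2/f.link $M0/dir1/     # cross-directory rename: usage skewed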


We are implementing one of the below mentioned policies for hard-links
when quota is enabled.


1) Allow creating hard-links only within the same directory.
        (We can hit the same problem if quota is enabled on
pre-existing data which contains hard-links)


2) Allow creating hard-links only within the same branch where the limit
is set.
        (We can hit the same problem if quota is enabled on
pre-existing data which contains hard-links, and also when a quota limit
is set/unset)


3) Account for all the hard-links.
(Marker treats all hard-links as a new file)

Please provide your feedback.

Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2015-01-27 Thread Vijaikumar M

Hi Kiran,

Testcase './tests/basic/quota-anon-fd-nfs.t' has been removed from the
test suite. There are some issues with this testcase; we are working on
it. We will add this test-case back once the issue is fixed.


Thanks,
Vijay


On Tuesday 27 January 2015 06:11 PM, Vijaikumar M wrote:

Hi Kiran,

quota.t failure issue has been fixed with patch 
http://review.gluster.org/#/c/9410/. Can you please re-try the test 
with this patch and see if it works?


Thanks,
Vijay


On Wednesday 19 November 2014 10:32 AM, Pranith Kumar Karampuri wrote:


On 11/19/2014 10:30 AM, Atin Mukherjee wrote:


On 11/18/2014 10:35 PM, Pranith Kumar Karampuri wrote:

On 11/12/2014 04:52 PM, Kiran Patil wrote:

I have created zpools named d and mnt, and they appear in the filesystem
as follows.

d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar

quota-anon-fd-nfs.t spurious failure fix is addressed by
http://review.gluster.org/#/c/9108/

This is just quota.t in tests/basic, not the anon-fd one

Pranith

~Atin

On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil ki...@fractalio.com
mailto:ki...@fractalio.com wrote:

 Hi,

 Gluster suite report,

 Gluster version: glusterfs 3.6.1

 On disk filesystem: Zfs 0.6.3-1.1

 Operating system: CentOS release 6.6 (Final)

 We are seeing quota and snapshot testcase failures.

 We are not sure why quota is failing since quotas worked fine on
 gluster 3.4.

 Test Summary Report
 ---
 ./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16 
Failed: 1)

   Failed test:  16
 ./tests/basic/quota.t  (Wstat: 0 Tests: 73 Failed: 4)
   Failed tests:  24, 28, 32, 65
 ./tests/basic/uss.t  (Wstat: 0 Tests: 147 Failed: 78)
   Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
 49-57, 60-61, 63-64, 71-72, 78-87, 90-91
 93-94, 101-102, 107-115, 118-119, 121-122
 129-130, 134, 136-137, 139-140, 142-143
 145-146
 ./tests/basic/volume-snapshot.t  (Wstat: 0 Tests: 30 Failed: 12)
   Failed tests:  11-18, 21-24
 ./tests/basic/volume-status.t  (Wstat: 0 Tests: 14 Failed: 1)
   Failed test:  14
 ./tests/bugs/bug-1023974.t   (Wstat: 0 Tests: 15 Failed: 1)
   Failed test:  12
 ./tests/bugs/bug-1038598.t   (Wstat: 0 Tests: 28 Failed: 6)
   Failed tests:  17, 21-22, 26-28
 ./tests/bugs/bug-1045333.t   (Wstat: 0 Tests: 16 Failed: 9)
   Failed tests:  7-15
 ./tests/bugs/bug-1049834.t   (Wstat: 0 Tests: 18 Failed: 7)
   Failed tests:  11-14, 16-18
 ./tests/bugs/bug-1087203.t   (Wstat: 0 Tests: 43 Failed: 2)
   Failed tests:  31, 41
 ./tests/bugs/bug-1090042.t   (Wstat: 0 Tests: 12 Failed: 3)
   Failed tests:  9-11
 ./tests/bugs/bug-1109770.t   (Wstat: 0 Tests: 19 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1109889.t   (Wstat: 0 Tests: 20 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1112559.t   (Wstat: 0 Tests: 11 Failed: 3)
   Failed tests:  8-9, 11
 ./tests/bugs/bug-1112613.t   (Wstat: 0 Tests: 22 Failed: 5)
   Failed tests:  12-14, 17-18
 ./tests/bugs/bug-1113975.t   (Wstat: 0 Tests: 13 Failed: 4)
   Failed tests:  8-9, 11-12
 ./tests/bugs/bug-847622.t  (Wstat: 0 Tests: 10 Failed: 1)
   Failed test:  8
 ./tests/bugs/bug-861542.t  (Wstat: 0 Tests: 13 Failed: 7)
   Failed tests:  7-13
 ./tests/features/ssl-authz.t   (Wstat: 0 Tests: 18 Failed: 1)
   Failed test:  18
 Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr 0.78 sys +
 774.74 cusr 666.97 csys = 1447.05 CPU)
 Result: FAIL

 Thanks,
 Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel







___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2015-01-27 Thread Vijaikumar M

Hi Kiran,

quota.t failure issue has been fixed with patch 
http://review.gluster.org/#/c/9410/. Can you please re-try the test with 
this patch and see if it works?


Thanks,
Vijay


On Wednesday 19 November 2014 10:32 AM, Pranith Kumar Karampuri wrote:


On 11/19/2014 10:30 AM, Atin Mukherjee wrote:


On 11/18/2014 10:35 PM, Pranith Kumar Karampuri wrote:

On 11/12/2014 04:52 PM, Kiran Patil wrote:

I have created zpools named d and mnt, and they appear in the filesystem
as follows.

d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar

quota-anon-fd-nfs.t spurious failure fix is addressed by
http://review.gluster.org/#/c/9108/

This is just quota.t in tests/basic, not the anon-fd one

Pranith

~Atin

On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil ki...@fractalio.com
mailto:ki...@fractalio.com wrote:

 Hi,

 Gluster suite report,

 Gluster version: glusterfs 3.6.1

 On disk filesystem: Zfs 0.6.3-1.1

 Operating system: CentOS release 6.6 (Final)

 We are seeing quota and snapshot testcase failures.

 We are not sure why quota is failing since quotas worked fine on
 gluster 3.4.

 Test Summary Report
 ---
 ./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16 Failed: 1)
   Failed test:  16
 ./tests/basic/quota.t  (Wstat: 0 Tests: 73 Failed: 4)
   Failed tests:  24, 28, 32, 65
 ./tests/basic/uss.t  (Wstat: 0 Tests: 147 Failed: 78)
   Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
 49-57, 60-61, 63-64, 71-72, 78-87, 90-91
 93-94, 101-102, 107-115, 118-119, 121-122
 129-130, 134, 136-137, 139-140, 142-143
 145-146
 ./tests/basic/volume-snapshot.t  (Wstat: 0 Tests: 30 Failed: 12)
   Failed tests:  11-18, 21-24
 ./tests/basic/volume-status.t  (Wstat: 0 Tests: 14 Failed: 1)
   Failed test:  14
 ./tests/bugs/bug-1023974.t   (Wstat: 0 Tests: 15 Failed: 1)
   Failed test:  12
 ./tests/bugs/bug-1038598.t   (Wstat: 0 Tests: 28 Failed: 6)
   Failed tests:  17, 21-22, 26-28
 ./tests/bugs/bug-1045333.t   (Wstat: 0 Tests: 16 Failed: 9)
   Failed tests:  7-15
 ./tests/bugs/bug-1049834.t   (Wstat: 0 Tests: 18 Failed: 7)
   Failed tests:  11-14, 16-18
 ./tests/bugs/bug-1087203.t   (Wstat: 0 Tests: 43 Failed: 2)
   Failed tests:  31, 41
 ./tests/bugs/bug-1090042.t   (Wstat: 0 Tests: 12 Failed: 3)
   Failed tests:  9-11
 ./tests/bugs/bug-1109770.t   (Wstat: 0 Tests: 19 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1109889.t   (Wstat: 0 Tests: 20 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1112559.t   (Wstat: 0 Tests: 11 Failed: 3)
   Failed tests:  8-9, 11
 ./tests/bugs/bug-1112613.t   (Wstat: 0 Tests: 22 Failed: 5)
   Failed tests:  12-14, 17-18
 ./tests/bugs/bug-1113975.t   (Wstat: 0 Tests: 13 Failed: 4)
   Failed tests:  8-9, 11-12
 ./tests/bugs/bug-847622.t  (Wstat: 0 Tests: 10 Failed: 1)
   Failed test:  8
 ./tests/bugs/bug-861542.t  (Wstat: 0 Tests: 13 Failed: 7)
   Failed tests:  7-13
 ./tests/features/ssl-authz.t   (Wstat: 0 Tests: 18 Failed: 1)
   Failed test:  18
 Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr 0.78 sys +
 774.74 cusr 666.97 csys = 1447.05 CPU)
 Result: FAIL

 Thanks,
 Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Quota command bug in 3.6.1?

2015-01-14 Thread Vijaikumar M

Hi Raghuram,

Thanks for reporting the problem.

We will submit the fix upstream soon.

Thanks,
Vijay


On Wednesday 14 January 2015 01:50 PM, Raghuram BK wrote:
When I issue the quota list command with the xml option, it seems to
return non-XML data:


[root@fractalio-66f2 fractalio]# gluster --version
glusterfs 3.6.1 built on Jan 13 2015 16:46:51
Repository revision: git://git.gluster.com/glusterfs.git 
http://git.gluster.com/glusterfs.git

Copyright (c) 2006-2011 Gluster Inc. http://www.gluster.com
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU 
General Public License.


[root@primary templates]# gluster volume quota vol1  list --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volQuota/>
</cliOutput>
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                          10.0GB       80%     0Bytes     10.0GB          No                   No
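
A quick way to demonstrate the bug (a sketch; xmllint is assumed to be
installed):

    # The plain-text table appended after </cliOutput> makes the output
    # invalid XML, so xmllint complains about extra content at the end.
    gluster volume quota vol1 list --xml | xmllint --noout -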


--

*Fractalio Data, India*

Mobile: +91 96635 92022

Email: r...@fractalio.com mailto:g...@fractalio.com

Web: www.fractalio.com http://www.fractalio.com/




___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failure in quota test cases

2015-01-02 Thread Vijaikumar M
I see the below error in the log file. I think somehow the old mount was
not cleaned up properly.



File: cli.log
[2014-12-30 11:23:19.553912] W 
[cli-cmd-volume.c:886:gf_cli_create_auxiliary_mount] 0-cli: failed to 
mount glusterfs client. Please check the log file 
/var/log/glusterfs/quota-mount-patchy.log for more details



File: quota-mount-patchy.log
[2014-12-30 09:54:38.093890] I [MSGID: 100030] [glusterfsd.c:2027:main] 
0-/build/install/sbin/glusterfs: Started running 
/build/install/sbin/glusterfs version 3.7dev (args: 
/build/install/sbin/glusterfs -s localhost --volfile-id patchy -l 
/var/log/glusterfs/quota-mount-patchy.log -p /var/run/gluster/patchy.pid 
--client-pid -5 /var/run/gluster/patchy/)
[2014-12-30 09:54:38.094546] E [fuse-bridge.c:5338:init] 0-fuse: 
Mountpoint /var/run/gluster/patchy/ seems to have a stale mount, run 
'umount /var/run/gluster/patchy/' and try again.




Can someone who has access to the build machine please clear the stale
mount on '/var/run/gluster/patchy/'?
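
For whoever gets to the machine first, a small cleanup sketch (path as in
the log above):

    # Unmount only if the stale aux mount is actually present; -l (lazy)
    # detaches it even if a process is still holding it.
    if grep -qs '/var/run/gluster/patchy' /proc/mounts; then
        umount -l /var/run/gluster/patchy/
    fi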



Thanks,
Vijay


On Friday 02 January 2015 04:38 PM, Atin Mukherjee wrote:

Hi Vijai,

It seems like lots of regression test cases are failing due to auxiliary
mount failures in the CLI, and that's because of left-over auxiliary
mount points.

[2014-12-30 10:21:15.875965] E [fuse-bridge.c:5338:init] 0-fuse:
Mountpoint /var/run/gluster/patchy/ seems to have a stale mount, run
'umount /var/run/gluster/patchy/' and try again.

One such instance can be found at [1]

Can you please look into it?

~Atin

[1]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/3406/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] handling statfs call in USS

2014-12-24 Thread Vijaikumar M


On Wednesday 24 December 2014 02:30 PM, Raghavendra Bhat wrote:


Hi,

I have a doubt. In user-serviceable snapshots, the statfs call is not
implemented as of now. There are 2 ways statfs can be handled.


1) Whenever the snapview-client xlator gets a statfs call on a path that
belongs to the snapshot world, it can send the statfs call to the main
volume itself, with the path and the inode set to the root of the main
volume.


In this approach, sending the statfs call to the main volume with the
path and inode set to the root can give an incorrect value when quota
and deem-statfs are enabled.

The path/inode should be set to the parent of '.snaps' instead.
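
A quick sanity check from a mount (a sketch; $V0/$M0 as in the test
framework, with a quota limit set on /dir):

    TEST $CLI volume set $V0 features.quota-deem-statfs on
    # With the fix, both reports should reflect the quota-limited
    # size of /dir rather than that of the whole volume.
    df -h $M0/dir
    df -h $M0/dir/.snaps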

Thanks,
Vijay


OR

2) It can redirect the call to the snapshot world (the snapshot daemon
which talks to all the snapshots of that particular volume) and send
back the reply it obtains.


Please provide feedback.

Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposal for more sub-maintainers

2014-12-04 Thread Vijaikumar M


On Thursday 04 December 2014 08:32 PM, Niels de Vos wrote:

On Fri, Nov 28, 2014 at 01:08:29PM +0530, Vijay Bellur wrote:

Hi All,

To supplement our ongoing effort of better patch management, I am proposing
the addition of more sub-maintainers for various components. The rationale
behind this proposal  the responsibilities of maintainers continue to be
the same as discussed in these lists a while ago [1]. Here is the proposed
list:

Build - Kaleb Keithley  Niels de Vos

DHT   - Raghavendra Gowdappa  Shyam Ranganathan

docs  - Humble Chirammal  Lalatendu Mohanty

gfapi - Niels de Vos  Shyam Ranganathan

index  io-threads - Pranith Karampuri

posix - Pranith Karampuri  Raghavendra Bhat

I'm wondering if there are any volunteers for maintaining the FUSE
component?

And maybe rewrite it to use libgfapi and drop the mount.glusterfs
script?

I am interested.

Thanks,
Vijay



Niels


We intend to update Gerrit with this list by 8th of December. Please let us
know if you have objections, concerns or feedback on this process by then.

Thanks,
Vijay

[1] http://gluster.org/pipermail/gluster-devel/2014-April/025425.html

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] snapshot restore and USS

2014-12-01 Thread Vijaikumar M


On Monday 01 December 2014 05:36 PM, Raghavendra Bhat wrote:

On Monday 01 December 2014 04:51 PM, Raghavendra G wrote:



On Fri, Nov 28, 2014 at 6:48 PM, RAGHAVENDRA TALUR 
raghavendra.ta...@gmail.com mailto:raghavendra.ta...@gmail.com wrote:


On Thu, Nov 27, 2014 at 2:59 PM, Raghavendra Bhat
rab...@redhat.com mailto:rab...@redhat.com wrote:
 Hi,

 With USS to access snapshots, we depend on last snapshot of the
volume (or
 the latest snapshot) to resolve some issues.
 Ex:
 Say there is a directory called dir within the root of the
volume and USS
 is enabled. Now when .snaps is accessed from dir (i.e.
/dir/.snaps), first
 a lookup is sent on /dir which snapview-client xlator passes
onto the normal
 graph till posix xlator of the brick. Next the lookup comes on
/dir/.snaps.
 snapview-client xlator now redirects this call to the snap
daemon (since
 .snaps is a virtual directory to access the snapshots). The
lookup comes to
 snap daemon with parent gfid set to the gfid of /dir and the
basename
 being set to .snaps. Snap daemon will first try to resolve
the parent gfid
 by trying to find the inode for that gfid. But since that gfid
was not
 looked up before in the snap daemon, it will not be able to
find the inode.
 So now to resolve it, snap daemon depends upon the latest
snapshot. i.e. it
 tries to look up the gfid of /dir in the latest snapshot and if
it can get
 the gfid, then lookup on /dir/.snaps is also successful.

From the user's point of view, I would like to be able to enter
.snaps anywhere.
To do that, we can turn the dependency upside down: instead of
listing all snaps in the .snaps dir, let's just show whatever
snapshots have that dir.


Currently readdir in the snap-view server lists _all_ the snapshots.
However, if you try to do ls on a snapshot which doesn't contain
this directory (say dir/.snaps/snap3), I think it returns
ESTALE/ENOENT. So, to get what you've explained above, readdir(p)
should filter out those snapshots which don't contain this
directory (to do that, it has to look up dir on each of the snapshots).


Raghavendra Bhat explained the problem, and also a possible solution,
to me in person. There are some pieces missing in the problem
description as explained in the mail (but not in the discussion we
had). The problem explained here occurs when you restore a snapshot
(say snap3) in which the directory was created, but the directory was
deleted before the next snapshot. So, the directory exists only in
snap3, not in snap2 or snap4. Now, when you restore snap3, ls on
dir/.snaps should show nothing. So, what should the result of
lookup (gfid-of-dir, .snaps) be?


1. we can blindly return a virtual inode, assuming at least one snapshot
contains dir. If fops come on specific snapshots (e.g.,
dir/.snaps/snap4), they'll fail with ENOENT anyway (since dir is not
present on any snaps).
2. we can choose to return ENOENT if we figure out that dir is not
present on any snaps.


The problem we are trying to solve here is how to achieve 2. One
simple solution is to look up gfid-of-dir on all the snapshots, and
if every lookup fails with ENOENT, we can return ENOENT. The other
solution is to look up just the snapshots before and after (if both
are present, otherwise just the latest snapshot). If both fail, then we
can be sure that no snapshot contains that directory.


Rabhat, Correct me if I've missed out anything :).




If a readdir on .snaps entered from a non-root directory has to show
only those snapshots where the directory (or rather the gfid of the
directory) is present, then achieving that will be a bit costly.


When a readdir comes on .snaps entered from a non-root directory (say ls
/dir/.snaps), the following operations have to be performed:
1) We have the names of all the snapshots in an array, so do a
nameless lookup on the gfid of /dir on all the snapshots
2) Based on which snapshots sent success to the above lookup,
build a new array or list of snapshots
3) Send the above new list as the readdir entries

But the above operation is costlier: just to serve one readdir request
we have to make a lookup on each snapshot (if there are 256 snapshots,
then we have to make 256 lookup calls over the network).


One more thing is resource usage. As of now, a snapshot is inited
(i.e. via gfapi a connection is established with the corresponding
snapshot volume, which is equivalent to a mounted volume) only when that
snapshot is accessed (from a fops point of view, when a lookup comes on
the snapshot entry, say ls /dir/.snaps/snap1). Now, to serve the
readdir, all the snapshots would be accessed and hence all of them
initialized. This means there can be 256 instances of gfapi connections,
each instance having its own inode table and other resources. After 

Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2014-11-19 Thread Vijaikumar M

Hi Kiran,

Can we also get the xattrs of all the directories on the bricks?

How to capture the xattrs of all dirs in a brick:

Edit quota.t and find all the lines that match 'EXPECT_WITHIN
$MARKER_UPDATE_TIMEOUT .. usage .'

Add the below lines after every match:

echo "Matching Testcase" >> /var/tmp/quota-xattr.txt
for file in `find $B0 -type d`; do echo $file; getfattr -d -m . -e hex
$file; echo; done >> /var/tmp/quota-xattr.txt

echo >> /var/tmp/quota-xattr.txt

Thanks,
Vijay


On Wednesday 19 November 2014 01:17 PM, Vijaikumar M wrote:

Hi Kiran,

Can we get the brick, client and quotad logs?

Thanks,
Vijay


On Tuesday 18 November 2014 10:35 PM, Pranith Kumar Karampuri wrote:


On 11/12/2014 04:52 PM, Kiran Patil wrote:
I have created zpools named d and mnt, and they appear in the
filesystem as follows.


d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar


On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil ki...@fractalio.com 
mailto:ki...@fractalio.com wrote:


Hi,

Gluster suite report,

Gluster version: glusterfs 3.6.1

On disk filesystem: Zfs 0.6.3-1.1

Operating system: CentOS release 6.6 (Final)

We are seeing quota and snapshot testcase failures.

We are not sure why quota is failing since quotas worked fine on
gluster 3.4.

Test Summary Report
---
./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16
Failed: 1)
  Failed test:  16
./tests/basic/quota.t  (Wstat: 0 Tests: 73 Failed: 4)
  Failed tests:  24, 28, 32, 65
./tests/basic/uss.t  (Wstat: 0 Tests: 147 Failed: 78)
  Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
49-57, 60-61, 63-64, 71-72, 78-87, 90-91
93-94, 101-102, 107-115, 118-119, 121-122
129-130, 134, 136-137, 139-140, 142-143
145-146
./tests/basic/volume-snapshot.t  (Wstat: 0 Tests: 30 Failed: 12)
  Failed tests:  11-18, 21-24
./tests/basic/volume-status.t  (Wstat: 0 Tests: 14 Failed: 1)
  Failed test:  14
./tests/bugs/bug-1023974.t   (Wstat: 0 Tests: 15 Failed: 1)
  Failed test:  12
./tests/bugs/bug-1038598.t   (Wstat: 0 Tests: 28 Failed: 6)
  Failed tests:  17, 21-22, 26-28
./tests/bugs/bug-1045333.t   (Wstat: 0 Tests: 16 Failed: 9)
  Failed tests:  7-15
./tests/bugs/bug-1049834.t   (Wstat: 0 Tests: 18 Failed: 7)
  Failed tests:  11-14, 16-18
./tests/bugs/bug-1087203.t   (Wstat: 0 Tests: 43 Failed: 2)
  Failed tests:  31, 41
./tests/bugs/bug-1090042.t   (Wstat: 0 Tests: 12 Failed: 3)
  Failed tests:  9-11
./tests/bugs/bug-1109770.t   (Wstat: 0 Tests: 19 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1109889.t   (Wstat: 0 Tests: 20 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1112559.t   (Wstat: 0 Tests: 11 Failed: 3)
  Failed tests:  8-9, 11
./tests/bugs/bug-1112613.t   (Wstat: 0 Tests: 22 Failed: 5)
  Failed tests:  12-14, 17-18
./tests/bugs/bug-1113975.t   (Wstat: 0 Tests: 13 Failed: 4)
  Failed tests:  8-9, 11-12
./tests/bugs/bug-847622.t  (Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  8
./tests/bugs/bug-861542.t  (Wstat: 0 Tests: 13 Failed: 7)
  Failed tests:  7-13
./tests/features/ssl-authz.t   (Wstat: 0 Tests: 18 Failed: 1)
  Failed test:  18
Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr  0.78 sys
+ 774.74 cusr 666.97 csys = 1447.05 CPU)
Result: FAIL

Thanks,
Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel






___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2014-11-18 Thread Vijaikumar M

Hi Kiran,

Can we get the brick, client and quotad logs?

Thanks,
Vijay


On Tuesday 18 November 2014 10:35 PM, Pranith Kumar Karampuri wrote:


On 11/12/2014 04:52 PM, Kiran Patil wrote:
I have created zpools named d and mnt, and they appear in the filesystem
as follows.


d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar


On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil ki...@fractalio.com 
mailto:ki...@fractalio.com wrote:


Hi,

Gluster suite report,

Gluster version: glusterfs 3.6.1

On disk filesystem: Zfs 0.6.3-1.1

Operating system: CentOS release 6.6 (Final)

We are seeing quota and snapshot testcase failures.

We are not sure why quota is failing since quotas worked fine on
gluster 3.4.

Test Summary Report
---
./tests/basic/quota-anon-fd-nfs.t(Wstat: 0 Tests: 16 Failed: 1)
  Failed test:  16
./tests/basic/quota.t(Wstat: 0 Tests: 73 Failed: 4)
  Failed tests:  24, 28, 32, 65
./tests/basic/uss.t(Wstat: 0 Tests: 147 Failed: 78)
  Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
49-57, 60-61, 63-64, 71-72, 78-87, 90-91
93-94, 101-102, 107-115, 118-119, 121-122
129-130, 134, 136-137, 139-140, 142-143
145-146
./tests/basic/volume-snapshot.t(Wstat: 0 Tests: 30 Failed: 12)
  Failed tests:  11-18, 21-24
./tests/basic/volume-status.t(Wstat: 0 Tests: 14 Failed: 1)
  Failed test:  14
./tests/bugs/bug-1023974.t (Wstat: 0 Tests: 15 Failed: 1)
  Failed test:  12
./tests/bugs/bug-1038598.t (Wstat: 0 Tests: 28 Failed: 6)
  Failed tests:  17, 21-22, 26-28
./tests/bugs/bug-1045333.t (Wstat: 0 Tests: 16 Failed: 9)
  Failed tests:  7-15
./tests/bugs/bug-1049834.t (Wstat: 0 Tests: 18 Failed: 7)
  Failed tests:  11-14, 16-18
./tests/bugs/bug-1087203.t (Wstat: 0 Tests: 43 Failed: 2)
  Failed tests:  31, 41
./tests/bugs/bug-1090042.t (Wstat: 0 Tests: 12 Failed: 3)
  Failed tests:  9-11
./tests/bugs/bug-1109770.t (Wstat: 0 Tests: 19 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1109889.t (Wstat: 0 Tests: 20 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1112559.t (Wstat: 0 Tests: 11 Failed: 3)
  Failed tests:  8-9, 11
./tests/bugs/bug-1112613.t (Wstat: 0 Tests: 22 Failed: 5)
  Failed tests:  12-14, 17-18
./tests/bugs/bug-1113975.t (Wstat: 0 Tests: 13 Failed: 4)
  Failed tests:  8-9, 11-12
./tests/bugs/bug-847622.t(Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  8
./tests/bugs/bug-861542.t(Wstat: 0 Tests: 13 Failed: 7)
  Failed tests:  7-13
./tests/features/ssl-authz.t (Wstat: 0 Tests: 18 Failed: 1)
  Failed test:  18
Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr  0.78 sys +
774.74 cusr 666.97 csys = 1447.05 CPU)
Result: FAIL

Thanks,
Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

2014-06-24 Thread Vijaikumar M

Hi Jeff,

Missed to add this:
SSL_pending was 0 before calling SSL_read, and hence SSL_get_error
returned 'SSL_ERROR_WANT_READ'.


Thanks,
Vijay


On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:

Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/
(epoll: edge triggered and multi-threaded epoll).
The testcase './tests/bugs/bug-873367.t' hangs with this fix (please
find the stack trace below).


In the code snippet below we found that 'SSL_pending' was returning 0.
I have added a condition here to return from the function when there
is no data available.
Please suggest whether it is OK to do it this way, or whether we need
to restructure this function for multi-threaded epoll.


code: socket.c
 178 static int
 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
 180 {
 ...
 211         switch (SSL_get_error(priv->ssl_ssl,r)) {
 212         case SSL_ERROR_NONE:
 213                 return r;
 214         case SSL_ERROR_WANT_READ:
 215                 if (SSL_pending(priv->ssl_ssl) == 0)
 216                         return r;
 217                 pfd.fd = priv->sock;
 221                 if (poll(&pfd,1,-1) < 0) {
/code



Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request'
is waiting on a mutex lock.
The lock is held by the function 'ssl_do', which is blocked in the poll
syscall.



(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at 
glusterfsd.c:2023



(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  9 Thread 0x7f3b8ca82700 (LWP 26226) 0x003daa80f4b5 in sigwait 
() from /lib64/libpthread.so.0
  8 Thread 0x7f3b8c081700 (LWP 26227) 0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  7 Thread 0x7f3b8b680700 (LWP 26228) 0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  6 Thread 0x7f3b8a854700 (LWP 26232) 0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7f3b89e53700 (LWP 26233) 0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7f3b833eb700 (LWP 26241) 0x003daa4df343 in poll () 
from /lib64/libc.so.6
  3 Thread 0x7f3b82130700 (LWP 26245) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f3b8172f700 (LWP 26247) 0x003daa80e75d in read () 
from /lib64/libpthread.so.0
* 1 Thread 0x7f3b94a38700 (LWP 26224) 0x003daa80822d in 
pthread_join () from /lib64/libpthread.so.0



(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  0x003daa80e264 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from 
/lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>,
    procnum=<value optimized out>, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410,
    proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, frame=0x7f3b93d2a454,
    rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)

at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>,
    frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0,
    rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
this=0x7f3b7c005ef0, data=0x7f3b8212f660)

at client-rpc-fops.c:3119


(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
1, __kind = 0, __spins = 0, __list = {

  __prev = 0x0, __next = 0x0}},
  __size = \002\000\000\000\000\000\000\000\201f\000\000\001, 
'\000' repeats 26 times, __align = 2}



(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  0x003daa4df343 in poll () from /lib64/libc.so.6

(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)

at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<value optimized
out

Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

2014-06-24 Thread Vijaikumar M

Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll:
edge triggered and multi-threaded epoll).
The testcase './tests/bugs/bug-873367.t' hangs with this fix (please
find the stack trace below).


In the code snippet below we found that 'SSL_pending' was returning 0.
I have added a condition here to return from the function when there is
no data available.
Please suggest whether it is OK to do it this way, or whether we need to
restructure this function for multi-threaded epoll.


code: socket.c
 178 static int
 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
 180 {
 ...
 211         switch (SSL_get_error(priv->ssl_ssl,r)) {
 212         case SSL_ERROR_NONE:
 213                 return r;
 214         case SSL_ERROR_WANT_READ:
 215                 if (SSL_pending(priv->ssl_ssl) == 0)
 216                         return r;
 217                 pfd.fd = priv->sock;
 221                 if (poll(&pfd,1,-1) < 0) {
/code



Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request'
is waiting on a mutex lock.
The lock is held by the function 'ssl_do', which is blocked in the poll
syscall.



(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at 
glusterfsd.c:2023



(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait 
() from /lib64/libpthread.so.0
  8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () 
from /lib64/libc.so.6
  3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () 
from /lib64/libpthread.so.0
* 1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in 
pthread_join () from /lib64/libpthread.so.0



(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  0x003daa80e264 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from 
/lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>,
    procnum=<value optimized out>, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410,
    proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=<value optimized out>, frame=0x7f3b93d2a454,
    rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)

at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>,
    frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0,
    rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
this=0x7f3b7c005ef0, data=0x7f3b8212f660)

at client-rpc-fops.c:3119


(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
1, __kind = 0, __spins = 0, __list = {

  __prev = 0x0, __next = 0x0}},
  __size = \002\000\000\000\000\000\000\000\201f\000\000\001, '\000' 
repeats 26 times, __align = 2}



(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  0x003daa4df343 in poll () from /lib64/libc.so.6

(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>)

at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<value optimized out>, opvector=<value optimized out>,
    opcount=<value optimized out>) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<value optimized out>, vector=<value optimized out>,
    count

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-21 Thread Vijaikumar M
KP, Atin and I did some debugging and found that there was a deadlock
in glusterd.


When creating a volume snapshot, the back-end operations 'take an lvm
snapshot and start the brick' for each brick
are executed in parallel using the synctask framework.

brick_start was releasing the big_lock around brick_connect and taking
the lock again.
This caused a deadlock in some race conditions, where the main thread
was waiting for one of the synctask threads to finish while the
synctask thread was waiting for the big_lock.


We are working on fixing this issue.

Thanks,
Vijay


On Wednesday 21 May 2014 12:23 PM, Vijaikumar M wrote:
From the log
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a17%3a10%3a51.tgz
it looks like glusterd hung:


Glusterd log:
 5305 [2014-05-20 20:08:55.040665] E 
[glusterd-snapshot.c:3805:glusterd_add_brick_to_snap_volume] 
0-management: Unable to fetch snap device (vol1.brick_snapdevice0). 
Leaving empty
 5306 [2014-05-20 20:08:55.649146] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5307 [2014-05-20 20:08:55.663181] I 
[rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting 
frame-timeout to 600
 5308 [2014-05-20 20:16:55.541197] W 
[glusterfsd.c:1182:cleanup_and_exit] (-- 0-: received signum (15), 
shutting down


Glusterd hung while executing the testcase ./tests/bugs/bug-1090042.t.

Cli log:
 72649 [2014-05-20 20:12:51.960765] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72650 [2014-05-20 20:12:51.960850] T [socket.c:2689:socket_connect] 
(--/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(--/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
-/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport 
already connected
 72651 [2014-05-20 20:12:52.960943] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72652 [2014-05-20 20:12:52.960999] T [socket.c:2697:socket_connect] 
0-glusterfs: connecting 0x1e0fcc0, state=0 gen=0 sock=-1
 72653 [2014-05-20 20:12:52.961038] W [dict.c:1059:data_to_str] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0xf1) 
[0x7ff8ad9ec7d0]))) 0-dict: data is NULL
 72654 [2014-05-20 20:12:52.961070] W [dict.c:1059:data_to_str] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(+0xb5f3) 
[0x7ff8ad9e95f3] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(socket_clien 
t_get_remote_sockaddr+0x10a) [0x7ff8ad9ed568] 
(--/build/install/lib/glusterfs/3.5qa2/rpc-transport/socket.so(client_fill_address_family+0x100) 
[0x7ff8ad9ec7df]))) 0-dict: data is NULL
 72655 [2014-05-20 20:12:52.961079] E 
[name.c:140:client_fill_address_family] 0-glusterfs: 
transport.address-family not specified. Could not guess default value 
from (remote-host:(null) or transport.unix.connect-path:(null)) 
optio   ns
 72656 [2014-05-20 20:12:54.961273] T 
[rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
 72657 [2014-05-20 20:12:54.961404] T [socket.c:2689:socket_connect] 
(--/build/install/lib/libglusterfs.so.0(gf_timer_proc+0x1a2) 
[0x7ff8b6609994] 
(--/build/install/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x137) 
[0x7ff8b5d3305b] (- 
-/build/install/lib/libgfrpc.so.0(rpc_transport_connect+0x74) 
[0x7ff8b5d30071]))) 0-glusterfs: connect () called on transport 
already connected
 72658 [2014-05-20 20:12:55.120645] D [cli-cmd.c:384:cli_cmd_submit] 
0-cli: Returning 110
 72659 [2014-05-20 20:12:55.120723] D 
[cli-rpc-ops.c:8716:gf_cli_snapshot] 0-cli: Returning 110



Now we need to find out why glusterd hung.


Thanks,
Vijay



On Wednesday 21 May 2014 06:46 AM, Pranith Kumar Karampuri wrote:

Hey,
Seems like even after this fix was merged, the regression tests are
failing for the same script. You can check the logs at
http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz

Relevant logs:
[2014-05-20 20:17:07.026045]  : volume create patchy 
build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : 
SUCCESS
[2014-05-20 20:17:08.030673]  : volume start patchy : SUCCESS
[2014-05-20 20:17:08.279148]  : volume barrier patchy enable : SUCCESS
[2014-05-20 20:17:08.476785]  : volume barrier patchy enable : FAILED : Failed 
to reconfigure barrier.
[2014-05-20 20:17:08.727429]  : volume barrier patchy disable : SUCCESS
[2014-05-20 20:17:08.926995]  : volume barrier patchy disable : FAILED : Failed 
to reconfigure barrier.

Pranith

- Original Message -

From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Cc: Joseph Fernandes josfe...@redhat.com, Vijaikumar M vmall

Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-19 Thread Vijaikumar M

Hi Joseph,

In the log mentioned below, it says the ping-timeout is set to the
default value of 30 sec. I think the issue is different.
Can you please point me to the logs where you were able to re-create
the problem?


Thanks,
Vijay



On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote:

hi Vijai, Joseph,
 In 2 of the last 3 build failures,
http://build.gluster.org/job/regression/4479/console and
http://build.gluster.org/job/regression/4478/console, this
test (tests/bugs/bug-1090042.t) failed. Do you guys think it is better to
revert this test until the fix is available? Please send a patch to revert
the test case if you feel so. You can re-submit it along with the fix to
the bug mentioned by Joseph.

Pranith.

- Original Message -

From: Joseph Fernandes josfe...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Friday, 16 May, 2014 5:13:57 PM
Subject: Re: Spurious failures because of nfs and snapshots


Hi All,

tests/bugs/bug-1090042.t :

I was able to reproduce the issue, i.e. when this test is run in a loop:

for i in {1..135}; do ./bugs/bug-1090042.t; done

When checked the logs
[2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs
[2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init]
0-management: defaulting ping-timeout to 30secs

The issue is with ping-timeout and is tracked under the bug

https://bugzilla.redhat.com/show_bug.cgi?id=1096729


The workaround is mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8


Regards,
Joe

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Cc: Joseph Fernandes josfe...@redhat.com
Sent: Friday, May 16, 2014 6:19:54 AM
Subject: Spurious failures because of nfs and snapshots

hi,
 In the latest build I fired for review.gluster.com/7766
 (http://build.gluster.org/job/regression/4443/console), the run failed
 because of a spurious failure. The script doesn't wait for the nfs
 export to be available. I fixed that, but interestingly I found quite a
 few scripts with the same problem. Some of the scripts rely on 'sleep
 5', which could also lead to spurious failures if the export is not
 available in 5 seconds. We found that waiting for 20 seconds is better,
 but 'sleep 20' would unnecessarily delay the build execution. So if you
 are going to write any scripts which have to do nfs mounts, please do
 it the following way:

EXPECT_WITHIN 20 1 is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot-related script,
tests/bugs/bug-1090042.t, on the next build fired by Niels.
Joseph (CCed) is debugging it. He agreed to reply with what he finds and
share it with us so that we won't introduce similar bugs in future.

I encourage you guys to share what you fix to prevent spurious failures in
future.

Thanks
Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel