Re: [Gluster-devel] Progress about small files performance

2016-10-27 Thread Poornima Gurusiddaiah
Hi, 

Some work is being done on improving small-file performance.
Two of the many steps are the md-cache enhancements and the compound fops
implementation; both will be available with the 3.9 release. There are many
more improvements planned [1].


[1] https://www.youtube.com/watch?v=CkScAjL1GEk 

Regards, 
Poornima 

- Original Message -


From: "Gandalf Corvotempesta"  
To: "Gluster Devel"  
Sent: Wednesday, October 26, 2016 2:23:08 AM 
Subject: [Gluster-devel] Progress about small files performance 



Any progress on the major issue with Gluster: small-file performance?

Anyone working on this? 

I would really like to use Gluster as storage for maildirs or web hosting, but
with the current performance this wouldn't be possible without adding
additional layers (like exporting huge files with iSCSI or creating an NFS VM
on top of Gluster).



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gluster Test Thursday - Release 3.9

2016-10-27 Thread Shyam
Tested the gbench script across three volume configurations: dist,
dist-replicate, and dist-ec. All passed. I did not do any negative
testing, so nothing to report on that front.


This test is not on any list; it is just an additional test that I was able
to run today. It covers smallfile metadata tests and, to a point, IOzone.


One thing I noticed is that 'glusterd --version' shows the copyright only
up to 2013; I assume we need this changed to 2016, right?


See: https://paste.fedoraproject.org/461986/90184147/

On 10/26/2016 10:34 AM, Aravinda wrote:

Gluster 3.9.0rc2 tarball is available here
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz

regards
Aravinda

On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:

Hi,

Since the automated test framework for Gluster is still in progress, we
need help from maintainers and developers to test the features and bug
fixes in order to release Gluster 3.9.

In the last maintainers meeting, Shyam shared the idea of having a Test
day to accelerate testing and the release.

Please participate in testing your component(s) on Oct 27, 2016. We
will prepare the rc2 build by tomorrow and share the details before the
Test day.

RC1 Link:
http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
Release Checklist:
https://public.pad.fsfe.org/p/gluster-component-release-checklist


Thanks and Regards
Aravinda and Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Gluster Test Thursday - Release 3.9

2016-10-27 Thread Kaleb S. KEITHLEY


Ack on nfs-ganesha bits. Tentative ack on gnfs bits.

Conditional ack on build, see:
  http://review.gluster.org/15726
  http://review.gluster.org/15733
  http://review.gluster.org/15737
  http://review.gluster.org/15743

There will be backports to 3.9 of the last three soon. Timely reviews of 
the last three will accelerate the availability of backports.


On 10/26/2016 10:34 AM, Aravinda wrote:

Gluster 3.9.0rc2 tarball is available here
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz

regards
Aravinda

On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:

Hi,

Since the automated test framework for Gluster is still in progress, we
need help from maintainers and developers to test the features and bug
fixes in order to release Gluster 3.9.

In the last maintainers meeting, Shyam shared the idea of having a Test
day to accelerate testing and the release.

Please participate in testing your component(s) on Oct 27, 2016. We
will prepare the rc2 build by tomorrow and share the details before the
Test day.

RC1 Link:
http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
Release Checklist:
https://public.pad.fsfe.org/p/gluster-component-release-checklist


Thanks and Regards
Aravinda and Pranith



___
maintainers mailing list
maintain...@gluster.org
http://www.gluster.org/mailman/listinfo/maintainers


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Gluster Test Thursday - Release 3.9

2016-10-27 Thread Aravinda
Ack for the Geo-replication and Events API features. No regressions were
found during testing, and all the bug fixes made for Release-3.9 were verified.


regards
Aravinda

On Wednesday 26 October 2016 08:04 PM, Aravinda wrote:

Gluster 3.9.0rc2 tarball is available here
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz 



regards
Aravinda

On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:

Hi,

Since the automated test framework for Gluster is still in progress, we
need help from maintainers and developers to test the features and bug
fixes in order to release Gluster 3.9.

In the last maintainers meeting, Shyam shared the idea of having a Test
day to accelerate testing and the release.

Please participate in testing your component(s) on Oct 27, 2016. We
will prepare the rc2 build by tomorrow and share the details before the
Test day.


RC1 Link: 
http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
Release Checklist: 
https://public.pad.fsfe.org/p/gluster-component-release-checklist



Thanks and Regards
Aravinda and Pranith



___
maintainers mailing list
maintain...@gluster.org
http://www.gluster.org/mailman/listinfo/maintainers


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Input/output error when files in .shard folder are deleted

2016-10-27 Thread Krutika Dhananjay
This should work without any issues. It is possible that the shard(s)
would get created with different gfids but the ones on the lagging brick
will eventually (by the time heal-info returns all zeroes) get replaced
with shards having the correct gfids.

Have you tried it yet? Did you face any issues?

-Krutika

On Thu, Oct 27, 2016 at 3:48 PM, qingwei wei  wrote:

> Hi,
>
> My final goal of the test is to see the impact of brick replacement
> while IO is still running.
>
> One scenario that I can think of is as below:
>
> 1. random read IO is performed on gluster volume (3 replicas)
> 2. 1 brick down and IO still ongoing
> 3. Perform brick replacement and IO still ongoing
> 4. There will be a full heal on the new brick while IO still ongoing.
>
> Assuming we have a large number of files, reconstruction (recreating the
> shard files) could take some time on this new brick. So is it possible
> that some random read IO to a not-yet-created shard triggers a similar
> error?
>
> Thanks.
>
> Cwtan
>
> On Thu, Oct 27, 2016 at 4:26 PM, Krutika Dhananjay 
> wrote:
> > Found the RC. The problem seems to be that sharding translator attempts
> to
> > create
> > non-existent shards in read/write codepaths with a newly generated gfid
> > attached
> > to the create request in case the shard is absent. Replicate translator,
> > which sits below
> > sharding on the stack takes this request and plays it on all of its
> > replicas. On two of them it
> > fails with EEXIST, and on the one where the shards were removed from the
> > backend, the
> > shard path is created but with the newly generated gfid while the other
> two
> > replicas continue to
> > hold the original gfid (the one prior to rm -rf). Although this can be
> > fixed, it will require one
> > additional lookup for each shard for each read/write operation, causing
> the
> > latency of the read/write
> > response to the application to increase by a factor of 1 network call.
> >
> > The test you're doing is partially (but not fully) manipulating and
> removing
> > data from the backend,
> > which is not recommended.
> >
> > My question to you is this - what is the specific failure that you are
> > trying to simulate with removal of
> > contents of .shard? Normally, the `rm -rf on backend` type of tests are
> > performed to simulate disk
> > failure and its replacement with a brand new disk, in which case
> executing
> > the replace-brick/reset-brick
> > commands should be sufficient to recover all contents from the remaining
> two
> > replicas.
> >
> > -Krutika
> >
> > On Thu, Oct 27, 2016 at 12:49 PM, Krutika Dhananjay  >
> > wrote:
> >>
> >> Now it's reproducible, thanks. :)
> >>
> >> I think I know the RC. Let me confirm it through tests and report back.
> >>
> >> -Krutika
> >>
> >> On Thu, Oct 27, 2016 at 10:42 AM, qingwei wei 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I did few more test runs and it seems that it happens during this
> >>> sequence
> >>>
> >>> 1.populate data using dd
> >>> 2. delete away ALL the shard files in one of the brick .shard folder
> >>> 3. Trying to access using dd, no error reported
> >>> 4. umount and mount.
> >>> 5. Trying to access using dd, no error reported
> >>> 6. umount and mount.
> >>> 7. Trying to access using dd and Input/Output error reported
> >>>
> >>> during step 3 and 4, no file is created under the .shard directory
> >>> For step 7, a shard file is created with same file name but different
> >>> gfid compare to other good replicas.
> >>>
> >>> Below is the client log and brick log with more details in the attached
> >>> log.
> >>>
> >>> Client log
> >>>
> >>> [2016-10-27 04:34:46.493281] D [MSGID: 0]
> >>> [shard.c:3138:shard_common_mknod_cbk] 0-testHeal4-shard: mknod of
> >>> shard 1 failed: File exists
> >>> [2016-10-27 04:34:46.493351] D [MSGID: 0]
> >>> [dht-common.c:2633:dht_lookup] 0-testHeal4-dht: Calling fresh lookup
> >>> for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 on
> >>> testHeal4-replicate-0
> >>> [2016-10-27 04:34:46.494646] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-0:
> >>> remote operation failed. Path: (null)
> >>> (----) [Invalid argument]
> >>> [2016-10-27 04:34:46.494673] D [MSGID: 0]
> >>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
> >>> stack-address: 0x7f9083edc1c8, testHeal4-client-0 returned -1 error:
> >>> Invalid argument [Invalid argument]
> >>> [2016-10-27 04:34:46.494705] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-1:
> >>> remote operation failed. Path: (null)
> >>> (----) [Invalid argument]
> >>> [2016-10-27 04:34:46.494710] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-2:
> >>> remote operation failed. Path: (null)
> >>> (----) [Invalid argument]
> >>> [2016-10-27 

Re: [Gluster-devel] r.g.o seems to be down?

2016-10-27 Thread Atin Mukherjee
BZ filed: https://bugzilla.redhat.com/show_bug.cgi?id=1389282

2016-10-27 16:46 GMT+05:30 Atin Mukherjee :

>
>
> --
>
> ~ Atin (atinm)
>



-- 

~ Atin (atinm)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Input/output error when files in .shard folder are deleted

2016-10-27 Thread qingwei wei
Hi,

My final goal of the test is to see the impact of brick replacement
while IO is still running.

One scenario that I can think of is as below:

1. random read IO is performed on gluster volume (3 replicas)
2. 1 brick down and IO still ongoing
3. Perform brick replacement and IO still ongoing
4. There will be a full heal on the new brick while IO still ongoing.

Assuming we have a large number of files, reconstruction (recreating the
shard files) could take some time on this new brick. So is it possible
that some random read IO to a not-yet-created shard triggers a similar
error?

Thanks.

Cwtan

On Thu, Oct 27, 2016 at 4:26 PM, Krutika Dhananjay  wrote:
> Found the RC. The problem seems to be that sharding translator attempts to
> create
> non-existent shards in read/write codepaths with a newly generated gfid
> attached
> to the create request in case the shard is absent. Replicate translator,
> which sits below
> sharding on the stack takes this request and plays it on all of its
> replicas. On two of them it
> fails with EEXIST, and on the one where the shards were removed from the
> backend, the
> shard path is created but with the newly generated gfid while the other two
> replicas continue to
> hold the original gfid (the one prior to rm -rf). Although this can be
> fixed, it will require one
> additional lookup for each shard for each read/write operation, causing the
> latency of the read/write
> response to the application to increase by a factor of 1 network call.
>
> The test you're doing is partially (but not fully) manipulating and removing
> data from the backend,
> which is not recommended.
>
> My question to you is this - what is the specific failure that you are
> trying to simulate with removal of
> contents of .shard? Normally, the `rm -rf on backend` type of tests are
> performed to simulate disk
> failure and its replacement with a brand new disk, in which case executing
> the replace-brick/reset-brick
> commands should be sufficient to recover all contents from the remaining two
> replicas.
>
> -Krutika
>
> On Thu, Oct 27, 2016 at 12:49 PM, Krutika Dhananjay 
> wrote:
>>
>> Now it's reproducible, thanks. :)
>>
>> I think I know the RC. Let me confirm it through tests and report back.
>>
>> -Krutika
>>
>> On Thu, Oct 27, 2016 at 10:42 AM, qingwei wei  wrote:
>>>
>>> Hi,
>>>
>>> I did few more test runs and it seems that it happens during this
>>> sequence
>>>
>>> 1.populate data using dd
>>> 2. delete away ALL the shard files in one of the brick .shard folder
>>> 3. Trying to access using dd, no error reported
>>> 4. umount and mount.
>>> 5. Trying to access using dd, no error reported
>>> 6. umount and mount.
>>> 7. Trying to access using dd and Input/Output error reported
>>>
>>> during step 3 and 4, no file is created under the .shard directory
>>> For step 7, a shard file is created with same file name but different
>>> gfid compare to other good replicas.
>>>
>>> Below is the client log and brick log with more details in the attached
>>> log.
>>>
>>> Client log
>>>
>>> [2016-10-27 04:34:46.493281] D [MSGID: 0]
>>> [shard.c:3138:shard_common_mknod_cbk] 0-testHeal4-shard: mknod of
>>> shard 1 failed: File exists
>>> [2016-10-27 04:34:46.493351] D [MSGID: 0]
>>> [dht-common.c:2633:dht_lookup] 0-testHeal4-dht: Calling fresh lookup
>>> for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 on
>>> testHeal4-replicate-0
>>> [2016-10-27 04:34:46.494646] W [MSGID: 114031]
>>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-0:
>>> remote operation failed. Path: (null)
>>> (----) [Invalid argument]
>>> [2016-10-27 04:34:46.494673] D [MSGID: 0]
>>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>>> stack-address: 0x7f9083edc1c8, testHeal4-client-0 returned -1 error:
>>> Invalid argument [Invalid argument]
>>> [2016-10-27 04:34:46.494705] W [MSGID: 114031]
>>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-1:
>>> remote operation failed. Path: (null)
>>> (----) [Invalid argument]
>>> [2016-10-27 04:34:46.494710] W [MSGID: 114031]
>>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-2:
>>> remote operation failed. Path: (null)
>>> (----) [Invalid argument]
>>> [2016-10-27 04:34:46.494730] D [MSGID: 0]
>>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>>> stack-address: 0x7f9083edc1c8, testHeal4-client-1 returned -1 error:
>>> Invalid argument [Invalid argument]
>>> [2016-10-27 04:34:46.494751] D [MSGID: 0]
>>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>>> stack-address: 0x7f9083edc1c8, testHeal4-client-2 returned -1 error:
>>> Invalid argument [Invalid argument]
>>> [2016-10-27 04:34:46.495339] D [MSGID: 0]
>>> [afr-common.c:1986:afr_lookup_done] 0-stack-trace: stack-address:
>>> 0x7f9083edbb1c, testHeal4-replicate-0 returned -1 error: Input/output
>>> error 

Re: [Gluster-devel] Gluster-devel Digest, Vol 31, Issue 61

2016-10-27 Thread Mohit Agrawal
Hi,

I have done some basic testing specific to the SSL component on the tarball (
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
).

1) After enabling SSL (I/O and management encryption), mounting works (for
distributed/replicated volumes) and data can be transferred to the volume.
2) Reconnection works after a disconnect.

Regards
Mohit Agrawal




On Thu, Oct 27, 2016 at 8:30 AM,  wrote:

> Send Gluster-devel mailing list submissions to
> gluster-devel@gluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.gluster.org/mailman/listinfo/gluster-devel
> or, via email, send a message with subject or body 'help' to
> gluster-devel-requ...@gluster.org
>
> You can reach the person managing the list at
> gluster-devel-ow...@gluster.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Gluster-devel digest..."
>
>
> Today's Topics:
>
>1. Memory management and friends (Oleksandr Natalenko)
>2. Re: Gluster Test Thursday - Release 3.9 (Aravinda)
>3. Re: Multiplexing status, October 26 (Jeff Darcy)
>4. Re: [Gluster-Maintainers] Gluster Test Thursday - Release 3.9
>   (Niels de Vos)
>5. Re: [Gluster-Maintainers] glusterfs-3.9.0rc2  released
>   (Kaleb S. KEITHLEY)
>6. automating straightforward backports (Pranith Kumar Karampuri)
>
>
> --
>
> Message: 1
> Date: Wed, 26 Oct 2016 16:27:40 +0200
> From: Oleksandr Natalenko 
> To: Gluster Devel 
> Subject: [Gluster-devel] Memory management and friends
> Message-ID: <1c49cb1391aaeda5b3fb129ccbf66...@natalenko.name>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hello.
>
> As a result of today's community meeting I start dedicated ML thread for
> gathering memory management issues together to make it possible to
> summarize them and construct some plan what to do next.
>
> Very important notice: I'm not an active GlusterFS developer, but I
> gained extensive experience with GlusterFS at my previous job, and the
> main issue that kept chasing me all the time was memory leaking.
> Consider this a request for action from a GlusterFS customer,
> apparently approved by Kaushal and Amye during the last meeting :).
>
> So, here go key points.
>
> 1) Almost all of the nasty and obvious memory leaks were successfully
> fixed during the last year, and that allowed me to run GlusterFS in
> production at my previous job for almost all types of workload except
> one: dovecot mail storage. The specifics of this workload are that it
> involves a huge number of files, and I assume this to be a kind of edge
> case uncovering some dark corners of GlusterFS memory management. I was
> able to provide Nithya with Valgrind+Massif memory profiling results and
> a test case, and that helped her to prepare at least one extra fix (and
> more to come AFAIK) dealing with readdirp-related code. Nevertheless, it
> is reported that this is not the major source of leaking. Nithya
> suspects that memory gets fragmented heavily due to lots of small
> allocations, and memory pools cannot cope with this kind of
> fragmentation under constant load.
>
> Related BZs:
>
>* https://bugzilla.redhat.com/show_bug.cgi?id=1369364
>* https://bugzilla.redhat.com/show_bug.cgi?id=1380249
>
> People involved:
>
>* nbalacha, could you please provide more info on your findings?
>
> 2) Meanwhile, Jeff goes on with brick multiplexing feature, facing some
> issue with memory management too and blaming memory pools for that.
>
> Related ML email:
>
>*
> http://www.gluster.org/pipermail/gluster-devel/2016-October/051118.html
>*
> http://www.gluster.org/pipermail/gluster-devel/2016-October/051160.html
>
> People involved:
>
>* jdarcy, have you discussed this outside of ML? It seems your email
> didn't get proper attention.
>
> 3) We had brief discussion with obnox and anoopcs on #gluster-meeting
> and #gluster-dev regarding jemalloc and talloc. obnox believes that we
> may use both, jemalloc for substituting malloc/free, talloc for
> rewriting memory management for GlusterFS properly.
>
> Related logs:
>
>*
> https://botbot.me/freenode/gluster-dev/2016-10-26/?msg=75501394=2
>
> People involved:
>
>* obnox, could you share your ideas on this?
>
> To summarize:
>
> 1) we need key devs involved in memory management to share their ideas;
> 2) using production-proven memory allocators and memory pools
> implementation is desired;
> 3) someone should manage the workflow of reconstructing memory
> management.
>
> Feel free to add anything I've missed.
>
> Regards,
>Oleksandr
>
>
> --
>
> Message: 2
> Date: Wed, 26 Oct 2016 20:04:54 +0530
> From: Aravinda 
> To: Gluster Devel ,  GlusterFS Maintainers
> 

Re: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

2016-10-27 Thread Lian, George (Nokia - CN/Hangzhou)
Hi, Raghavendra,

Could you please give some suggestions on this issue? We have been trying to
find a clue to this issue for a long time, but there has been no progress. :(

Thanks & Best Regards,
George

-Original Message-
From: Lian, George (Nokia - CN/Hangzhou) 
Sent: Wednesday, October 19, 2016 4:40 PM
To: 'Raghavendra Gowdappa' 
Cc: Gluster-devel@gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS 
; Zhang, Bingxuan (Nokia - 
CN/Hangzhou) ; Zizka, Jan (Nokia - CZ/Prague) 

Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the 
really size of the syslog file

Hi, Raghavendra

Just now we tested it with the glusterfs log at debug level "TRACE" and let an
application make glusterfs produce a large log. In that case, with
write-behind and stat-prefetch both set to OFF, tailing the glusterfs log
(such as mnt-{VOLUME-NAME}.log) still failed with "file truncated".

So that means that when a file's IO volume is huge, the issue is still there
even with write-behind and stat-prefetch both OFF.

Best Regards,
George

-Original Message-
From: Raghavendra Gowdappa [mailto:rgowd...@redhat.com] 
Sent: Wednesday, October 19, 2016 2:54 PM
To: Lian, George (Nokia - CN/Hangzhou) 
Cc: Gluster-devel@gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS 
; Zhang, Bingxuan (Nokia - 
CN/Hangzhou) ; Zizka, Jan (Nokia - CZ/Prague) 

Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the 
really size of the syslog file



- Original Message -
> From: "George Lian (Nokia - CN/Hangzhou)" 
> To: "Raghavendra Gowdappa" 
> Cc: Gluster-devel@gluster.org, "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> , "Bingxuan Zhang (Nokia - 
> CN/Hangzhou)"
> , "Jan Zizka (Nokia - CZ/Prague)" 
> 
> Sent: Wednesday, October 19, 2016 12:05:01 PM
> Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the 
> really size of the syslog file
> 
> Hi, Raghavendra,
> 
> Thanks a lot for your quick update!
> In my case, there are many processes (writers) writing to the syslog file;
> the writers are on the same host and writing to the same mount point
> while the tail (reader) is reading it.
> 
> The bug I am guessing at is:
> When a writer writes data with write-behind, the callback function
> "mdc_writev_cbk" is called, which calls "mdc_inode_iatt_set_validate" to
> validate the "iatt" data, but with the code I mentioned in my last mail it
> does nothing.

mdc_inode_iatt_set_validate has following code


if (!iatt || !iatt->ia_ctime) {
mdc->ia_time = 0;
goto unlock;
}


Which means a NULL iatt sets mdc->ia_time to 0. This results in subsequent 
lookup/stat calls to be NOT served from md-cache. Instead, the stat is served 
from backend bricks. So, I don't see an issue here.

However, one case where a NULL iatt is different from a valid iatt (which 
differs from the value stored in md-cache) is that the latter results in a call 
to inode_invalidate. This invalidation propagates to kernel and all dentry and 
page cache corresponding to file is purged. So, I am suspecting whether the 
stale stat you saw was served from kernel cache (not from glusterfs). If this 
is the case, having mount options "attribute-timeout=0" and "entry-timeout=0" 
should've helped.

I am still at a loss to point out the RCA for this issue.
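
To illustrate the mechanism under discussion: the behaviour amounts to a
timestamped cache entry whose timestamp is cleared so that the next stat must
go to the bricks. Below is a minimal Go sketch with invented names, purely
for illustration; it has no relation to the actual md-cache code beyond
mimicking the timestamp-clearing idea.

package main

import (
	"fmt"
	"time"
)

// statCache is a toy model of a per-inode attribute cache. Field names are
// invented; see md-cache.c for the real implementation.
type statCache struct {
	size     int64
	cachedAt time.Time // the zero value plays the role of mdc->ia_time == 0
	timeout  time.Duration
}

// update stores a fresh stat; invalidate clears the timestamp so the next
// lookup is forced to hit the backend.
func (c *statCache) update(size int64) { c.size, c.cachedAt = size, time.Now() }
func (c *statCache) invalidate()       { c.cachedAt = time.Time{} }

// get returns the cached size only while the entry is still valid.
func (c *statCache) get() (int64, bool) {
	if c.cachedAt.IsZero() || time.Since(c.cachedAt) > c.timeout {
		return 0, false // cache miss: caller must stat the backend
	}
	return c.size, true
}

func main() {
	c := &statCache{timeout: time.Second}
	c.update(4096)
	fmt.Println(c.get()) // served from the cache
	c.invalidate()       // what a NULL iatt in the write callback leads to
	fmt.Println(c.get()) // miss: forces a fresh stat from the bricks
}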


> And at the same time, the reader (tail) reads the "iatt" data, but if the
> cache time has not expired, it will return the "iatt" data without the
> latest change.
> 
> Do you think this is a possible bug?
> 
> Thanks & Best Regards,
> George
> 
> -Original Message-
> From: Raghavendra Gowdappa [mailto:rgowd...@redhat.com]
> Sent: Wednesday, October 19, 2016 2:06 PM
> To: Lian, George (Nokia - CN/Hangzhou) 
> Cc: Gluster-devel@gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> ; Zhang, Bingxuan (Nokia -
> CN/Hangzhou) ; Zizka, Jan (Nokia - CZ/Prague)
> 
> Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the
> really size of the syslog file
> 
> 
> 
> - Original Message -
> > From: "George Lian (Nokia - CN/Hangzhou)" 
> > To: "Raghavendra Gowdappa" 
> > Cc: Gluster-devel@gluster.org, "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > , "Bingxuan Zhang (Nokia
> > - CN/Hangzhou)"
> > , "Jan Zizka (Nokia - CZ/Prague)"
> > 
> > Sent: Wednesday, October 19, 2016 

[Gluster-devel] Volgen-2.0 for GD2

2016-10-27 Thread Kaushal M
* I've also posted this to my blog at https://kshlm.in/post/volgen-2.0/ *

# Designing and prototyping volgen for GD2

I've recently begun working on volgen for GD2. This post gives an
overview of what I've been doing till now.

## Some background first

One of the requirements for GD2 is to make it easier to modify the
volume graphs, in particular make it easier to add new translators
into the volume graph.

GlusterD right now has hardcoded graph topologies in its code. This
makes it very hard to even begin modifying graphs, and it made it hard
for features like tiering to do what they needed.

For GD2, we are designing a more flexible volgen package that makes
it much easier to modify, extend and combine graphs. We've begun
working on a prototype to explore how this design would work in real
life.

Different approaches to volgen were discussed earlier this year in a
meeting during DevConf. The approaches discussed are summarized in the
["Flexible volgen" wiki][1].  Of the discussed approaches, the
"SystemD units" style approach was
picked as the most suitable.

I've begun a prototype implementation of this approach at
[kshlm/glusterd2-volgen][2]. I welcome comments and queries on the
approach as well as the prototype.

## The SystemD style volgen

> NOTE: Most of the names, paths, etc. used below are just placeholders. They
> will all be changed to more descriptive terms before we reach the end.

More information on this can be found on the wiki, but the wiki might
not always be up to date, so I'm providing a quick summary of the
approach here.

This approach to volgen makes use of what I'm currently calling
Nodefiles. These are analogous to systemd unit files. A Nodefile
describes a node which can appear in a volume graph.
There are two types of nodes/nodefiles: Xlators and Targets.

### Xlators

Xlator nodefiles describe GlusterFS translators. Nodefiles are text
files that follow a defined format and describe a particular
translator. Examples of Nodefiles can be found at
[kshlm/glusterd2-volgen/examples/xlator][3].
_Note: These examples are just that, examples; they are not the final form._

In addition to describing a translator, Nodefiles specify the
dependencies of the translator on other translators. Some of the
dependencies currently available are:
- Requires: this translator requires the specified translator to also be enabled
- Conflicts: this translator cannot work with the specified translator
- Before: this translator must appear before the specified translator in
the volume graph
- After: this translator must appear after the specified translator in
the volume graph
- Parent: this translator can only have the specified translator as its parent
- Child: this translator can only have the specified translator as its
child(ren)

In the prototype, [TOML][4] is used as the description language. The
Nodefile is described in code as the `Node` struct in
[volgen/nodefile.go][5]. _Note: This is incomplete right now._
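
To make the format concrete, here is a small, self-contained Go sketch of
what decoding such a Nodefile could look like. The struct fields, the
example nodefile contents and the option names below are my own
illustration rather than the actual prototype code; the only external
assumption is the BurntSushi TOML library (github.com/BurntSushi/toml).

```go
package main

import (
	"fmt"

	"github.com/BurntSushi/toml"
)

// Node is a hypothetical sketch of a decoded nodefile; the real struct
// lives in volgen/nodefile.go and will differ.
type Node struct {
	Name      string            `toml:"name"`
	Type      string            `toml:"type"`
	Requires  []string          `toml:"requires"`
	Conflicts []string          `toml:"conflicts"`
	Before    []string          `toml:"before"`
	After     []string          `toml:"after"`
	Options   map[string]string `toml:"options"`
}

// An invented nodefile for a single translator, for illustration only.
const nodefile = `
name = "md-cache"
type = "performance/md-cache"
before = ["io-threads"]
after = ["open-behind"]

[options]
cache-timeout = "1"
`

func main() {
	var n Node
	if _, err := toml.Decode(nodefile, &n); err != nil {
		fmt.Println("failed to decode nodefile:", err)
		return
	}
	fmt.Printf("loaded xlator %q (before=%v, after=%v)\n", n.Name, n.Before, n.After)
}
```

GD2 could then read one such file per translator from `XLATORDIR` and build
its translator map and options table from the decoded structs.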

Each translator will provide its own nodefile, all of which will need
to be added to a directory known to GD2 (let's call it `XLATORDIR` for
now), e.g. `/usr/lib/glusterd/xlators`.
To make it easier for authors to create these files, we may provide
tools to scrape information from the C code and build the file. But
that's a task for later.

GD2 will read the Xlator nodefiles from `XLATORDIR` at startup and
build an internal translator map and a translator options table.

### Targets

Targets are nodes which describe graphs. You can have Targets for the
FUSE graph, the brick graph, the self-heal daemon, and so on.

On disk, Targets are directories which have Xlators linked into them.
The linked-in translators are the translators which will be present in
the graph. Targets describe themselves with an `info` nodefile in the
directory. Specific dependencies for the target can also be given in
the nodefile. An example target for the brick graph can be found at
[kshlm/glusterd2-volgen/examples/brick.target][6]. A `Target` struct
is described in [volgen/target.go][7].

Targets can include other targets among themselves. This will make it
easier to describe complex graphs. For example, a `fuse.target` for the
FUSE graph can include a `performance.target` and a `clients.target`
among other translators. The `performance.target` includes all known
performance translators, and would provide an easy way to
enable/disable performance translators. The `clients.target` would
just include the cluster translators, i.e. the part of the client
graph below and including DHT. This will make it easier to generate
other graphs, like NFS for example, which would now only need to
include the `clients.target` instead of the whole client graph.

Targets are the starting point for building an actual volume graph.
Once a target is loaded into GD2, building a graph will be as simple
as calling `Target.BuildGraph(volinfo)`.

The `BuildGraph` function will first do a dependency resolution of all
the translators/targets included in 
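
As a rough illustration of what that dependency resolution could boil down
to, here is a toy topological sort over After-style constraints. This is
plain Go, not GD2 code, and the translator names and dependency edges are
made up.

```go
package main

import "fmt"

// order sorts xlators so that everything an xlator must appear "after"
// comes earlier in the result (Kahn's algorithm). A toy sketch only,
// not the prototype's resolver.
func order(xlators []string, after map[string][]string) ([]string, error) {
	indegree := make(map[string]int)
	next := make(map[string][]string)
	for _, x := range xlators {
		indegree[x] = 0
	}
	for x, deps := range after {
		for _, d := range deps {
			next[d] = append(next[d], x)
			indegree[x]++
		}
	}
	var queue, result []string
	for _, x := range xlators {
		if indegree[x] == 0 {
			queue = append(queue, x)
		}
	}
	for len(queue) > 0 {
		x := queue[0]
		queue = queue[1:]
		result = append(result, x)
		for _, n := range next[x] {
			indegree[n]--
			if indegree[n] == 0 {
				queue = append(queue, n)
			}
		}
	}
	if len(result) != len(xlators) {
		return nil, fmt.Errorf("dependency cycle among translators")
	}
	return result, nil
}

func main() {
	xlators := []string{"io-threads", "md-cache", "open-behind", "write-behind"}
	after := map[string][]string{
		"md-cache":   {"open-behind"},
		"io-threads": {"md-cache", "write-behind"},
	}
	fmt.Println(order(xlators, after))
}
```

A real `BuildGraph` would also have to honour Requires/Conflicts and the
Parent/Child constraints, and produce an actual graph rather than a flat
ordering, but the core resolution problem looks roughly like this.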

Re: [Gluster-devel] Input/output error when files in .shard folder are deleted

2016-10-27 Thread Krutika Dhananjay
Found the RC. The problem seems to be that the sharding translator
attempts to create non-existent shards in the read/write codepaths,
with a newly generated gfid attached to the create request in case the
shard is absent. The replicate translator, which sits below sharding
on the stack, takes this request and plays it on all of its replicas.
On two of them it fails with EEXIST, and on the one where the shards
were removed from the backend, the shard path is created but with the
newly generated gfid, while the other two replicas continue to hold
the original gfid (the one prior to rm -rf). Although this can be
fixed, it will require one additional lookup per shard for each
read/write operation, increasing the latency of the read/write
response to the application by one network round trip.
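
To make the gfid mismatch easier to picture, below is a toy Go model of the
replay described above. It is not GlusterFS code; the maps and gfid strings
are invented stand-ins for the three replica bricks and their stored gfids.

package main

import "fmt"

// brick is a toy stand-in for one replica's backend: shard path -> gfid.
type brick map[string]string

// mknodIfAbsent mimics "create the shard with a freshly generated gfid if
// it does not exist", replayed on every replica: bricks that already hold
// the path fail with EEXIST and keep their gfid, while a brick missing the
// path creates it with the new gfid.
func mknodIfAbsent(b brick, path, newGfid string) {
	if _, exists := b[path]; exists {
		return // EEXIST: original gfid preserved
	}
	b[path] = newGfid
}

func main() {
	shard := "/.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1"
	replicas := []brick{
		{shard: "gfid-original"}, // replica 0: intact
		{shard: "gfid-original"}, // replica 1: intact
		{},                       // replica 2: shard wiped from the backend
	}

	for _, r := range replicas {
		mknodIfAbsent(r, shard, "gfid-freshly-generated")
	}

	// The replicas now disagree on the shard's gfid, which is what the
	// client ultimately reports as an Input/output error on lookup.
	for i, r := range replicas {
		fmt.Printf("replica %d: %s\n", i, r[shard])
	}
}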

The test you're doing is partially (but not fully) manipulating and
removing data from the backend,
which is not recommended.

My question to you is this - what is the specific failure that you are
trying to simulate with removal of
contents of .shard? Normally, the `rm -rf on backend` type of tests are
performed to simulate disk
failure and its replacement with a brand new disk, in which case executing
the replace-brick/reset-brick
commands should be sufficient to recover all contents from the remaining
two replicas.

-Krutika

On Thu, Oct 27, 2016 at 12:49 PM, Krutika Dhananjay 
wrote:

> Now it's reproducible, thanks. :)
>
> I think I know the RC. Let me confirm it through tests and report back.
>
> -Krutika
>
> On Thu, Oct 27, 2016 at 10:42 AM, qingwei wei  wrote:
>
>> Hi,
>>
>> I did few more test runs and it seems that it happens during this sequence
>>
>> 1.populate data using dd
>> 2. delete away ALL the shard files in one of the brick .shard folder
>> 3. Trying to access using dd, no error reported
>> 4. umount and mount.
>> 5. Trying to access using dd, no error reported
>> 6. umount and mount.
>> 7. Trying to access using dd and Input/Output error reported
>>
>> during step 3 and 4, no file is created under the .shard directory
>> For step 7, a shard file is created with same file name but different
>> gfid compare to other good replicas.
>>
>> Below is the client log and brick log with more details in the attached
>> log.
>>
>> Client log
>>
>> [2016-10-27 04:34:46.493281] D [MSGID: 0]
>> [shard.c:3138:shard_common_mknod_cbk] 0-testHeal4-shard: mknod of
>> shard 1 failed: File exists
>> [2016-10-27 04:34:46.493351] D [MSGID: 0]
>> [dht-common.c:2633:dht_lookup] 0-testHeal4-dht: Calling fresh lookup
>> for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 on
>> testHeal4-replicate-0
>> [2016-10-27 04:34:46.494646] W [MSGID: 114031]
>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-0:
>> remote operation failed. Path: (null)
>> (----) [Invalid argument]
>> [2016-10-27 04:34:46.494673] D [MSGID: 0]
>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>> stack-address: 0x7f9083edc1c8, testHeal4-client-0 returned -1 error:
>> Invalid argument [Invalid argument]
>> [2016-10-27 04:34:46.494705] W [MSGID: 114031]
>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-1:
>> remote operation failed. Path: (null)
>> (----) [Invalid argument]
>> [2016-10-27 04:34:46.494710] W [MSGID: 114031]
>> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-2:
>> remote operation failed. Path: (null)
>> (----) [Invalid argument]
>> [2016-10-27 04:34:46.494730] D [MSGID: 0]
>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>> stack-address: 0x7f9083edc1c8, testHeal4-client-1 returned -1 error:
>> Invalid argument [Invalid argument]
>> [2016-10-27 04:34:46.494751] D [MSGID: 0]
>> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
>> stack-address: 0x7f9083edc1c8, testHeal4-client-2 returned -1 error:
>> Invalid argument [Invalid argument]
>> [2016-10-27 04:34:46.495339] D [MSGID: 0]
>> [afr-common.c:1986:afr_lookup_done] 0-stack-trace: stack-address:
>> 0x7f9083edbb1c, testHeal4-replicate-0 returned -1 error: Input/output
>> error [Input/output error]
>> [2016-10-27 04:34:46.495364] D [MSGID: 0]
>> [dht-common.c:2220:dht_lookup_cbk] 0-testHeal4-dht: fresh_lookup
>> returned for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 with
>> op_ret -1 [Input/output error]
>> [2016-10-27 04:34:46.495374] D [MSGID: 0]
>> [dht-common.c:2300:dht_lookup_cbk] 0-testHeal4-dht: Lookup of
>> /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 for subvolume
>> testHeal4-replicate-0 failed [Input/output error]
>> [2016-10-27 04:34:46.495384] D [MSGID: 0]
>> [dht-common.c:2363:dht_lookup_cbk] 0-stack-trace: stack-address:
>> 0x7f9083edbb1c, testHeal4-dht returned -1 error: Input/output error
>> [Input/output error]
>> [2016-10-27 04:34:46.495395] E [MSGID: 133010]
>> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal4-shard:
>> Lookup on 

Re: [Gluster-devel] Input/output error when files in .shard folder are deleted

2016-10-27 Thread Krutika Dhananjay
Now it's reproducible, thanks. :)

I think I know the RC. Let me confirm it through tests and report back.

-Krutika

On Thu, Oct 27, 2016 at 10:42 AM, qingwei wei  wrote:

> Hi,
>
> I did few more test runs and it seems that it happens during this sequence
>
> 1.populate data using dd
> 2. delete away ALL the shard files in one of the brick .shard folder
> 3. Trying to access using dd, no error reported
> 4. umount and mount.
> 5. Trying to access using dd, no error reported
> 6. umount and mount.
> 7. Trying to access using dd and Input/Output error reported
>
> during step 3 and 4, no file is created under the .shard directory
> For step 7, a shard file is created with same file name but different
> gfid compare to other good replicas.
>
> Below is the client log and brick log with more details in the attached
> log.
>
> Client log
>
> [2016-10-27 04:34:46.493281] D [MSGID: 0]
> [shard.c:3138:shard_common_mknod_cbk] 0-testHeal4-shard: mknod of
> shard 1 failed: File exists
> [2016-10-27 04:34:46.493351] D [MSGID: 0]
> [dht-common.c:2633:dht_lookup] 0-testHeal4-dht: Calling fresh lookup
> for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 on
> testHeal4-replicate-0
> [2016-10-27 04:34:46.494646] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-0:
> remote operation failed. Path: (null)
> (----) [Invalid argument]
> [2016-10-27 04:34:46.494673] D [MSGID: 0]
> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
> stack-address: 0x7f9083edc1c8, testHeal4-client-0 returned -1 error:
> Invalid argument [Invalid argument]
> [2016-10-27 04:34:46.494705] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-1:
> remote operation failed. Path: (null)
> (----) [Invalid argument]
> [2016-10-27 04:34:46.494710] W [MSGID: 114031]
> [client-rpc-fops.c:2981:client3_3_lookup_cbk] 0-testHeal4-client-2:
> remote operation failed. Path: (null)
> (----) [Invalid argument]
> [2016-10-27 04:34:46.494730] D [MSGID: 0]
> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
> stack-address: 0x7f9083edc1c8, testHeal4-client-1 returned -1 error:
> Invalid argument [Invalid argument]
> [2016-10-27 04:34:46.494751] D [MSGID: 0]
> [client-rpc-fops.c:2989:client3_3_lookup_cbk] 0-stack-trace:
> stack-address: 0x7f9083edc1c8, testHeal4-client-2 returned -1 error:
> Invalid argument [Invalid argument]
> [2016-10-27 04:34:46.495339] D [MSGID: 0]
> [afr-common.c:1986:afr_lookup_done] 0-stack-trace: stack-address:
> 0x7f9083edbb1c, testHeal4-replicate-0 returned -1 error: Input/output
> error [Input/output error]
> [2016-10-27 04:34:46.495364] D [MSGID: 0]
> [dht-common.c:2220:dht_lookup_cbk] 0-testHeal4-dht: fresh_lookup
> returned for /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 with
> op_ret -1 [Input/output error]
> [2016-10-27 04:34:46.495374] D [MSGID: 0]
> [dht-common.c:2300:dht_lookup_cbk] 0-testHeal4-dht: Lookup of
> /.shard/76bc4b0f-bb18-4736-8327-99098cd0d7ce.1 for subvolume
> testHeal4-replicate-0 failed [Input/output error]
> [2016-10-27 04:34:46.495384] D [MSGID: 0]
> [dht-common.c:2363:dht_lookup_cbk] 0-stack-trace: stack-address:
> 0x7f9083edbb1c, testHeal4-dht returned -1 error: Input/output error
> [Input/output error]
> [2016-10-27 04:34:46.495395] E [MSGID: 133010]
> [shard.c:1582:shard_common_lookup_shards_cbk] 0-testHeal4-shard:
> Lookup on shard 1 failed. Base file gfid =
> 76bc4b0f-bb18-4736-8327-99098cd0d7ce [Input/output error]
> [2016-10-27 04:34:46.495406] D [MSGID: 0]
> [shard.c:3086:shard_post_lookup_shards_readv_handler] 0-stack-trace:
> stack-address: 0x7f9083edbb1c, testHeal4-shard returned -1 error:
> Input/output error [Input/output error]
> [2016-10-27 04:34:46.495417] D [MSGID: 0]
> [defaults.c:1010:default_readv_cbk] 0-stack-trace: stack-address:
> 0x7f9083edbb1c, testHeal4-write-behind returned -1 error: Input/output
> error [Input/output error]
> [2016-10-27 04:34:46.495428] D [MSGID: 0]
> [read-ahead.c:462:ra_readv_disabled_cbk] 0-stack-trace: stack-address:
> 0x7f9083edbb1c, testHeal4-read-ahead returned -1 error: Input/output
> error [Input/output error]
>
> brick log
>
> [2016-10-27 04:34:46.492055] D [MSGID: 0]
> [io-threads.c:351:iot_schedule] 0-testHeal4-io-threads: STATFS
> scheduled as fast fop
> [2016-10-27 04:34:46.492157] D [client_t.c:333:gf_client_ref]
> (-->/usr/lib64/glusterfs/3.7.16/xlator/protocol/server.so(
> server3_3_entrylk+0x93)
> [0x7efebb37d633]
> -->/usr/lib64/glusterfs/3.7.16/xlator/protocol/server.so(
> get_frame_from_request+0x257)
> [0x7efebb36cfd7] -->/lib64/libglusterfs.so.0(gf_client_ref+0x68)
> [0x7efecfadf608] ) 0-client_t:
> fujitsu05.dctopenstack.org-6064-2016/10/27-04:34:44:
> 217958-testHeal4-client-1-0-0:
> ref-count 3
> [2016-10-27 04:34:46.492180] D [MSGID: 0]
> [io-threads.c:351:iot_schedule] 0-testHeal4-io-threads: ENTRYLK
> scheduled as normal fop
> 

Re: [Gluster-devel] Review request - change pid file location to /var/run/gluster

2016-10-27 Thread Atin Mukherjee
Saravana,

Thank you for working on this. We'll be considering this patch for 3.10.

On Thu, Oct 27, 2016 at 11:54 AM, Saravanakumar Arumugam <
sarum...@redhat.com> wrote:

> Hi,
>
> I have refreshed this patch addressing review comments (originally
> authored by Gaurav) which moves brick pid files from /var/lib/glusterd/* to
> /var/run/gluster.
>
> It will be great if you can review this:
> http://review.gluster.org/#/c/13580/
>
> Thank you
>
> Regards,
> Saravana
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 

~ Atin (atinm)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Review request - change pid file location to /var/run/gluster

2016-10-27 Thread Saravanakumar Arumugam

Hi,

I have refreshed this patch (originally authored by Gaurav), addressing the
review comments; it moves brick pid files from /var/lib/glusterd/* to
/var/run/gluster.


It will be great if you can review this:
http://review.gluster.org/#/c/13580/

Thank you

Regards,
Saravana

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel