Re: [Gluster-users] [Stale file handle] in shard volume

2019-01-03 Thread Nithya Balachandran
Adding Krutika.

On Wed, 2 Jan 2019 at 20:56, Olaf Buitelaar wrote:

> Hi Nithya,
>
> Thank you for your reply.
>
> The VMs using the gluster volumes keep getting paused/stopped on
> errors like these:
> [2019-01-02 02:33:44.469132] E [MSGID: 133010]
> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
> shard 101487 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
> [Stale file handle]
> [2019-01-02 02:33:44.563288] E [MSGID: 133010]
> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
> shard 101488 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
> [Stale file handle]
>
> Krutika, Can you take a look at this?


>
> What I'm trying to find out is whether I can purge all gluster volumes of all
> possible stale file handles (and hopefully find a method to prevent this in
> the future), so the VMs can run stably again.
> For this I need to know when the "shard_common_lookup_shards_cbk" function
> considers a file as stale.
> The statement "Stale file handle errors show up when a file with a
> specified gfid is not found" doesn't seem to cover it all, as I've shown
> in earlier mails that the shard file and the .glusterfs/xx/xx/uuid file both
> exist and have the same inode.
> If the criteria I'm using aren't correct, could you please tell me which
> criteria I should use to determine whether a file is stale or not?
> These criteria are just based on observations I made while moving the stale
> files manually. After removing them I was able to start the VM again... until
> some time later it hung on another stale shard file, unfortunately.
>
> Thanks Olaf
>
> On Wed, 2 Jan 2019 at 14:20, Nithya Balachandran <nbala...@redhat.com> wrote:
>
>>
>>
>> On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar wrote:
>>
>>> Dear All,
>>>
>>> Up to now, a select group of VMs still seems to produce new stale files
>>> and keeps getting paused because of this.
>>> I've not updated gluster recently; however, I did change the op-version
>>> from 31200 to 31202 about a week before this issue arose.
>>> Looking at the .shard directory, I have 100,000+ files sharing the same
>>> characteristics as the stale files found so far:
>>> they all have the sticky bit set (file permissions ---------T), are
>>> 0 KB in size, and have the trusted.glusterfs.dht.linkto attribute.
>>>
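For reference, a hedged sketch of how the cluster op-version is usually checked and raised with the standard gluster CLI; the numbers simply mirror the change described above:

# check the current cluster op-version
gluster volume get all cluster.op-version
# raise it, as described above (31200 -> 31202)
gluster volume set all cluster.op-version 31202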
>>
>> These are internal files used by gluster and do not necessarily mean they
>> are stale. They "point" to data files which may be on different bricks
>> (same name, gfid etc but no linkto xattr and no T permissions).
>>
>>
>>> These files range from long ago (the beginning of the year) until now, which
>>> makes me suspect this was lying dormant for some time and somehow
>>> recently surfaced.
>>> Checking other sub-volumes, they also contain 0 KB files in the .shard
>>> directory, but those don't have the sticky bit or the linkto attribute.
>>>
>>> Does anybody else experience this issue? Could this be a bug or an
>>> environmental issue?
>>>
>> These are most likely valid files - please do not delete them without
>> double-checking.
>>
>> Stale file handle errors show up when a file with a specified gfid is not
>> found. You will need to debug the files for which you see this error by
>> checking the bricks to see if they actually exist.
>>
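As a minimal sketch of such a brick-side check, assuming a placeholder brick path of /bricks/brick1 and reusing the gfid and shard number from the log above (the .glusterfs path is built from the first two bytes of the gfid):

# run on each brick of the subvolume (brick path is a placeholder)
ls -li /bricks/brick1/.glusterfs/a3/8d/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
ls -li /bricks/brick1/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487
# inspect the shard's xattrs to see whether it is only a dht linkto file
getfattr -d -m . -e hex /bricks/brick1/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487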
>>>
>>> Also, I wonder if there is any tool or gluster command to clean up all stale
>>> file handles?
>>> Otherwise I'm planning to write a simple bash script which iterates over
>>> the .shard dir, checks each file against the above-mentioned criteria, and
>>> (re)moves the file and the corresponding .glusterfs file.
>>> If there are other criteria needed to identify a stale file handle, I
>>> would like to hear them.
>>> If this is a viable and safe operation to do, of course.
>>>
>>> Thanks Olaf
>>>
>>>
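As a non-destructive variant of the script described above, here is a listing-only sketch that prints .shard entries matching those criteria (0 bytes, ---------T permissions, a dht.linkto xattr) on a single brick, so candidates can be cross-checked against the data files on the other subvolumes before anything is moved; the brick path is a placeholder:

#!/bin/bash
# List (do not remove) shard entries that look like dht linkto files.
BRICK=/bricks/brick1   # placeholder brick path
find "$BRICK/.shard" -maxdepth 1 -type f -perm 1000 -size 0 | while read -r f; do
    # only report files that actually carry the linkto xattr
    if getfattr -n trusted.glusterfs.dht.linkto -e hex "$f" >/dev/null 2>&1; then
        echo "linkto candidate: $f"
    fi
done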
>>>
>>> On Thu, 20 Dec 2018 at 13:43, Olaf Buitelaar <olaf.buitel...@gmail.com> wrote:
>>>
 Dear All,

 I figured it out; it appears to be the exact same issue as described
 here:
 https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
 Another subvolume also had the shard files, only they were all 0 bytes and
 had the dht.linkto attribute.

 For reference:
 [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
 .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
 # file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

 security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
 trusted.gfid=0x298147e49f9748b2baf1c8fff897244d

 trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d61362d3962656634633461386534302e3531353030

 trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100

 [root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex
 .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
 # f

Re: [Gluster-users] Glusterfs 4.1.6

2019-01-03 Thread Amudhan P
Thank you, it works as expected.

On Thu, Jan 3, 2019 at 5:08 PM Ashish Pandey  wrote:

> Hi,
>
> Some of the steps provided by you are not correct.
> You should have used the reset-brick command, which was introduced for exactly
> the task you wanted to do.
>
> 
> https://docs.gluster.org/en/v3/release-notes/3.9.0/
>
> Although your thinking was correct, replacing a faulty disk requires some
> additional tasks which this command
> will do automatically.
>
> Step 1 :- kill pid of the faulty brick in node  >> This should be done
> using the "reset-brick start" command. Follow the steps provided in the link.
> Step 2 :- running volume status, shows "N/A" under 'pid' & 'TCP port'
> Step 3 :- replace disk and mount new disk in same mount point where the
> old disk was mounted
> Step 4 :- run command "gluster v start volname force"  >> This should
> be done using the "reset-brick commit force" command. This will trigger
> the heal. Follow the link.
> Step 5 :- running volume status,  shows "N/A" under 'pid' & 'TCP port'
>
> ---
> Ashish
>
> --
> From: "Amudhan P"
> To: "Gluster Users"
> Sent: Thursday, January 3, 2019 4:25:58 PM
> Subject: [Gluster-users] Glusterfs 4.1.6
>
> Hi,
>
> I am working on Glusterfs 4.1.6 on a test machine. I am trying to replace
> a faulty disk; below are the steps I followed, but I wasn't successful.
>
> 3 Nodes, 2 disks per node, Disperse Volume 4+2 :-
> Step 1 :- kill pid of the faulty brick in node
> Step 2 :- running volume status, shows "N/A" under 'pid' & 'TCP port'
> Step 3 :- replace disk and mount new disk in same mount point where the
> old disk was mounted
> Step 4 :- run command "gluster v start volname force"
> Step 5 :- running volume status,  shows "N/A" under 'pid' & 'TCP port'
>
> expected behavior was a new brick process & heal should have started.
>
> Following the above steps on 3.10.1 works perfectly: a new brick
> process starts and the heal begins.
> But the same steps are not working in 4.1.6. Did I miss any steps? What should
> be done?
>
> Amudhan
>

Re: [Gluster-users] Glusterfs 4.1.6

2019-01-03 Thread Ashish Pandey
Hi, 

Some of the steps provided by you are not correct.
You should have used the reset-brick command, which was introduced for exactly
the task you wanted to do.

https://docs.gluster.org/en/v3/release-notes/3.9.0/ 

Although your thinking was correct, replacing a faulty disk requires some
additional tasks which this command
will do automatically.

Step 1 :- kill pid of the faulty brick in node >> This should be done using
the "reset-brick start" command. Follow the steps provided in the link.
Step 2 :- running volume status, shows "N/A" under 'pid' & 'TCP port'
Step 3 :- replace disk and mount new disk in same mount point where the old
disk was mounted
Step 4 :- run command "gluster v start volname force" >> This should
be done using the "reset-brick commit force" command. This will trigger the heal.
Follow the link.
Step 5 :- running volume status, shows "N/A" under 'pid' & 'TCP port'
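For reference, a minimal sketch of that reset-brick sequence, with placeholder volume name, host and brick path (see the release notes linked above for the exact semantics):

# take the faulty brick offline
gluster volume reset-brick testvol node1:/bricks/brick1 start
# ...replace the disk and mount it at the same mount point...
# bring the brick back on the same path and trigger the heal
gluster volume reset-brick testvol node1:/bricks/brick1 node1:/bricks/brick1 commit force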

--- 
Ashish 

- Original Message -

From: "Amudhan P"  
To: "Gluster Users"  
Sent: Thursday, January 3, 2019 4:25:58 PM 
Subject: [Gluster-users] Glusterfs 4.1.6 

Hi, 

I am working on Glusterfs 4.1.6 on a test machine. I am trying to replace a
faulty disk; below are the steps I followed, but I wasn't successful.

3 Nodes, 2 disks per node, Disperse Volume 4+2 :- 
Step 1 :- kill pid of the faulty brick in node 
Step 2 :- running volume status, shows "N/A" under 'pid' & 'TCP port' 
Step 3 :- replace disk and mount new disk in same mount point where the old 
disk was mounted 
Step 4 :- run command "gluster v start volname force" 
Step 5 :- running volume status, shows "N/A" under 'pid' & 'TCP port'

expected behavior was a new brick process & heal should have started. 

Following the above steps on 3.10.1 works perfectly: a new brick process
starts and the heal begins.
But the same steps are not working in 4.1.6. Did I miss any steps? What should be
done?

Amudhan 


[Gluster-users] Glusterfs 4.1.6

2019-01-03 Thread Amudhan P
Hi,

I am working on Glusterfs 4.1.6 on a test machine. I am trying to replace a
faulty disk; below are the steps I followed, but I wasn't successful.

3 Nodes, 2 disks per node, Disperse Volume 4+2 :-
Step 1 :- kill pid of the faulty brick in node
Step 2 :- running volume status, shows "N/A" under 'pid' & 'TCP port'
Step 3 :- replace disk and mount new disk in same mount point where the old
disk was mounted
Step 4 :- run command "gluster v start volname force"
Step 5 :- running volume status,  shows "N/A" under 'pid' & 'TCP port'

expected behavior was a new brick process & heal should have started.

Following the above steps on 3.10.1 works perfectly: a new brick
process starts and the heal begins.
But the same steps are not working in 4.1.6. Did I miss any steps? What should
be done?

Amudhan

Re: [Gluster-users] Error in Installing Glusterfs-4.1.6 from tar

2019-01-03 Thread Ravishankar N
 I don't get these warnings when compiling 4.1.6 on Fedora 28 with gcc
(GCC) 8.1.1. Perhaps it is a gcc issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80593.
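A hedged workaround (my assumption, not something confirmed in this thread): since -Wstrict-aliasing only fires when -fstrict-aliasing is in effect, rebuilding with strict aliasing disabled should silence the type-punning warnings; whether that trade-off is acceptable for your build is a judgment call:

# assumption: suppress the type-punning warnings by disabling strict aliasing
./configure CFLAGS="-O2 -fno-strict-aliasing"
make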




On 01/03/2019 01:20 PM, Amudhan P wrote:
Can I skip these warning messages in the mail below and continue with the
installation?


On Thu, Dec 27, 2018 at 5:11 PM Amudhan P wrote:


Thanks, Ravishankar, it worked.
Also, I am getting the following warning messages when running
`make`; is it safe to skip them?

dht-layout.c: In function ‘dht_layout_new’:
dht-layout.c:51:9: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
         GF_ATOMIC_INIT (layout->ref, 1);
         ^
dht-layout.c:51:9: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
  CC       dht-helper.lo


  CC       ec.lo
ec.c: In function ‘ec_statistics_init’:
ec.c:637:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.hits, 0);
         ^
ec.c:637:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:638:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.misses, 0);
         ^
ec.c:638:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:639:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.updates, 0);
         ^
ec.c:639:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:640:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.invals, 0);
         ^
ec.c:640:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:641:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.evicts, 0);
         ^
ec.c:641:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:642:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.allocs, 0);
         ^
ec.c:642:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
ec.c:643:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
 GF_ATOMIC_INIT(ec->stats.stripe_cache.errors, 0);
         ^
ec.c:643:9: warning: dereferencing type-punned pointer will break
strict-aliasing rules [-Wstrict-aliasing]
  CC       ec-data.lo


  CCLD posix.la 
.libs/posix-inode-fd-ops.o: In function `posix_do_chmod':

/home/qubevaultadmin/gluster-tar/glusterfs-4.1.6/xlators/storage/posix/src/posix-inode-fd-ops.c:203:
warning: lchmod is not implemented and will always fail
make[5]: Nothing to be done for 'all-am'.


 CC       client-handshake.lo
client-handshake.c: In function ‘clnt_fd_lk_local_create’:
client-handshake.c:150:9: warning: dereferencing type-punned
pointer will break strict-aliasing rules [-Wstrict-aliasing]
         GF_ATOMIC_INIT (local->ref, 1);
         ^
client-handshake.c:150:9: warning: dereferencing type-punned
pointer will break strict-aliasing rules [-Wstrict-aliasing]
  CC       client-callback.lo

  CC       readdir-ahead.lo
readdir-ahead.c: In function ‘init’:
readdir-ahead.c:637:9: warning: dereferencing type-punned pointer
will break strict-aliasing rules [-Wstrict-aliasing]
         GF_ATOMIC_INIT (priv->rda_cache_size, 0);
         ^
readdir-ahead.c:637:9: warning: dereferencing type-punned pointer
will break strict-aliasing rules [-Wstrict-aliasing]
  CCLD readdir-ahead.la 

Making all in src
  CC       md-cache.lo
md-cache.c: In function ‘mdc_init’:
md-cache.c:3431:9: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
         GF_ATOMIC_INIT (conf->mdc_counter.stat_hit, 0);
         ^
md-cache.c:3431:9: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
md-cache.c:3432:9: warning: dereferencing type-punned pointer will
break strict-aliasing rules [-Wstrict-aliasing]
         GF_ATOMIC_INIT (conf->mdc_counter.stat_miss, 0);
         ^
md-cache.c:3432:9: warning: dereferencing type-punned pointer will