Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-11-06 Thread Jim Kinney
That took a while!
I have the following stats:
4085169 files in both bricks
3162940 files only have a single hard link.
All of the files exist on both servers. bmidata2 (below) WAS running
when bmidata1 died.
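
(For reference, a count like the one quoted above can be gathered with something along these lines; the brick path is assumed from the heal-count output below, and it simply counts regular files that have a single hard link:)

    # Hedged sketch: count regular files on a brick that have only one hard
    # link (i.e. no matching second link under .glusterfs, or vice versa).
    # The brick path is an assumption.
    find /data/glusterfs/clifford/brick/brick -type f -links 1 | wc -l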
gluster volume heal clifford statistics heal-count
Gathering count of entries to be healed on volume clifford has been successful

Brick bmidata1:/data/glusterfs/clifford/brick/brick
Number of entries: 0

Brick bmidata2:/data/glusterfs/clifford/brick/brick
Number of entries: 296252

Brick bmidata1:/data/glusterfs/clifford3/brick/brick
Number of entries: 1

Brick bmidata2:/data/glusterfs/clifford3/brick/brick
Number of entries: 182407
Why those numbers are so much smaller than the data from a stat run
through the entire brick, I have no idea.
The 20TB of space for this mount point is composed of four 10TB bricks in a
2x2.
As this is all from a large copy in from a backup source, I'm thinking of
rerunning rsync to overwrite files that have the same create/modify times on
the mount, to realign things (maybe?)
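
(A minimal sketch of that re-sync idea, with both paths as placeholders; --checksum makes rsync compare content rather than size/mtime, so files whose timestamps already match still get verified and rewritten if they differ:)

    # Hedged sketch: re-copy from the backup source over the gluster mount,
    # comparing by checksum so same-timestamp files are still verified.
    # Both paths are placeholders.
    rsync -aHAX --checksum --progress /path/to/backup/source/ /mnt/clifford/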
I ran a giant ls/stat on the mount but nothing changed. Ran it again
with no changes.
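
(That crawl, as a minimal sketch with the mount path as a placeholder; stat-ing every entry from a client is the usual way to force lookups and nudge self-heal:)

    # Hedged sketch: stat everything on the FUSE mount to trigger lookups.
    # The mount path is a placeholder.
    find /mnt/clifford -exec stat {} \; > /dev/null 2>&1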
gluster-health-report 
Loaded reports: glusterd-op-version, georep, gfid-mismatch-dht-report,
glusterd-peer-disconnect, disk_usage, errors_in_logs, coredump,
glusterd, glusterd_volume_version_cksum_errors, kernel_issues,
errors_in_logs, ifconfig, nic-health, process_status
[     OK] Disk used percentage  path=/  percentage=7
[     OK] Disk used percentage  path=/var  percentage=7
[     OK] Disk used percentage  path=/tmp  percentage=7
[  ERROR] Report failure  report=report_check_errors_in_glusterd_log
[     OK] All peers are in connected state  connected_count=1  total_peer_count=1
[     OK] no gfid mismatch
[ NOT OK] Failed to check op-version
[ NOT OK] The maximum size of core files created is NOT set to unlimited.
[  ERROR] Report failure  report=report_check_worker_restarts
[  ERROR] Report failure  report=report_non_participating_bricks
[     OK] Glusterd is running  uptime_sec=5177509
[  ERROR] Report failure  report=report_check_version_or_cksum_errors_in_glusterd_log
[  ERROR] Report failure  report=report_check_errors_in_glusterd_log
[ NOT OK] Recieve errors in "ifconfig enp131s0" output
[ NOT OK] Recieve errors in "ifconfig eth0" output
[ NOT OK] Recieve errors in "ifconfig eth3" output
[ NOT OK] Recieve errors in "ifconfig mlx_ib0" output
[ NOT OK] Transmission errors in "ifconfig mlx_ib0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- eth0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- eth3" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- mlx_ib0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- enp131s0" output
High CPU usage by Self-heal
NOTE: bmidata2 has been up for over 300 days; it's due for a reboot.
On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
> Hi Jim,
> 
> Can you check whether the same hardlinks are present on both the
> bricks & both of them have the link count 2?
> If the link count is 2 then "find <brickpath> -samefile
> <brickpath>/.glusterfs/<first two bits of gfid>/<next two bits of gfid>/<full gfid>"
> should give you the file path.
> 
> Regards,
> Karthik
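
(A minimal sketch of the lookup Karthik describes above; the brick path and gfid are placeholders, and the gfid file is assumed to live under .glusterfs/<first two hex chars>/<next two hex chars>/<full gfid>:)

    # Hedged sketch: resolve a gfid entry back to its named path on a brick.
    # BRICK and GFID are placeholders.
    BRICK=/data/glusterfs/clifford/brick/brick
    GFID=01234567-89ab-cdef-0123-456789abcdef
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"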
> 
> On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney 
> wrote:
> > 
> > 
> > 
> > I'm not so lucky. ALL of mine show 2 links and none have the attr
> > data that supplies the path to the original.
> > 
> > I have the inode from stat. Looking now to dig out the
> > path/filename from  xfs_db on the specific inodes individually.
> > 
> > Is the hash of the filename or <path>/filename, and if so relative
> > to where? /, the brick path, or the mount path?
> > 
> > On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
> > > In my case I was able to delete the hard links in the .glusterfs
> > > folders of the bricks and it seems to have done the trick,
> > > thanks!
> > >  
> > > 
> > > From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> > > 
> > > 
> > > Sent: Monday, October 23, 2017 1:52 AM
> > > 
> > > To: Jim Kinney; Matt Waymack
> > > 
> > > Cc: gluster-users 
> > > 
> > > Subject: Re: [Gluster-users] gfid entries in volume heal info
> > > that do not heal
> > >  
> > > 
> > > 
> > > 
> > > Hi Jim & Matt,
> > > 
> > > Can you also check the link count in the stat output of those
> > > hardlink entries in the .glusterfs folder on the bricks?
> > > 
> > > If the link count is 1 on all the bricks for those entries, then
> > > they are orphaned entries and you can delete those hardlinks.
> > > 
> > > 
> > > To be on the safer side have a backup before deleting any of the
> > > entries.
> > > 
> > > 
> > > Regards,
> > > 
> > > 
> > > Karthik
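
(A minimal sketch of the link-count check described above, run on each brick; the brick path and gfid are placeholders:)

    # Hedged sketch: print the hard-link count (%h) of a gfid entry on a
    # brick. A count of 1 means the entry is orphaned; 2 means a named file
    # still points at it. Brick path and gfid are placeholders.
    BRICK=/data/glusterfs/clifford/brick/brick
    GFID=01234567-89ab-cdef-0123-456789abcdef
    stat -c '%h %n' "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"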
> > > 
> > > 
> > > 
> > >  
> > > 
> > > On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
> > > > 
> > > > I've been following this particular thread as I have a similar
> > > > issue (RAID6 array failed out with 3 dead drives at once while
> > > > a 12 TB load was being copied into one mounted space - what a
> > > > mess)
> > > > 
> > > > 
> > > >  
> > > > 
> > > > 
> > > > I have 

Re: [Gluster-users] [Gluster-devel] Request for Comments: Upgrades from 3.x to 4.0+

2017-11-06 Thread Alastair Neil
Ahh OK I see, thanks


On 6 November 2017 at 00:54, Kaushal M  wrote:

> On Fri, Nov 3, 2017 at 8:50 PM, Alastair Neil 
> wrote:
> > Just so I am clear the upgrade process will be as follows:
> >
> > upgrade all clients to 4.0
> >
> > rolling upgrade all servers to 4.0 (with GD1)
> >
> > kill all GD1 daemons on all servers and run upgrade script (new clients
> > unable to connect at this point)
> >
> > start GD2 (necessary, or does the upgrade script do this?)
> >
> >
> > I assume that once the cluster has been migrated to GD2 the glusterd
> > startup script will be smart enough to start the correct version?
> >
>
> This should be the process, mostly.
>
> The upgrade script needs GD2 to be running on all nodes before it can
> begin migration.
> But they don't need to have a cluster formed, the script should take
> care of forming the cluster.
>
>
> > -Thanks
> >
> >
> >
> >
> >
> > On 3 November 2017 at 04:06, Kaushal M  wrote:
> >>
> >> On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budic 
> >> wrote:
> >> > Will the various client packages (centos in my case) be able to
> >> > automatically handle the upgrade vs new install decision, or will we be
> >> > required to do something manually to determine that?
> >>
> >> We should be able to do this with CentOS (and other RPM based distros)
> >> which have well split glusterfs packages currently.
> >> At this moment, I don't know exactly how much can be handled
> >> automatically, but I expect the amount of manual intervention to be
> >> minimal.
> >> The minimum amount of manual work needed would be enabling and
> >> starting GD2 and starting the migration script.
> >>
> >> >
> >> > It’s a little unclear that things will continue without interruption
> >> > because of the way you describe the change from GD1 to GD2, since it
> >> > sounds like it stops GD1.
> >>
> >> With the described upgrade strategy, we can ensure continuous volume
> >> access to clients during the whole process (provided volumes have been
> >> setup with replication or ec).
> >>
> >> During the migration from GD1 to GD2, any existing clients still
> >> retain access, and can continue to work without interruption.
> >> This is possible because gluster keeps the management  (glusterds) and
> >> data (bricks and clients) parts separate.
> >> So it is possible to interrupt the management parts, without
> >> interrupting data access to existing clients.
> >> Clients and the server side brick processes need GlusterD to start up.
> >> But once they're running, they can run without GlusterD. GlusterD is
> >> only required again if something goes wrong.
> >> Stopping GD1 during the migration process will not lead to any
> >> interruptions for existing clients.
> >> The brick processes continue to run, and any connected clients remain
> >> connected to the bricks.
> >> Any new clients which try to mount the volumes during this migration
> >> will fail, as a GlusterD will not be available (either GD1 or GD2).
> >>
> >> > Early days, obviously, but if you could clarify if that’s what
> >> > we’re used to as a rolling upgrade or how it works, that would be
> >> > appreciated.
> >>
> >> A Gluster rolling upgrade allows data access to volumes throughout
> >> the process, while the brick processes are upgraded as well.
> >> Rolling upgrades with uninterrupted access requires that volumes have
> >> redundancy (replicate or ec).
> >> Rolling upgrades involves upgrading servers belonging to a redundancy
> >> set (replica set or ec set), one at a time.
> >> One at a time,
> >> - A server is picked from a redundancy set
> >> - All Gluster processes are killed on the server, glusterd, bricks and
> >> other daemons included.
> >> - Gluster is upgraded and restarted on the server
> >> - A heal is performed to heal new data onto the bricks.
> >> - Move onto next server after heal finishes.
> >>
> >> Clients maintain uninterrupted access, because a full redundancy set
> >> is never taken offline all at once.
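
(For reference, a rough per-node sketch of the loop described above; service and package names are assumptions that vary by distro, and VOLNAME is a placeholder:)

    # Hedged sketch of one iteration of a rolling upgrade, run per server:
    systemctl stop glusterd            # stop the management daemon
    pkill glusterfsd; pkill glusterfs  # stop bricks and other gluster daemons
    yum update 'glusterfs*'            # package manager/names are assumptions
    systemctl start glusterd           # bricks restart under the new version
    gluster volume heal VOLNAME        # heal data written while offline
    gluster volume heal VOLNAME info   # repeat until the entry count is 0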
> >>
> >> > Also clarification that we’ll be able to upgrade from 3.x
> >> > (3.1x?) to 4.0, manually or automatically?
> >>
> >> Rolling upgrades from 3.1x to 4.0 are a manual process. But I believe,
> >> gdeploy has playbooks to automate it.
> >> At the end of this you will be left with a 4.0 cluster, but still be
> >> running GD1.
> >> Upgrading from GD1 to GD2, in 4.0 will be a manual process. A script
> >> that automates this is planned only for 4.1.
> >>
> >> >
> >> >
> >> > 
> >> > From: Kaushal M 
> >> > Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to
> 4.0+
> >> > Date: November 2, 2017 at 3:56:05 AM CDT
> >> > To: gluster-users@gluster.org; Gluster Devel
> >> >
> >> > We're fast approaching the time for Gluster-4.0. And we would like to
> >> > set out the expected upgrade strategy and try to polish it to be as

[Gluster-users] Gluster Summit BOF - Rebalance

2017-11-06 Thread Nithya Balachandran
Hi,

We had a BOF on Rebalance  at the Gluster Summit to get feedback from
Gluster users.

- Performance has improved over the last few releases and it works well for
large files.
- However, it is still not fast enough on volumes which contain a lot of
directories and small files. The bottleneck appears to be the
single-threaded filesystem crawl.
- Scripts that use the fix-layout and file-migration virtual xattrs
(available via the mount point) to rebalance volumes would be helpful; a
hedged sketch follows after this list.
- Rebalance is currently broken on volumes with ZFS bricks (and other FS
where fallocate is not available). A fix for this is being worked on [1]
and should be ready soon.
- The rebalance status output is satisfactory
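
(A hedged sketch of the kind of mount-point-driven script mentioned above; the virtual xattr name is the one documented for triggering a directory layout fix from a client mount, but verify it against your Gluster version, and the mount path is a placeholder:)

    # Hedged sketch: walk directories on a FUSE mount and ask DHT to fix the
    # layout of each one via the documented virtual xattr.
    find /mnt/volume -type d -exec setfattr -n "distribute.fix.layout" -v "yes" {} \;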

Amar, Susant, Raghavendra, please add if I have missed something.

Regards,
Nithya

[1] https://review.gluster.org/18573
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] how to verify bitrot signed file manually?

2017-11-06 Thread Amudhan P
Any update?

On Fri, Oct 13, 2017 at 1:14 PM, Amudhan P  wrote:

> any update?.
>
> why is it marked bad?
>
> Any way to find out what happened to the file?
>
>
> On Tue, Oct 3, 2017 at 12:44 PM, Amudhan P  wrote:
>
>>
>> my volume is a distributed disperse volume with 8+2 EC.
>> file1 and file2 are different files lying in the same brick. I am able to
>> read the files from the mount point without any issue because, with EC, it
>> reads the rest of the available blocks from the other nodes.
>>
>> my question is: "file1"'s sha256 value matches the bitrot signature value,
>> but it is still marked as bad by the scrubber daemon. Why is that?
>>
>>
>>
>> On Fri, Sep 29, 2017 at 12:52 PM, Kotresh Hiremath Ravishankar <
>> khire...@redhat.com> wrote:
>>
>>> Hi Amudhan,
>>>
>>> Sorry for the late response as I was busy with other things. You are
>>> right bitrot uses sha256 for checksum.
>>> If file-1, file-2 are marked bad, the I/O should be errored out with
>>> EIO. If that is not happening, we need
>>> to look further into it. But what's the file contents of file-1 and
>>> file-2 on the replica bricks ? Are they
>>> matching ?
>>>
>>> Thanks and Regards,
>>> Kotresh HR
>>>
>>> On Mon, Sep 25, 2017 at 4:19 PM, Amudhan P  wrote:
>>>
 resending mail.


 On Fri, Sep 22, 2017 at 5:30 PM, Amudhan P  wrote:

> OK, from the bitrot code I figured out that gluster uses the sha256 hashing algo.
>
>
> Now coming to the problem: during a scrub run in my cluster, some of my
> files were marked as bad on a few sets of nodes.
> I just wanted to confirm the bad files, so I used the "sha256sum" tool in
> Linux to manually get the file hashes.
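
(A minimal sketch of that manual check; the brick-side path is an assumption, and as the outputs below show, the sha256 appears at the tail of the trusted.bit-rot.signature xattr after a short header:)

    # Hedged sketch: compute the file's sha256 on the brick and print the
    # bit-rot signature xattr next to it for comparison. FILE is a
    # placeholder for the path on the brick.
    FILE=/media/disk16/brick16/file-1
    sha256sum "$FILE"
    getfattr -n trusted.bit-rot.signature -e hex "$FILE"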
>
> here is the result.
>
> file-1, file-2 marked as bad by scrub and file-3 is healthy.
>
> file-1's sha256 and bitrot signature value match, but it has still been
> marked as bad.
>
> file-2's sha256 and bitrot signature value don't match; it could be a
> victim of bitrot or a bitflip. The file is still readable without any issue
> and no errors were found on the drive.
>
> file-3's sha256 and bitrot signature match, and the file is healthy.
>
>
> file-1 output from
>
> "sha256sum" = "71eada9352b1352aaef0f806d3d56
> 1768ce2df905ded1668f665e06eca2d0bd4"
>
>
> "getfattr -m. -e hex -d "
> # file: file-1
> trusted.bit-rot.bad-file=0x3100
> trusted.bit-rot.signature=0x01020071eada9352b135
> 2aaef0f806d3d561768ce2df905ded1668f665e06eca2d0bd4
> trusted.bit-rot.version=0x020058e4f3b40006793d
> trusted.ec.config=0x080a02000200
> trusted.ec.dirty=0x
> trusted.ec.size=0x000718996701
> trusted.ec.version=0x00038c4c00038c4d
> trusted.gfid=0xf078a24134fe4f9bb953eca8c28dea9a
>
> output scrub log:
> [2017-09-02 13:02:20.311160] A [MSGID: 118023]
> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
> CORRUPTION DETECTED: Object /file-1 {Brick: /media/disk16/brick16 | GFID:
> f078a241-34fe-4f9b-b953-eca8c28dea9a}
> [2017-09-02 13:02:20.311579] A [MSGID: 118024]
> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
> Marking /file-1 [GFID: f078a241-34fe-4f9b-b953-eca8c28dea9a | Brick:
> /media/disk16/brick16] as corrupted..
>
> file-2 output from
>
> "sha256sum" = "c41ef9c81faed4f3e6010ea67984c
> 3cfefd842f98ee342939151f9250972dcda"
>
>
> "getfattr -m. -e hex -d "
> # file: file-2
> trusted.bit-rot.bad-file=0x3100
> trusted.bit-rot.signature=0x0102009162cb17d4f0be
> e676fcb7830c5286d05b8e8940d14f3d117cb90b7b1defc129
> trusted.bit-rot.version=0x020058e4f3b400019bb2
> trusted.ec.config=0x080a02000200
> trusted.ec.dirty=0x
> trusted.ec.size=0x403433f6
> trusted.ec.version=0x201a201b
> trusted.gfid=0xa50012b0a632477c99232313928d239a
>
> output scrub log:
> [2017-09-02 05:18:14.003156] A [MSGID: 118023]
> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
> CORRUPTION DETECTED: Object /file-2 {Brick: /media/disk13/brick13 | GFID:
> a50012b0-a632-477c-9923-2313928d239a}
> [2017-09-02 05:18:14.006629] A [MSGID: 118024]
> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
> Marking /file-2 [GFID: a50012b0-a632-477c-9923-2313928d239a | Brick:
> /media/disk13/brick13] as corrupted..
>
>
> file-3 output from
>
> "sha256sum" = "a590735b3c8936cc7ca9835128a19
> c38a3f79c8fd53fddc031a9349b7e273f27"
>
>
> "getfattr -m. -e hex -d "
> # file: file-3
> trusted.bit-rot.signature=0x010200a590735b3c8936
> cc7ca9835128a19c38a3f79c8fd53fddc031a9349b7e273f27
> trusted.bit-rot.version=0x020058e4f3b400019bb2
> 

Re: [Gluster-users] glusterfs segmentation fault in rdma mode

2017-11-06 Thread acfreeman
Hi ,all

 We found a strange problem. Some clients worked normally while some clients 
couldn't access specific files. For example, Client A couldn't create the 
directory xxx, but Client B could. However, if Client B created the directory, 
Client A could access it and even delete it. But Client A still couldn't create 
the same directory later. If I changed the directory name, Client A worked 
without problems. It seemed that there were some problems with specific bricks 
on specific clients. But all the bricks were online.

I saw this in the GlusterFS client logs after the directory creation failed:
[2017-11-06 11:55:18.420610] W [MSGID: 109011] 
[dht-layout.c:186:dht_layout_search] 0-data-dht: no subvolume for hash (value) 
= 4148753024
[2017-11-06 11:55:18.457744] W [fuse-bridge.c:521:fuse_entry_cbk] 
0-glusterfs-fuse: 488: MKDIR() /xxx => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search] 0-data-dht: 
no subvolume for hash (value) = 4148753024" repeated 3 times between 
[2017-11-06 11:55:18.420610] and [2017-11-06 11:55:18.457731]
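
(That dht warning generally means the directory's layout has a gap that the hashed name falls into; a hedged way to inspect this is to dump the DHT layout xattr of the parent directory on each brick, with the brick-side path as a placeholder:)

    # Hedged sketch: dump the DHT layout ranges of the parent directory on a
    # brick; gaps or overlaps here produce "no subvolume for hash" errors.
    getfattr -n trusted.glusterfs.dht -e hex /brick/path/parent-dir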




------------------ Original Message ------------------
From: "Ben Turner"
Date: 5 November 2017 (Sunday) 3:00
To: "acfreeman" <21291...@qq.com>
Cc: "gluster-users"
Subject: Re: [Gluster-users] glusterfs segmentation fault in rdma mode



This looks like there could be some problem requesting / leaking / whatever 
memory, but without looking at the core it's tough to tell for sure.
Note:

/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]

Can you open up a bugzilla and get us the core file to review?

-b
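
(For anyone gathering the requested core, a minimal sketch of pulling a backtrace out of it before attaching it to the bugzilla; the binary path and core filename are assumptions, and the matching debuginfo packages should be installed first:)

    # Hedged sketch: extract a full backtrace from the core dump.
    gdb /usr/sbin/glusterfs /path/to/core \
        -batch -ex "thread apply all bt full" > glusterfs-backtrace.txt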

- Original Message -
> From: "??" <21291...@qq.com>
> To: "gluster-users" 
> Sent: Saturday, November 4, 2017 5:27:50 AM
> Subject: [Gluster-users] glusterfs segmentation fault in rdma mode
> 
> 
> 
> Hi, All,
> 
> 
> 
> 
> I used Infiniband to connect all GlusterFS nodes and the clients. Previously
> I ran IP over IB and everything was OK. Now I used rdma transport mode
> instead. And then I ran the traffic. After a while, the glusterfs process
> exited because of a segmentation fault.
> 
> 
> 
> 
> Here were the messages when I saw segmentation fault:
> 
> pending frames:
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(1) op(WRITE)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> frame : type(0) op(0)
> 
> patchset: git://git.gluster.org/glusterfs.git
> 
> signal received: 11
> 
> time of crash:
> 
> 2017-11-01 11:11:23
> 
> configuration details:
> 
> argp 1
> 
> backtrace 1
> 
> dlfcn 1
> 
> libpthread 1
> 
> llistxattr 1
> 
> setfsid 1
> 
> spinlock 1
> 
> epoll.h 1
> 
> xattr.h 1
> 
> st_atim.tv_nsec 1
> 
> package-string: glusterfs 3.11.0
> 
> /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]
> 
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f95bc557834]
> 
> /lib64/libc.so.6(+0x32510)[0x7f95bace2510]
> 
> The client OS was CentOS 7.3. The server OS was CentOS 6.5. The GlusterFS
> version was 3.11.0 both in clients and servers. The Infiniband card was
> Mellanox. The Mellanox IB driver version was v4.1-1.0.2 (27 Jun 2017) both
> in clients and servers.
> 
> 
> Is the rdma code stable for GlusterFS? Do I need to upgrade the IB driver
> or apply a patch?
> 
> Thanks!
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users