Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Davide Obbi
then my last idea would be trying to create the same files or run the
application on the other volumes, sorry but i will be interested in the
solution!

On Mon, Jan 7, 2019 at 7:52 PM Matt Waymack  wrote:

> Yep, first unmount/remounted, then rebooted clients.  Stopped/started the
> volumes, and rebooted all nodes.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Monday, January 7, 2019 12:47 PM
> *To:* Matt Waymack 
> *Cc:* Raghavendra Gowdappa ;
> gluster-users@gluster.org List 
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> I guess you already tried unmounting, stop/start and mounting?
>
>
>
> On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack  wrote:
>
> Yes, all volumes use sharding.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Monday, January 7, 2019 12:43 PM
> *To:* Matt Waymack 
> *Cc:* Raghavendra Gowdappa ;
> gluster-users@gluster.org List 
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> are all the volumes being configured with sharding?
>
>
>
> On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack  wrote:
>
> I think that I can rule out network as I have multiple volumes on the same
> nodes and not all volumes are affected.  Additionally, access via SMB using
> samba-vfs-glusterfs is not affected, even on the same volumes.   This is
> seemingly only affecting the FUSE clients.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Sunday, January 6, 2019 12:26 PM
> *To:* Raghavendra Gowdappa 
> *Cc:* Matt Waymack ; gluster-users@gluster.org List <
> gluster-users@gluster.org>
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> Hi,
>
>
>
> I would start with some basic checks. "(Input/output error)" seems to be
> returned by the operating system; this happens, for instance, when trying
> to access a file system on a device that is not available. So I would
> check the network connectivity between the client and the servers, and
> between the servers themselves, during the reported time.
>
>
>
> Regards
>
> Davide
>
>
>
> On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack  wrote:
>
> Hi all,
>
>
>
> I'm having a problem writing to our volume.  When writing files larger
> than about 2GB, I get an intermittent issue where the write will fail and
> return Input/Output error.  This is also shown in the FUSE log of the
> client (this is affecting all clients).  A snip of a client log is below:
>
> [2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51040978: WRITE => -1
> gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041266: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output
> error)
>
> [2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041548: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times
> between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
>
> The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk]
> 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05
> 22:39:33.925981] and [2019-01-05 22:39:50.451862]
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times
> between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
>
>
>
> This looks to be a DHT issue. Some questions:
>
> * Are all subvolumes of DHT up and client is connected to them?
> Particularly the subvolume which contains the file in question.
>
> * Can you get all extended attributes of parent directory of the file from
> all bricks?
>
> * set diagnostics.client-log-level to 

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
Yep, first unmount/remounted, then rebooted clients.  Stopped/started the 
volumes, and rebooted all nodes.

From: Davide Obbi 
Sent: Monday, January 7, 2019 12:47 PM
To: Matt Waymack 
Cc: Raghavendra Gowdappa ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

I guess you already tried unmounting, stop/start and mounting?

On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack wrote:
Yes, all volumes use sharding.

From: Davide Obbi
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack
Cc: Raghavendra Gowdappa; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

are all the volumes being configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack wrote:
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa
Cc: Matt Waymack; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some basic checks. "(Input/output error)" seems to be
returned by the operating system; this happens, for instance, when trying to
access a file system on a device that is not available. So I would check the
network connectivity between the client and the servers, and between the
servers themselves, during the reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.

I spoke a bit too early. dht_writev doesn't search for the hashed subvolume, as
it has already been looked up in lookup. So these messages look to belong to a
different issue, not the writev failure.


This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client, which then writes to

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Davide Obbi
are all the volumes being configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack  wrote:

> I think that I can rule out network as I have multiple volumes on the same
> nodes and not all volumes are affected.  Additionally, access via SMB using
> samba-vfs-glusterfs is not affected, even on the same volumes.   This is
> seemingly only affecting the FUSE clients.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Sunday, January 6, 2019 12:26 PM
> *To:* Raghavendra Gowdappa 
> *Cc:* Matt Waymack ; gluster-users@gluster.org List <
> gluster-users@gluster.org>
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> Hi,
>
>
>
> I would start with some basic checks. "(Input/output error)" seems to be
> returned by the operating system; this happens, for instance, when trying
> to access a file system on a device that is not available. So I would
> check the network connectivity between the client and the servers, and
> between the servers themselves, during the reported time.
>
>
>
> Regards
>
> Davide
>
>
>
> On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack  wrote:
>
> Hi all,
>
>
>
> I'm having a problem writing to our volume.  When writing files larger
> than about 2GB, I get an intermittent issue where the write will fail and
> return Input/Output error.  This is also shown in the FUSE log of the
> client (this is affecting all clients).  A snip of a client log is below:
>
> [2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51040978: WRITE => -1
> gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041266: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output
> error)
>
> [2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041548: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times
> between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
>
> The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk]
> 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05
> 22:39:33.925981] and [2019-01-05 22:39:50.451862]
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times
> between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
>
>
>
> This looks to be a DHT issue. Some questions:
>
> * Are all subvolumes of DHT up and client is connected to them?
> Particularly the subvolume which contains the file in question.
>
> * Can you get all extended attributes of parent directory of the file from
> all bricks?
>
> * set diagnostics.client-log-level to TRACE, capture these errors again
> and attach the client log file.
>
>
>
> I spoke a bit too early. dht_writev doesn't search for the hashed subvolume,
> as it has already been looked up in lookup. So these messages look to belong
> to a different issue, not the writev failure.
>
>
>
>
>
> This is intermittent for most files, but eventually if a file is large
> enough it will not write.  The workflow is SFTP to the client, which then
> writes to the volume over FUSE.  When files get to a certain point, we can
> no longer write to them.  The file sizes are different as well, so it's not
> like they all get to the same size and just stop either.  I've ruled out a
> free space issue: our files at their largest are only a few hundred GB and
> we have tens of terabytes free on each brick.  We are also sharding at 1GB.
>
>
>
> I'm not sure where to go from here as the error seems vague and I can only
> see it on the client log.  I'm not seeing these errors on the nodes
> themselves.  This is also seen if I mount the volume via FUSE on any o

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Davide Obbi
I guess you already tried unmounting, stop/start and mounting?

On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack  wrote:

> Yes, all volumes use sharding.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Monday, January 7, 2019 12:43 PM
> *To:* Matt Waymack 
> *Cc:* Raghavendra Gowdappa ;
> gluster-users@gluster.org List 
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> are all the volumes being configured with sharding?
>
>
>
> On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack  wrote:
>
> I think that I can rule out network as I have multiple volumes on the same
> nodes and not all volumes are affected.  Additionally, access via SMB using
> samba-vfs-glusterfs is not affected, even on the same volumes.   This is
> seemingly only affecting the FUSE clients.
>
>
>
> *From:* Davide Obbi 
> *Sent:* Sunday, January 6, 2019 12:26 PM
> *To:* Raghavendra Gowdappa 
> *Cc:* Matt Waymack ; gluster-users@gluster.org List <
> gluster-users@gluster.org>
> *Subject:* Re: [External] Re: [Gluster-users] Input/output error on FUSE
> log
>
>
>
> Hi,
>
>
>
> I would start with some basic checks. "(Input/output error)" seems to be
> returned by the operating system; this happens, for instance, when trying
> to access a file system on a device that is not available. So I would
> check the network connectivity between the client and the servers, and
> between the servers themselves, during the reported time.
>
>
>
> Regards
>
> Davide
>
>
>
> On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
> wrote:
>
>
>
>
>
> On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack  wrote:
>
> Hi all,
>
>
>
> I'm having a problem writing to our volume.  When writing files larger
> than about 2GB, I get an intermittent issue where the write will fail and
> return Input/Output error.  This is also shown in the FUSE log of the
> client (this is affecting all clients).  A snip of a client log is below:
>
> [2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51040978: WRITE => -1
> gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041266: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output
> error)
>
> [2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
>
> [2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
> 0-glusterfs-fuse: 51041548: WRITE => -1
> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output
> error)
>
> [2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk]
> 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times
> between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
>
> The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk]
> 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05
> 22:39:33.925981] and [2019-01-05 22:39:50.451862]
>
> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
> 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times
> between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
>
>
>
> This looks to be a DHT issue. Some questions:
>
> * Are all subvolumes of DHT up and client is connected to them?
> Particularly the subvolume which contains the file in question.
>
> * Can you get all extended attributes of parent directory of the file from
> all bricks?
>
> * set diagnostics.client-log-level to TRACE, capture these errors again
> and attach the client log file.
>
>
>
> I spoke a bit too early. dht_writev doesn't search for the hashed subvolume,
> as it has already been looked up in lookup. So these messages look to belong
> to a different issue, not the writev failure.
>
>
>
>
>
> This is intermittent for most files, but eventually if a file is large
> enough it will not write.  The workflow is SFTP to the client, which then
> writes to the volume over FUSE.  When files get to a certain point, we can
> no longer write to them.  The file sizes are different as well, so it's not
> like t

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
Yes, all volumes use sharding.

From: Davide Obbi 
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack 
Cc: Raghavendra Gowdappa ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

are all the volumes being configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack wrote:
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa
Cc: Matt Waymack; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some basic checks. "(Input/output error)" seems to be
returned by the operating system; this happens, for instance, when trying to
access a file system on a device that is not available. So I would check the
network connectivity between the client and the servers, and between the
servers themselves, during the reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
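
To see whether the failures cluster on particular files, the WRITE warnings in the snippet above can be tallied per gfid. The following is only a minimal sketch, assuming the client log lines keep the layout shown in this post; the names WRITE_FAIL and failed_writes_by_gfid are made up for illustration, and the sample text is copied from the snippet with the mail client's line wrapping left in.

```python
import re
from collections import Counter

# Matches the failed-write warnings logged by fuse_writev_cbk, as they
# appear in the snippet above (layout assumed, not taken from Gluster docs).
WRITE_FAIL = re.compile(
    r"\[(?P<ts>[\d\- :.]+)\] W \[fuse-bridge\.c:\d+:fuse_writev_cbk\] "
    r"\S+ \d+: WRITE => -1 gfid=(?P<gfid>[0-9a-f-]+) "
    r"fd=(?P<fd>0x[0-9a-f]+) \((?P<err>[^)]+)\)"
)


def failed_writes_by_gfid(log_text):
    """Count failed WRITE calls per gfid in a FUSE client log excerpt."""
    # Long log lines were wrapped in the mail, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return Counter(m.group("gfid") for m in WRITE_FAIL.finditer(flat))


SAMPLE = """
[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
0-glusterfs-fuse: 51040978: WRITE => -1
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
0-glusterfs-fuse: 51041266: WRITE => -1
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
0-glusterfs-fuse: 51041548: WRITE => -1
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
"""

if __name__ == "__main__":
    for gfid, count in failed_writes_by_gfid(SAMPLE).most_common():
        print(gfid, count)
```

Run against a full client log, the same function would show whether a handful of gfids account for most of the failed writes.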

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.
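
The three requests above map onto a few standard commands. This is a rough sketch that only builds the command lines and does not run them; the brick names come from the volume info posted in this thread, while the directory name "uploads" and the helper names are hypothetical placeholders.

```python
def status_command(volume):
    """Command to check that the volume's bricks are online."""
    return ["gluster", "volume", "status", volume]


def xattr_commands(bricks, rel_dir):
    """Return (host, argv) pairs that dump all xattrs of rel_dir on each brick.

    Each command has to be run on the brick's own host, against the brick
    path rather than the FUSE mount.
    """
    cmds = []
    for brick in bricks:
        host, brick_path = brick.split(":", 1)
        target = brick_path.rstrip("/") + "/" + rel_dir.strip("/")
        # -d dumps values, -m . matches every attribute name, -e hex prints hex.
        cmds.append((host, ["getfattr", "-d", "-m", ".", "-e", "hex", target]))
    return cmds


def trace_logging_command(volume):
    """Command to raise the client log level to TRACE for the volume."""
    return ["gluster", "volume", "set", volume,
            "diagnostics.client-log-level", "TRACE"]


if __name__ == "__main__":
    # First replica set of gv1; "uploads" is a hypothetical parent directory.
    bricks = [
        "tpc-glus4:/exp/b1/gv1",
        "tpc-glus2:/exp/b1/gv1",
        "tpc-arbiter1:/exp/b1/gv1",
    ]
    print(" ".join(status_command("gv1")))
    for host, argv in xattr_commands(bricks, "uploads"):
        print(host + ":", " ".join(argv))
    print(" ".join(trace_logging_command("gv1")))
```

TRACE logging is very verbose, so it is worth setting the option back to its previous level once the errors have been captured.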

I spoke a bit too early. dht_writev doesn't search for the hashed subvolume, as
it has already been looked up in lookup. So these messages look to belong to a
different issue, not the writev failure.


This is intermittent for most files, but eventually if a file is large enough
it will not write.  The workflow is SFTP to the client, which then writes to the
volume over FUSE.  When files get to a certain point, we can no longer write to
them.  The file sizes are different as well, so it's not like they all get to
the same size and just stop either.  I've ruled out a free space issue: our
files at their largest are only a few hundred GB and we have tens of terabytes
free on each brick.  We are also sharding at 1GB.
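
Since the volume shards at 1GB, it can help to translate the offset of a failing write into a shard number. The sketch below assumes "1GB" means a shard block size of 1 GiB (2**30 bytes) and that blocks are numbered from 0 starting at the beginning of the file; the function names are illustrative only. It does not explain the error, it only makes concrete which shard a given write would touch.

```python
SHARD_SIZE = 1 << 30  # assumption: the "1GB" shard size is 1 GiB


def shard_index(offset, shard_size=SHARD_SIZE):
    """Return the zero-based shard block that contains the byte offset."""
    return offset // shard_size


def shards_touched(offset, length, shard_size=SHARD_SIZE):
    """Return every shard block covered by a write of length bytes at offset."""
    if length <= 0:
        return []
    first = offset // shard_size
    last = (offset + length - 1) // shard_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A write that begins 4 KiB before the first shard boundary and crosses it.
    print(shards_touched(SHARD_SIZE - 4096, 8192))
    # The block holding the first byte past 2 GiB.
    print(shard_index(2 * SHARD_SIZE))
```

If the failing offsets reported by the application always land in a new shard block, that would point toward shard creation rather than writes to existing blocks, which is something the TRACE log could confirm or rule out.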

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only reflected in the FUSE log.

Here 

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi 
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa 
Cc: Matt Waymack ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some basic checks. "(Input/output error)" seems to be
returned by the operating system; this happens, for instance, when trying to
access a file system on a device that is not available. So I would check the
network connectivity between the client and the servers, and between the
servers themselves, during the reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.

I spoke a bit too early. dht_writev doesn't search for the hashed subvolume, as
it has already been looked up in lookup. So these messages look to belong to a
different issue, not the writev failure.


This is intermittent for most files, but eventually if a file is large enough
it will not write.  The workflow is SFTP to the client, which then writes to the
volume over FUSE.  When files get to a certain point, we can no longer write to
them.  The file sizes are different as well, so it's not like they all get to
the same size and just stop either.  I've ruled out a free space issue: our
files at their largest are only a few hundred GB and we have tens of terabytes
free on each brick.  We are also sharding at 1GB.

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only reflected in the FUSE log.

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/ex

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-06 Thread Davide Obbi
Hi,

I would start with some basic checks. "(Input/output error)" seems to be
returned by the operating system; this happens, for instance, when trying to
access a file system on a device that is not available. So I would check the
network connectivity between the client and the servers, and between the
servers themselves, during the reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
wrote:

>
>
> On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack  wrote:
>>
>>> Hi all,
>>>
>>>
>>> I'm having a problem writing to our volume.  When writing files larger
>>> than about 2GB, I get an intermittent issue where the write will fail and
>>> return Input/Output error.  This is also shown in the FUSE log of the
>>> client (this is affecting all clients).  A snip of a client log is below:
>>>
>>> [2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51040978: WRITE => -1
>>> gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
>>>
>>> [2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51041266: WRITE => -1
>>> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
>>>
>>> [2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51041548: WRITE => -1
>>> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
>>>
>>> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
>>> 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times
>>> between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
>>>
>>> The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk]
>>> 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05
>>> 22:39:33.925981] and [2019-01-05 22:39:50.451862]
>>>
>>> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
>>> 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times
>>> between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
>>>
>>
>> This looks to be a DHT issue. Some questions:
>> * Are all subvolumes of DHT up and client is connected to them?
>> Particularly the subvolume which contains the file in question.
>> * Can you get all extended attributes of parent directory of the file
>> from all bricks?
>> * set diagnostics.client-log-level to TRACE, capture these errors again
>> and attach the client log file.
>>
>
> I spoke a bit too early. dht_writev doesn't search for the hashed subvolume,
> as it has already been looked up in lookup. So these messages look to belong
> to a different issue, not the writev failure.
>
>
>>
>>> This is intermittent for most files, but eventually if a file is large
>>> enough it will not write.  The workflow is SFTP to the client, which then
>>> writes to the volume over FUSE.  When files get to a certain point, we can
>>> no longer write to them.  The file sizes are different as well, so it's not
>>> like they all get to the same size and just stop either.  I've ruled out a
>>> free space issue: our files at their largest are only a few hundred GB and
>>> we have tens of terabytes free on each brick.  We are also sharding at 1GB.
>>>
>>> I'm not sure where to go from here as the error seems vague and I can
>>> only see it on the client log.  I'm not seeing these errors on the nodes
>>> themselves.  This is also seen if I mount the volume via FUSE on any of the
>>> nodes as well and it is only reflected in the FUSE log.
>>>
>>> Here is the volume info:
>>> Volume Name: gv1
>>> Type: Distributed-Replicate
>>> Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 8 x (2 + 1) = 24
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: tpc-glus4:/exp/b1/gv1
>>> Brick2: tpc-glus2:/exp/b1/gv1
>>> Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
>>> Brick4: tpc-glus2:/exp/b2/gv1
>>> Brick5: tpc-glus4:/exp/b2/gv1
>>> Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
>>> Brick7: tpc-glus4:/exp/b3/gv1
>>> Brick8: tpc-glus2:/exp/b3/gv1
>>> Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
>>> Brick10: tpc-glus4:/exp/b4/gv1
>>> Brick11: tpc-glus2:/exp/b4/gv1
>>> Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
>>> Brick13: tpc-glus1:/exp/b5/gv1
>>> Brick14: tpc-glus3:/exp/b5/gv1
>>> Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
>>> Brick16: tpc-glus1:/exp/b6/gv1
>>> Brick17: tpc-glus3:/exp/b6/gv1
>>> Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
>>> Brick19: tpc-glus1:/exp/b7/gv1
>>>