Re: [Gluster-users] java application crashes while reading a zip file
Amar, Thank you for helping me troubleshoot the issues. I don't have the resources to test the software at this point, but I will keep it in mind. Regards, Dmitry On Tue, Jan 22, 2019 at 1:02 AM Amar Tumballi Suryanarayan < atumb...@redhat.com> wrote: > Dmitry, > > Thanks for the detailed updates on this thread. Let us know how your > 'production' setup is running. For much smoother next upgrade, we request > you to help out with some early testing of glusterfs-6 RC builds which are > expected to be out by Feb 1st week. > > Also, if it is possible for you to automate the tests, it would be great > to have it in our regression, so we can always be sure your setup would > never break in future releases. > > Regards, > Amar > > On Mon, Jan 7, 2019 at 11:42 PM Dmitry Isakbayev > wrote: > >> This system is going into production. I will try to replicate this >> problem on the next installation. >> >> On Wed, Jan 2, 2019 at 9:25 PM Raghavendra Gowdappa >> wrote: >> >>> >>> >>> On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev >>> wrote: >>> >>>> Still no JVM crushes. Is it possible that running glusterfs with >>>> performance options turned off for a couple of days cleared out the "stale >>>> metadata issue"? >>>> >>> >>> restarting these options, would've cleared the existing cache and hence >>> previous stale metadata would've been cleared. Hitting stale metadata >>> again depends on races. That might be the reason you are still not seeing >>> the issue. Can you try with enabling all perf xlators (default >>> configuration)? >>> >>> >>>> >>>> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev >>>> wrote: >>>> >>>>> The software ran with all of the options turned off over the weekend >>>>> without any problems. >>>>> I will try to collect the debug info for you. I have re-enabled the 3 >>>>> three options, but yet to see the problem reoccurring. 
>>>>> >>>>> >>>>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa < >>>>> rgowd...@redhat.com> wrote: >>>>> >>>>>> Thanks Dmitry. Can you provide the following debug info I asked >>>>>> earlier: >>>>>> >>>>>> * strace -ff -v ... of java application >>>>>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse >>>>>> while mounting). >>>>>> >>>>>> regards, >>>>>> Raghavendra >>>>>> >>>>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev >>>>>> wrote: >>>>>> >>>>>>> These 3 options seem to trigger both (reading zip file and renaming >>>>>>> files) problems. >>>>>>> >>>>>>> Options Reconfigured: >>>>>>> performance.io-cache: off >>>>>>> performance.stat-prefetch: off >>>>>>> performance.quick-read: off >>>>>>> performance.parallel-readdir: off >>>>>>> *performance.readdir-ahead: on* >>>>>>> *performance.write-behind: on* >>>>>>> *performance.read-ahead: on* >>>>>>> performance.client-io-threads: off >>>>>>> nfs.disable: on >>>>>>> transport.address-family: inet >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev >>>>>>> wrote: >>>>>>> >>>>>>>> Turning a single option on at a time still worked fine. I will >>>>>>>> keep trying. >>>>>>>> >>>>>>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or >>>>>>>> log messages. Do you suppose these issues are triggered by the new >>>>>>>> environment or did not exist in 4.1.5? >>>>>>>> >>>>>>>> [root@node1 ~]# glusterfs --version >>>>>>>> glusterfs 4.1.5 >>>>>>>> >>>>>>>> On AWS using >>>>>>>> [root@node1 ~]# hostnamectl >>>>>>>>Static hostname: node1 >>>>>>>> Icon name: computer-vm >>>>>>>>Chassis: vm >>&
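Raghavendra's request above (an strace of the java application plus a FUSE traffic dump taken while mounting) could be collected roughly as follows. This is only a sketch: it needs a live GlusterFS client, and the mount point, volume name, server name, and jar name are placeholders, not details from the thread.

```shell
# Sketch of collecting the debug info requested above (placeholder paths).
# Requires a running GlusterFS deployment; not runnable standalone.

# 1. strace the java application, following child processes (-ff), with
#    unabbreviated arguments (-v), writing one output file per PID (-o):
strace -ff -v -o /tmp/java-app.strace java -jar app.jar

# 2. Remount the volume with --dump-fuse so all FUSE traffic seen by the
#    mountpoint is written to a dump file:
umount /mnt/gv0
glusterfs --volfile-server=server1 --volfile-id=gv0 \
          --dump-fuse=/tmp/gv0-fuse.dump /mnt/gv0
```

The resulting strace files and fuse dump are what the developers would correlate to find the stale-metadata window.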
Re: [Gluster-users] [External] Re: A broken file that cannot be deleted
Nithya, Raghavendra, Davide, Thank you for your help. *>how the permissions are displayed for the file on the servers?* Trying to answer this question is what fixed the problem. It looked just fine on all 3 servers. And it looks like running "ls" on the servers fixed it on the clients. I had to repeat it on all 3 servers. It fixed file permissions on 2 clients and made the file show up on the 3rd client. Even though it fixed the file permissions and I could now view contents of the file, the software was still having issues with renaming ".download_suspensions.memo.writing" to ".download_suspensions.memo". When I tried to replace the file manually, I got $ mv .download_suspensions.memo.writing .download_suspensions.memo mv: ‘.download_suspensions.memo.writing’ and ‘.download_suspensions.memo’ are the same file I ended up removing both files and having the software rebuild them. > *Wondering whether it's a case of split-brain.* Very possible. All 3 servers were rebooted. It brought down the linux cluster running on the same 3 servers as well. > On Thu, Jan 10, 2019 at 10:00 AM Raghavendra Gowdappa > wrote: > >> >> >> On Wed, Jan 9, 2019 at 7:48 PM Dmitry Isakbayev >> wrote: >> >>> I am seeing a broken file that exists on 2 out of 3 nodes. >>> >> >> Wondering whether its a case of split brain. >> >> >>> The application trying to use the file throws file permissions error. >>> ls, rm, mv, touch all throw "Input/output error" >>> >>> $ ls -la >>> ls: cannot access .download_suspensions.memo: Input/output error >>> drwxrwxr-x. 2 ossadmin ossadmin 4096 Jan 9 08:06 . >>> drwxrwxr-x. 5 ossadmin ossadmin 4096 Jan 3 11:36 .. >>> -?? ? ????
>>> .download_suspensions.memo >>> >>> $ rm ".download_suspensions.memo" >>> rm: cannot remove ‘.download_suspensions.memo’: Input/output error >>> >>> >>> >>> >>> >>> ___ >>> Gluster-users mailing list >>> Gluster-users@gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> ___ >> Gluster-users mailing list >> Gluster-users@gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Davide Obbi > Senior System Administrator > > Booking.com B.V. > Vijzelstraat 66-80 Amsterdam 1017HL Netherlands > Direct +31207031558 > [image: Booking.com] <https://www.booking.com/> > Empowering people to experience the world since 1996 > 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 > million reported listings > Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG) > ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
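The split-brain suspicion raised above can be checked directly from the servers. A sketch, assuming the volume layout shown elsewhere in the thread (volume gv0, bricks under /data/brick1/gv0); it requires a live cluster, so it is not runnable standalone, and the directory path is a placeholder.

```shell
# Sketch: checking whether the broken file is in AFR split-brain.
# Volume name and brick path are assumptions taken from other messages.

# List entries the self-heal daemon considers split-brain:
gluster volume heal gv0 info split-brain

# On each server, inspect the AFR changelog xattrs on the brick copy:
getfattr -d -m . -e hex /data/brick1/gv0/some/dir/.download_suspensions.memo

# If bricks blame each other (non-zero trusted.afr.* pending counters on
# more than one brick), one resolution policy is to keep the copy with
# the newest mtime:
gluster volume heal gv0 split-brain latest-mtime /some/dir/.download_suspensions.memo
```

Running a plain "ls" on the servers, as described above, triggers a lookup that can heal a stale entry, which would explain why the file came back on its own.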
[Gluster-users] A broken file that cannot be deleted
I am seeing a broken file that exists on 2 out of 3 nodes. The application trying to use the file throws file permissions error. ls, rm, mv, touch all throw "Input/output error" $ ls -la ls: cannot access .download_suspensions.memo: Input/output error drwxrwxr-x. 2 ossadmin ossadmin 4096 Jan 9 08:06 . drwxrwxr-x. 5 ossadmin ossadmin 4096 Jan 3 11:36 .. -?? ? ???? .download_suspensions.memo $ rm ".download_suspensions.memo" rm: cannot remove ‘.download_suspensions.memo’: Input/output error ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] java application crashes while reading a zip file
This system is going into production. I will try to replicate this problem on the next installation. On Wed, Jan 2, 2019 at 9:25 PM Raghavendra Gowdappa wrote: > > > On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev wrote: > >> Still no JVM crushes. Is it possible that running glusterfs with >> performance options turned off for a couple of days cleared out the "stale >> metadata issue"? >> > > restarting these options, would've cleared the existing cache and hence > previous stale metadata would've been cleared. Hitting stale metadata > again depends on races. That might be the reason you are still not seeing > the issue. Can you try with enabling all perf xlators (default > configuration)? > > >> >> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev >> wrote: >> >>> The software ran with all of the options turned off over the weekend >>> without any problems. >>> I will try to collect the debug info for you. I have re-enabled the 3 >>> three options, but yet to see the problem reoccurring. >>> >>> >>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa < >>> rgowd...@redhat.com> wrote: >>> >>>> Thanks Dmitry. Can you provide the following debug info I asked earlier: >>>> >>>> * strace -ff -v ... of java application >>>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while >>>> mounting). >>>> >>>> regards, >>>> Raghavendra >>>> >>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev >>>> wrote: >>>> >>>>> These 3 options seem to trigger both (reading zip file and renaming >>>>> files) problems. 
>>>>> >>>>> Options Reconfigured: >>>>> performance.io-cache: off >>>>> performance.stat-prefetch: off >>>>> performance.quick-read: off >>>>> performance.parallel-readdir: off >>>>> *performance.readdir-ahead: on* >>>>> *performance.write-behind: on* >>>>> *performance.read-ahead: on* >>>>> performance.client-io-threads: off >>>>> nfs.disable: on >>>>> transport.address-family: inet >>>>> >>>>> >>>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev >>>>> wrote: >>>>> >>>>>> Turning a single option on at a time still worked fine. I will keep >>>>>> trying. >>>>>> >>>>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log >>>>>> messages. Do you suppose these issues are triggered by the new >>>>>> environment >>>>>> or did not exist in 4.1.5? >>>>>> >>>>>> [root@node1 ~]# glusterfs --version >>>>>> glusterfs 4.1.5 >>>>>> >>>>>> On AWS using >>>>>> [root@node1 ~]# hostnamectl >>>>>>Static hostname: node1 >>>>>> Icon name: computer-vm >>>>>>Chassis: vm >>>>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f >>>>>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9 >>>>>> Virtualization: kvm >>>>>> Operating System: CentOS Linux 7 (Core) >>>>>>CPE OS Name: cpe:/o:centos:centos:7 >>>>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64 >>>>>> Architecture: x86-64 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa < >>>>>> rgowd...@redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev >>>>>>> wrote: >>>>>>> >>>>>>>> Ok. I will try different options. >>>>>>>> >>>>>>>> This system is scheduled to go into production soon. What version >>>>>>>> would you recommend to roll back to? >>>>>>>> >>>>>>> >>>>>>> These are long standing issues. So, rolling back may not make these >>>>>>> issues go away. Instead if you think performance is agreeable to you, >>>>>>> please keep these xlators off in production. &g
Re: [Gluster-users] java application crashes while reading a zip file
Still no JVM crashes. Is it possible that running glusterfs with performance options turned off for a couple of days cleared out the "stale metadata issue"? On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev wrote: > The software ran with all of the options turned off over the weekend > without any problems. > I will try to collect the debug info for you. I have re-enabled the three > options, but have yet to see the problem reoccur. > > > On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa > wrote: > >> Thanks Dmitry. Can you provide the following debug info I asked earlier: >> >> * strace -ff -v ... of java application >> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while >> mounting). >> >> regards, >> Raghavendra >> >> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev >> wrote: >> >>> These 3 options seem to trigger both (reading zip file and renaming >>> files) problems. >>> >>> Options Reconfigured: >>> performance.io-cache: off >>> performance.stat-prefetch: off >>> performance.quick-read: off >>> performance.parallel-readdir: off >>> *performance.readdir-ahead: on* >>> *performance.write-behind: on* >>> *performance.read-ahead: on* >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev >>> wrote: >>> >>>> Turning a single option on at a time still worked fine. I will keep >>>> trying. >>>> >>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log >>>> messages. Do you suppose these issues are triggered by the new environment >>>> or did not exist in 4.1.5? 
>>>> >>>> [root@node1 ~]# glusterfs --version >>>> glusterfs 4.1.5 >>>> >>>> On AWS using >>>> [root@node1 ~]# hostnamectl >>>>Static hostname: node1 >>>> Icon name: computer-vm >>>>Chassis: vm >>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f >>>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9 >>>> Virtualization: kvm >>>> Operating System: CentOS Linux 7 (Core) >>>>CPE OS Name: cpe:/o:centos:centos:7 >>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64 >>>> Architecture: x86-64 >>>> >>>> >>>> >>>> >>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa < >>>> rgowd...@redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev >>>>> wrote: >>>>> >>>>>> Ok. I will try different options. >>>>>> >>>>>> This system is scheduled to go into production soon. What version >>>>>> would you recommend to roll back to? >>>>>> >>>>> >>>>> These are long standing issues. So, rolling back may not make these >>>>> issues go away. Instead if you think performance is agreeable to you, >>>>> please keep these xlators off in production. >>>>> >>>>> >>>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >>>>>> rgowd...@redhat.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev >>>>>>> wrote: >>>>>>> >>>>>>>> Raghavendra, >>>>>>>> >>>>>>>> Thank for the suggestion. >>>>>>>> >>>>>>>> >>>>>>>> I am suing >>>>>>>> >>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version >>>>>>>> glusterfs 5.0 >>>>>>>> >>>>>>>> On >>>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl >>>>>>>> Icon name: computer-vm >>>>>>>>Chassis: vm >>>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535 >>>>>>>>Boot ID: eed98992fdda4c88bdd459a89101766b >>>>>>>> Virtualization: vmware >>>>>>>> Operating Sy
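Raghavendra's earlier ask to "try with enabling all perf xlators (default configuration)" amounts to resetting each reconfigured option rather than setting them individually. A sketch against the gv0 volume named in the thread; it requires a live cluster, so it is a command fragment rather than a runnable script.

```shell
# Sketch: return the performance xlators to their default values so the
# volume runs the default (all-enabled) configuration.
for opt in performance.io-cache performance.stat-prefetch \
           performance.quick-read performance.parallel-readdir \
           performance.readdir-ahead performance.write-behind \
           performance.read-ahead performance.client-io-threads; do
  gluster volume reset gv0 "$opt"
done

# Confirm the effective values afterwards:
gluster volume get gv0 all | grep '^performance\.'
```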
Re: [Gluster-users] java application crashes while reading a zip file
The software ran with all of the options turned off over the weekend without any problems. I will try to collect the debug info for you. I have re-enabled the 3 three options, but yet to see the problem reoccurring. On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa wrote: > Thanks Dmitry. Can you provide the following debug info I asked earlier: > > * strace -ff -v ... of java application > * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while > mounting). > > regards, > Raghavendra > > On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev > wrote: > >> These 3 options seem to trigger both (reading zip file and renaming >> files) problems. >> >> Options Reconfigured: >> performance.io-cache: off >> performance.stat-prefetch: off >> performance.quick-read: off >> performance.parallel-readdir: off >> *performance.readdir-ahead: on* >> *performance.write-behind: on* >> *performance.read-ahead: on* >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> >> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev >> wrote: >> >>> Turning a single option on at a time still worked fine. I will keep >>> trying. >>> >>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log >>> messages. Do you suppose these issues are triggered by the new environment >>> or did not exist in 4.1.5? 
>>> >>> [root@node1 ~]# glusterfs --version >>> glusterfs 4.1.5 >>> >>> On AWS using >>> [root@node1 ~]# hostnamectl >>>Static hostname: node1 >>> Icon name: computer-vm >>>Chassis: vm >>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f >>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9 >>> Virtualization: kvm >>> Operating System: CentOS Linux 7 (Core) >>>CPE OS Name: cpe:/o:centos:centos:7 >>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64 >>> Architecture: x86-64 >>> >>> >>> >>> >>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa < >>> rgowd...@redhat.com> wrote: >>> >>>> >>>> >>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev >>>> wrote: >>>> >>>>> Ok. I will try different options. >>>>> >>>>> This system is scheduled to go into production soon. What version >>>>> would you recommend to roll back to? >>>>> >>>> >>>> These are long standing issues. So, rolling back may not make these >>>> issues go away. Instead if you think performance is agreeable to you, >>>> please keep these xlators off in production. >>>> >>>> >>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >>>>> rgowd...@redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev >>>>>> wrote: >>>>>> >>>>>>> Raghavendra, >>>>>>> >>>>>>> Thank for the suggestion. 
>>>>>>> >>>>>>> >>>>>>> I am suing >>>>>>> >>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version >>>>>>> glusterfs 5.0 >>>>>>> >>>>>>> On >>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl >>>>>>> Icon name: computer-vm >>>>>>>Chassis: vm >>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535 >>>>>>>Boot ID: eed98992fdda4c88bdd459a89101766b >>>>>>> Virtualization: vmware >>>>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >>>>>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server >>>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64 >>>>>>> Architecture: x86-64 >>>>>>> >>>>>>> >>>>>>> I have configured the following options >>>>>>> >>>>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info >>>>>>> Volume Name: gv0 >>>>>>> Type: Replicate >>>>>>> Volume ID: 5ffbda09-c5e2-4abc-b
Re: [Gluster-users] java application crashes while reading a zip file
These 3 options seem to trigger both (reading zip file and renaming files) problems. Options Reconfigured: performance.io-cache: off performance.stat-prefetch: off performance.quick-read: off performance.parallel-readdir: off *performance.readdir-ahead: on* *performance.write-behind: on* *performance.read-ahead: on* performance.client-io-threads: off nfs.disable: on transport.address-family: inet On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev wrote: > Turning a single option on at a time still worked fine. I will keep > trying. > > We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log > messages. Do you suppose these issues are triggered by the new environment > or did not exist in 4.1.5? > > [root@node1 ~]# glusterfs --version > glusterfs 4.1.5 > > On AWS using > [root@node1 ~]# hostnamectl >Static hostname: node1 > Icon name: computer-vm >Chassis: vm > Machine ID: b30d0f2110ac3807b210c19ede3ce88f >Boot ID: 52bb159a0aa94043a40e7c7651967bd9 > Virtualization: kvm > Operating System: CentOS Linux 7 (Core) >CPE OS Name: cpe:/o:centos:centos:7 > Kernel: Linux 3.10.0-862.3.2.el7.x86_64 > Architecture: x86-64 > > > > > On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa > wrote: > >> >> >> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev >> wrote: >> >>> Ok. I will try different options. >>> >>> This system is scheduled to go into production soon. What version would >>> you recommend to roll back to? >>> >> >> These are long standing issues. So, rolling back may not make these >> issues go away. Instead if you think performance is agreeable to you, >> please keep these xlators off in production. >> >> >>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >>> rgowd...@redhat.com> wrote: >>> >>>> >>>> >>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev >>>> wrote: >>>> >>>>> Raghavendra, >>>>> >>>>> Thank for the suggestion. 
>>>>> >>>>> >>>>> I am suing >>>>> >>>>> [root@jl-fanexoss1p glusterfs]# gluster --version >>>>> glusterfs 5.0 >>>>> >>>>> On >>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl >>>>> Icon name: computer-vm >>>>>Chassis: vm >>>>> Machine ID: e44b8478ef7a467d98363614f4e50535 >>>>>Boot ID: eed98992fdda4c88bdd459a89101766b >>>>> Virtualization: vmware >>>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >>>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server >>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64 >>>>> Architecture: x86-64 >>>>> >>>>> >>>>> I have configured the following options >>>>> >>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info >>>>> Volume Name: gv0 >>>>> Type: Replicate >>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 >>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 >>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 >>>>> Options Reconfigured: >>>>> performance.io-cache: off >>>>> performance.stat-prefetch: off >>>>> performance.quick-read: off >>>>> performance.parallel-readdir: off >>>>> performance.readdir-ahead: off >>>>> performance.write-behind: off >>>>> performance.read-ahead: off >>>>> performance.client-io-threads: off >>>>> nfs.disable: on >>>>> transport.address-family: inet >>>>> >>>>> I don't know if it is related, but I am seeing a lot of >>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] >>>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote >>>>> operation failed [No such device or address] >>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] >>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>> dispatch
Re: [Gluster-users] java application crashes while reading a zip file
Turning a single option on at a time still worked fine. I will keep trying. We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log messages. Do you suppose these issues are triggered by the new environment or did not exist in 4.1.5? [root@node1 ~]# glusterfs --version glusterfs 4.1.5 On AWS using [root@node1 ~]# hostnamectl Static hostname: node1 Icon name: computer-vm Chassis: vm Machine ID: b30d0f2110ac3807b210c19ede3ce88f Boot ID: 52bb159a0aa94043a40e7c7651967bd9 Virtualization: kvm Operating System: CentOS Linux 7 (Core) CPE OS Name: cpe:/o:centos:centos:7 Kernel: Linux 3.10.0-862.3.2.el7.x86_64 Architecture: x86-64 On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa wrote: > > > On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev > wrote: > >> Ok. I will try different options. >> >> This system is scheduled to go into production soon. What version would >> you recommend to roll back to? >> > > These are long standing issues. So, rolling back may not make these issues > go away. Instead if you think performance is agreeable to you, please keep > these xlators off in production. > > >> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa < >> rgowd...@redhat.com> wrote: >> >>> >>> >>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev >>> wrote: >>> >>>> Raghavendra, >>>> >>>> Thank for the suggestion. 
>>>> >>>> >>>> I am suing >>>> >>>> [root@jl-fanexoss1p glusterfs]# gluster --version >>>> glusterfs 5.0 >>>> >>>> On >>>> [root@jl-fanexoss1p glusterfs]# hostnamectl >>>> Icon name: computer-vm >>>>Chassis: vm >>>> Machine ID: e44b8478ef7a467d98363614f4e50535 >>>>Boot ID: eed98992fdda4c88bdd459a89101766b >>>> Virtualization: vmware >>>> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server >>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64 >>>> Architecture: x86-64 >>>> >>>> >>>> I have configured the following options >>>> >>>> [root@jl-fanexoss1p glusterfs]# gluster volume info >>>> Volume Name: gv0 >>>> Type: Replicate >>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x 3 = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 >>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 >>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 >>>> Options Reconfigured: >>>> performance.io-cache: off >>>> performance.stat-prefetch: off >>>> performance.quick-read: off >>>> performance.parallel-readdir: off >>>> performance.readdir-ahead: off >>>> performance.write-behind: off >>>> performance.read-ahead: off >>>> performance.client-io-threads: off >>>> nfs.disable: on >>>> transport.address-family: inet >>>> >>>> I don't know if it is related, but I am seeing a lot of >>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031] >>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote >>>> operation failed [No such device or address] >>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191] >>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >>>> handler >>>> >>> >>> These msgs were introduced by patch [1]. To the best of my knowledge >>> they are benign. We'll be sending a patch to fix these msgs though. >>> >>> +Mohit Agrawal +Milind Changire >>> . 
Can you try to identify why we are seeing these >>> messages? If possible please send a patch to fix this. >>> >>> [1] >>> https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5 >>> >>> >>>> And java.io exceptions trying to rename files. >>>> >>> >>> When you see the errors is it possible to collect, >>> * strace of the java application (strace -ff -v ...) >>> * fuse-dump of the glusterfs mount (use option --dump-fuse while >>> mounting)? >>> >>> I also need another favour fro
Re: [Gluster-users] java application crashes while reading a zip file
Ok. I will try different options. This system is scheduled to go into production soon. What version would you recommend to roll back to? On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa wrote: > > > On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev > wrote: > >> Raghavendra, >> >> Thank for the suggestion. >> >> >> I am suing >> >> [root@jl-fanexoss1p glusterfs]# gluster --version >> glusterfs 5.0 >> >> On >> [root@jl-fanexoss1p glusterfs]# hostnamectl >> Icon name: computer-vm >>Chassis: vm >> Machine ID: e44b8478ef7a467d98363614f4e50535 >>Boot ID: eed98992fdda4c88bdd459a89101766b >> Virtualization: vmware >> Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server >> Kernel: Linux 3.10.0-862.14.4.el7.x86_64 >> Architecture: x86-64 >> >> >> I have configured the following options >> >> [root@jl-fanexoss1p glusterfs]# gluster volume info >> Volume Name: gv0 >> Type: Replicate >> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 >> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 >> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 >> Options Reconfigured: >> performance.io-cache: off >> performance.stat-prefetch: off >> performance.quick-read: off >> performance.parallel-readdir: off >> performance.readdir-ahead: off >> performance.write-behind: off >> performance.read-ahead: off >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> I don't know if it is related, but I am seeing a lot of >> [2018-12-27 20:19:23.776080] W [MSGID: 114031] >> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote >> operation failed [No such device or address] >> [2018-12-27 20:19:47.735190] E [MSGID: 101191] >> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch >> handler >> > > 
These msgs were introduced by patch [1]. To the best of my knowledge they > are benign. We'll be sending a patch to fix these msgs though. > > +Mohit Agrawal +Milind Changire > . Can you try to identify why we are seeing these > messages? If possible please send a patch to fix this. > > [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5 > > >> And java.io exceptions trying to rename files. >> > > When you see the errors is it possible to collect, > * strace of the java application (strace -ff -v ...) > * fuse-dump of the glusterfs mount (use option --dump-fuse while mounting)? > > I also need another favour from you. By trail and error, can you point out > which of the many performance xlators you've turned off is causing the > issue? > > The above two data-points will help us to fix the problem. > > >> Thank You, >> Dmitry >> >> >> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa >> wrote: >> >>> What version of glusterfs are you using? It might be either >>> * a stale metadata issue. >>> * inconsistent ctime issue. >>> >>> Can you try turning off all performance xlators? If the issue is 1, that >>> should help. >>> >>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev >>> wrote: >>> >>>> Attempted to set 'performance.read-ahead off` according to >>>> https://jira.apache.org/jira/browse/AMQ-7041 >>>> That did not help. >>>> >>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev >>>> wrote: >>>> >>>>> The core file generated by JVM suggests that it happens because the >>>>> file is changing while it is being read - >>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557. >>>>> The application reads in the zipfile and goes through the zip entries, >>>>> then reloads the file and goes the zip entries again. It does so 3 times. >>>>> The application never crushes on the 1st cycle but sometimes crushes on >>>>> the >>>>> 2nd or 3rd cycle. 
>>>>> The zip file is generated about 20 seconds prior to it being used and >>>>> is not updated or even used by any other application. I have never seen >>>>> this problem on a plain file system. >>>>> >>>>> I would appreciate any suggestions on how to go debugging this issue. >>>>> I can change the source code of the java application. >>>>> >>>>> Regards, >>>>> Dmitry >>>>> >>>>> >>>>> ___ >>>> Gluster-users mailing list >>>> Gluster-users@gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
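A cheap way to test the "file is changing while it is being read" hypothesis described above, without modifying the Java application, is to hash the zip from the same client mount before and after each read cycle. A minimal sketch; the sleep is a stand-in for one application read cycle, and the path you would pass is hypothetical.

```shell
# Sketch: detect whether a file's contents, as seen by this client,
# change between two reads. In a real run, the sleep would be replaced
# by one read cycle of the java application.
check_stable() {
    f="$1"
    h1=$(md5sum "$f" | cut -d' ' -f1)
    sleep 1   # stand-in for one application read cycle
    h2=$(md5sum "$f" | cut -d' ' -f1)
    if [ "$h1" = "$h2" ]; then
        echo "stable"
    else
        echo "changed: $h1 -> $h2"
    fi
}
```

Usage would look like `check_stable /mnt/gv0/path/to/archive.zip` (placeholder path). If it ever reports "changed" for a file nothing should be writing, the client is serving stale or inconsistent data, which matches the JDK bug referenced above.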
Re: [Gluster-users] java application crashes while reading a zip file
Raghavendra, So far so good. No problems with reading zip files or renaming files. I will check again tomorrow. I am still seeing these in the logs, however. [2018-12-28 01:01:17.301203] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 12-gv0-client-0: remote operation failed [No such device or address] [2018-12-28 01:01:20.218775] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 12-epoll: Failed to dispatch handler Regards, Dmitry On Thu, Dec 27, 2018 at 4:43 PM Dmitry Isakbayev wrote: > Raghavendra, > > Thank for the suggestion. > > > I am suing > > [root@jl-fanexoss1p glusterfs]# gluster --version > glusterfs 5.0 > > On > [root@jl-fanexoss1p glusterfs]# hostnamectl > Icon name: computer-vm >Chassis: vm > Machine ID: e44b8478ef7a467d98363614f4e50535 >Boot ID: eed98992fdda4c88bdd459a89101766b > Virtualization: vmware > Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo) >CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server > Kernel: Linux 3.10.0-862.14.4.el7.x86_64 > Architecture: x86-64 > > > I have configured the following options > > [root@jl-fanexoss1p glusterfs]# gluster volume info > Volume Name: gv0 > Type: Replicate > Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0 > Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0 > Brick3: nxquorum1p.cspire.net:/data/brick1/gv0 > Options Reconfigured: > performance.io-cache: off > performance.stat-prefetch: off > performance.quick-read: off > performance.parallel-readdir: off > performance.readdir-ahead: off > performance.write-behind: off > performance.read-ahead: off > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > > I don't know if it is related, but I am seeing a lot of > [2018-12-27 20:19:23.776080] W [MSGID: 114031] > [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 
2-gv0-client-0: remote > operation failed [No such device or address] > [2018-12-27 20:19:47.735190] E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch > handler > > And java.io exceptions trying to rename files. > > Thank You, > Dmitry > > > On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa > wrote: > >> What version of glusterfs are you using? It might be either >> * a stale metadata issue. >> * inconsistent ctime issue. >> >> Can you try turning off all performance xlators? If the issue is 1, that >> should help. >> >> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev >> wrote: >> >>> Attempted to set 'performance.read-ahead off` according to >>> https://jira.apache.org/jira/browse/AMQ-7041 >>> That did not help. >>> >>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev >>> wrote: >>> >>>> The core file generated by JVM suggests that it happens because the >>>> file is changing while it is being read - >>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557. >>>> The application reads in the zipfile and goes through the zip entries, >>>> then reloads the file and goes the zip entries again. It does so 3 times. >>>> The application never crushes on the 1st cycle but sometimes crushes on the >>>> 2nd or 3rd cycle. >>>> The zip file is generated about 20 seconds prior to it being used and >>>> is not updated or even used by any other application. I have never seen >>>> this problem on a plain file system. >>>> >>>> I would appreciate any suggestions on how to go debugging this issue. >>>> I can change the source code of the java application. >>>> >>>> Regards, >>>> Dmitry >>>> >>>> >>>> ___ >>> Gluster-users mailing list >>> Gluster-users@gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> ___ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] java application crashes while reading a zip file
Raghavendra,

Thanks for the suggestion.

I am using

[root@jl-fanexoss1p glusterfs]# gluster --version
glusterfs 5.0

On

[root@jl-fanexoss1p glusterfs]# hostnamectl
Icon name: computer-vm
Chassis: vm
Machine ID: e44b8478ef7a467d98363614f4e50535
Boot ID: eed98992fdda4c88bdd459a89101766b
Virtualization: vmware
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
Kernel: Linux 3.10.0-862.14.4.el7.x86_64
Architecture: x86-64

I have configured the following options

[root@jl-fanexoss1p glusterfs]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

I don't know if it is related, but I am seeing a lot of

[2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
[2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler

And java.io exceptions trying to rename files.

Thank You,
Dmitry

On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa wrote:

> What version of glusterfs are you using? It might be either
> * a stale metadata issue.
> * an inconsistent ctime issue.
>
> Can you try turning off all performance xlators? If the issue is 1, that should help.
>
> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev wrote:
>
>> Attempted to set `performance.read-ahead off` according to
>> https://jira.apache.org/jira/browse/AMQ-7041
>> That did not help.
>>
>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev wrote:
>>
>>> The core file generated by the JVM suggests that it happens because the file is changing while it is being read -
>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>> The application reads in the zip file and goes through the zip entries, then reloads the file and goes through the zip entries again. It does so 3 times. The application never crashes on the 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
>>> The zip file is generated about 20 seconds prior to it being used and is not updated or even used by any other application. I have never seen this problem on a plain file system.
>>>
>>> I would appreciate any suggestions on how to go about debugging this issue. I can change the source code of the java application.
>>>
>>> Regards,
>>> Dmitry
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] java application crashes while reading a zip file
Attempted to set `performance.read-ahead off` according to
https://jira.apache.org/jira/browse/AMQ-7041
That did not help.

On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev wrote:

> The core file generated by the JVM suggests that it happens because the file is changing while it is being read -
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
> The application reads in the zip file and goes through the zip entries, then reloads the file and goes through the zip entries again. It does so 3 times. The application never crashes on the 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
> The zip file is generated about 20 seconds prior to it being used and is not updated or even used by any other application. I have never seen this problem on a plain file system.
>
> I would appreciate any suggestions on how to go about debugging this issue. I can change the source code of the java application.
>
> Regards,
> Dmitry

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] java application crashes while reading a zip file
The core file generated by the JVM suggests that it happens because the file is changing while it is being read -
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
The application reads in the zip file and goes through the zip entries, then reloads the file and goes through the zip entries again. It does so 3 times. The application never crashes on the 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
The zip file is generated about 20 seconds prior to it being used and is not updated or even used by any other application. I have never seen this problem on a plain file system.

I would appreciate any suggestions on how to go about debugging this issue. I can change the source code of the java application.

Regards,
Dmitry

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
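[Editorial note on the bug linked above: JDK-8186557 is reported against java.util.zip.ZipFile, which memory-maps the archive; if the file changes underneath the mapping, the fault arrives as a SIGBUS that kills the whole JVM rather than as a catchable exception. Since the poster can change the application source, one mitigation is to iterate entries through ZipInputStream, which reads the archive with ordinary read() calls; on Oracle/OpenJDK 8, running with -Dsun.zip.disableMemoryMapping=true is reportedly another option. A minimal sketch of the streaming approach follows; ZipReadSketch, makeSampleZip, and entryNames are illustrative names, not from this thread.]

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipReadSketch {

    // Hypothetical stand-in for the application's archive: two small entries.
    static Path makeSampleZip() throws IOException {
        Path zip = Files.createTempFile("sketch", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return zip;
    }

    // ZipInputStream walks the archive sequentially via read() calls instead of
    // an mmap of the central directory, so a file that changes mid-read should
    // surface as a ZipException/IOException the caller can handle, not a SIGBUS.
    static List<String> entryNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            for (ZipEntry e; (e = in.getNextEntry()) != null; ) {
                names.add(e.getName());
                in.closeEntry();
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entryNames(makeSampleZip()));
    }
}
```

[The trade-off is that ZipInputStream cannot seek, so each of the application's three read cycles walks the whole archive; for the 20-second-old, otherwise-idle file described here that cost is likely negligible compared to a JVM crash.]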