Re: [Gluster-users] java application crashes while reading a zip file

2019-01-28 Thread Dmitry Isakbayev
Amar,

Thank you for helping me troubleshoot the issues.  I don't have the
resources to test the software at this point, but I will keep it in mind.

Regards,
Dmitry


On Tue, Jan 22, 2019 at 1:02 AM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Dmitry,
>
> Thanks for the detailed updates on this thread. Let us know how your
> 'production' setup is running. To make the next upgrade much smoother, we
> request your help with some early testing of the glusterfs-6 RC builds,
> which are expected out in the first week of February.
>
> Also, if it is possible for you to automate the tests, it would be great
> to have them in our regression suite, so we can always be sure your setup
> won't break in future releases.
>
> Regards,
> Amar
>
> On Mon, Jan 7, 2019 at 11:42 PM Dmitry Isakbayev 
> wrote:
>
>> This system is going into production.  I will try to replicate this
>> problem on the next installation.
>>
>> On Wed, Jan 2, 2019 at 9:25 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev 
>>> wrote:
>>>
>>>> Still no JVM crashes.  Is it possible that running glusterfs with
>>>> performance options turned off for a couple of days cleared out the "stale
>>>> metadata issue"?
>>>>
>>>
>>> Restarting these options would've cleared the existing cache, and hence
>>> the previous stale metadata. Hitting stale metadata again depends on
>>> races. That might be the reason you are still not seeing the issue. Can
>>> you try with all perf xlators enabled (the default configuration)?
>>>
>>>
>>>>
>>>> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev 
>>>> wrote:
>>>>
>>>>> The software ran with all of the options turned off over the weekend
>>>>> without any problems.
>>>>> I will try to collect the debug info for you.  I have re-enabled the
>>>>> three options, but have yet to see the problem recur.
>>>>>
>>>>>
>>>>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <
>>>>> rgowd...@redhat.com> wrote:
>>>>>
>>>>>> Thanks Dmitry. Can you provide the following debug info I asked
>>>>>> earlier:
>>>>>>
>>>>>> * strace -ff -v ... of java application
>>>>>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse
>>>>>> while mounting).
>>>>>>
>>>>>> regards,
>>>>>> Raghavendra
>>>>>>
>>>>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
>>>>>> wrote:
>>>>>>
>>>>>>> These 3 options seem to trigger both problems (reading zip files and
>>>>>>> renaming files).
>>>>>>>
>>>>>>> Options Reconfigured:
>>>>>>> performance.io-cache: off
>>>>>>> performance.stat-prefetch: off
>>>>>>> performance.quick-read: off
>>>>>>> performance.parallel-readdir: off
>>>>>>> *performance.readdir-ahead: on*
>>>>>>> *performance.write-behind: on*
>>>>>>> *performance.read-ahead: on*
>>>>>>> performance.client-io-threads: off
>>>>>>> nfs.disable: on
>>>>>>> transport.address-family: inet
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Turning a single option on at a time still worked fine.  I will
>>>>>>>> keep trying.
>>>>>>>>
>>>>>>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or
>>>>>>>> log messages.  Do you suppose these issues are triggered by the new
>>>>>>>> environment, or did they not exist in 4.1.5?
>>>>>>>>
>>>>>>>> [root@node1 ~]# glusterfs --version
>>>>>>>> glusterfs 4.1.5
>>>>>>>>
>>>>>>>> On AWS using
>>>>>>>> [root@node1 ~]# hostnamectl
>>>>>>>>Static hostname: node1
>>>>>>>>  Icon name: computer-vm
>>>>>>>>Chassis: vm

Re: [Gluster-users] [External] Re: A broken file that can not be deleted

2019-01-10 Thread Dmitry Isakbayev
Nithya, Raghavendra, Davide

Thank you for your help.


> *How are the permissions displayed for the file on the servers?*
Trying to answer this question is what fixed the problem.  It looked just
fine on all 3 servers.  And it looks like running "ls" on the servers fixed
it on the clients.  I had to repeat it on all 3 servers.
It fixed file permissions on 2 clients and made the file show up on the 3rd
client.

Even though it fixed the file permissions and I could now view the contents
of the file, the software was still having issues renaming
".download_suspensions.memo.writing" to ".download_suspensions.memo".
When I tried to replace the file manually, I got
$ mv .download_suspensions.memo.writing .download_suspensions.memo
mv: ‘.download_suspensions.memo.writing’ and ‘.download_suspensions.memo’
are the same file

I ended up removing both files and having the software rebuild them.
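For what it's worth, the "same file" error from mv means both names resolved
to the same inode. A quick check (a sketch, run from the client mount) would
have been:

$ ls -li .download_suspensions.memo .download_suspensions.memo.writing
# identical inode numbers in the first column confirm that the two
# names point at a single file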

> *Wondering whether it's a case of split brain.*
Very possible.  All 3 servers were rebooted.  It brought down the Linux
cluster running on the same 3 servers as well.
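Split brain can be checked from any of the servers; a minimal sketch,
assuming the volume name gv0 used elsewhere in this thread:

$ gluster volume heal gv0 info split-brain
# lists the files (or GFIDs) the replicas disagree on;
# empty output means no split brain was detected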



> On Thu, Jan 10, 2019 at 10:00 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Wed, Jan 9, 2019 at 7:48 PM Dmitry Isakbayev 
>> wrote:
>>
>>> I am seeing a broken file that exists on 2 out of 3 nodes.
>>>
>>
>> Wondering whether it's a case of split brain.
>>
>>
>>> The application trying to use the file throws a file permissions error.
>>> ls, rm, mv, touch all throw "Input/output error"
>>>
>>> $ ls -la
>>> ls: cannot access .download_suspensions.memo: Input/output error
>>> drwxrwxr-x. 2 ossadmin ossadmin  4096 Jan  9 08:06 .
>>> drwxrwxr-x. 5 ossadmin ossadmin  4096 Jan  3 11:36 ..
>>> -?? ? ????
>>> .download_suspensions.memo
>>>
>>> $ rm ".download_suspensions.memo"
>>> rm: cannot remove ‘.download_suspensions.memo’: Input/output error
>>>
>>>
>>>
>>>
>>>
>
>
>
> --
> Davide Obbi
> Senior System Administrator
>
> Booking.com B.V.
> Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
> Direct +31207031558
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] A broken file that can not be deleted

2019-01-09 Thread Dmitry Isakbayev
I am seeing a broken file that exists on 2 out of 3 nodes.  The application
trying to use the file throws a file permissions error.  ls, rm, mv, and
touch all throw "Input/output error".

$ ls -la
ls: cannot access .download_suspensions.memo: Input/output error
drwxrwxr-x. 2 ossadmin ossadmin  4096 Jan  9 08:06 .
drwxrwxr-x. 5 ossadmin ossadmin  4096 Jan  3 11:36 ..
-?? ? ????
.download_suspensions.memo

$ rm ".download_suspensions.memo"
rm: cannot remove ‘.download_suspensions.memo’: Input/output error
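
When a file shows up as "-??" like this, one common next step is to inspect
the replica copies directly on each brick. A sketch, assuming the brick path
from later in this thread, with <dir> as a placeholder and root access
required for getfattr:

[root@server ~]# stat /data/brick1/gv0/<dir>/.download_suspensions.memo
[root@server ~]# getfattr -d -m . -e hex /data/brick1/gv0/<dir>/.download_suspensions.memo
# the trusted.afr.* xattrs show which bricks hold pending (unhealed) changes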
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] java application crashes while reading a zip file

2019-01-07 Thread Dmitry Isakbayev
This system is going into production.  I will try to replicate this problem
on the next installation.

On Wed, Jan 2, 2019 at 9:25 PM Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Jan 2, 2019 at 9:59 PM Dmitry Isakbayev  wrote:
>
>> Still no JVM crashes.  Is it possible that running glusterfs with
>> performance options turned off for a couple of days cleared out the "stale
>> metadata issue"?
>>
>
> Restarting these options would've cleared the existing cache, and hence
> the previous stale metadata. Hitting stale metadata again depends on
> races. That might be the reason you are still not seeing the issue. Can
> you try with all perf xlators enabled (the default configuration)?
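
Returning to the default configuration amounts to resetting the reconfigured
options; a minimal sketch, assuming volume gv0:

$ gluster volume reset gv0 performance.quick-read
$ gluster volume reset gv0 performance.stat-prefetch
# or reset every reconfigured option in one go:
$ gluster volume reset gv0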
>
>
>>
>> On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev 
>> wrote:
>>
>>> The software ran with all of the options turned off over the weekend
>>> without any problems.
>>> I will try to collect the debug info for you.  I have re-enabled the
>>> three options, but have yet to see the problem recur.
>>>
>>>
>>> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>>>>
>>>> * strace -ff -v ... of java application
>>>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
>>>> mounting).
>>>>
>>>> regards,
>>>> Raghavendra
>>>>
>>>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
>>>> wrote:
>>>>
>>>>> These 3 options seem to trigger both problems (reading zip files and
>>>>> renaming files).
>>>>>
>>>>> Options Reconfigured:
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> performance.quick-read: off
>>>>> performance.parallel-readdir: off
>>>>> *performance.readdir-ahead: on*
>>>>> *performance.write-behind: on*
>>>>> *performance.read-ahead: on*
>>>>> performance.client-io-threads: off
>>>>> nfs.disable: on
>>>>> transport.address-family: inet
>>>>>
>>>>>
>>>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
>>>>> wrote:
>>>>>
>>>>>> Turning a single option on at a time still worked fine.  I will keep
>>>>>> trying.
>>>>>>
>>>>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
>>>>>> messages.  Do you suppose these issues are triggered by the new
>>>>>> environment, or did they not exist in 4.1.5?
>>>>>>
>>>>>> [root@node1 ~]# glusterfs --version
>>>>>> glusterfs 4.1.5
>>>>>>
>>>>>> On AWS using
>>>>>> [root@node1 ~]# hostnamectl
>>>>>>Static hostname: node1
>>>>>>  Icon name: computer-vm
>>>>>>Chassis: vm
>>>>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>>>>>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>>>>>> Virtualization: kvm
>>>>>>   Operating System: CentOS Linux 7 (Core)
>>>>>>CPE OS Name: cpe:/o:centos:centos:7
>>>>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>>>>>>   Architecture: x86-64
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
>>>>>> rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ok. I will try different options.
>>>>>>>>
>>>>>>>> This system is scheduled to go into production soon.  What version
>>>>>>>> would you recommend to roll back to?
>>>>>>>>
>>>>>>>
>>>>>>> These are long-standing issues, so rolling back may not make them go
>>>>>>> away. Instead, if the performance is agreeable to you, please keep
>>>>>>> these xlators off in production.

Re: [Gluster-users] java application crashes while reading a zip file

2019-01-02 Thread Dmitry Isakbayev
Still no JVM crashes.  Is it possible that running glusterfs with
performance options turned off for a couple of days cleared out the "stale
metadata issue"?


On Mon, Dec 31, 2018 at 1:38 PM Dmitry Isakbayev  wrote:

> The software ran with all of the options turned off over the weekend
> without any problems.
> I will try to collect the debug info for you.  I have re-enabled the
> three options, but have yet to see the problem recur.
>
>
> On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa 
> wrote:
>
>> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>>
>> * strace -ff -v ... of java application
>> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
>> mounting).
>>
>> regards,
>> Raghavendra
>>
>> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
>> wrote:
>>
>>> These 3 options seem to trigger both problems (reading zip files and
>>> renaming files).
>>>
>>> Options Reconfigured:
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.quick-read: off
>>> performance.parallel-readdir: off
>>> *performance.readdir-ahead: on*
>>> *performance.write-behind: on*
>>> *performance.read-ahead: on*
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>>
>>>
>>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
>>> wrote:
>>>
>>>> Turning a single option on at a time still worked fine.  I will keep
>>>> trying.
>>>>
>>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
>>>> messages.  Do you suppose these issues are triggered by the new
>>>> environment, or did they not exist in 4.1.5?
>>>>
>>>> [root@node1 ~]# glusterfs --version
>>>> glusterfs 4.1.5
>>>>
>>>> On AWS using
>>>> [root@node1 ~]# hostnamectl
>>>>Static hostname: node1
>>>>  Icon name: computer-vm
>>>>Chassis: vm
>>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>>>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>>>> Virtualization: kvm
>>>>   Operating System: CentOS Linux 7 (Core)
>>>>CPE OS Name: cpe:/o:centos:centos:7
>>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>>>>   Architecture: x86-64
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
>>>> rgowd...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
>>>>> wrote:
>>>>>
>>>>>> Ok. I will try different options.
>>>>>>
>>>>>> This system is scheduled to go into production soon.  What version
>>>>>> would you recommend to roll back to?
>>>>>>
>>>>>
>>>>> These are long-standing issues, so rolling back may not make them go
>>>>> away. Instead, if the performance is agreeable to you, please keep
>>>>> these xlators off in production.
>>>>>
>>>>>
>>>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>>>>>> rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Raghavendra,
>>>>>>>>
>>>>>>>> Thanks for the suggestion.
>>>>>>>>
>>>>>>>>
>>>>>>>> I am using
>>>>>>>>
>>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>>>>>> glusterfs 5.0
>>>>>>>>
>>>>>>>> On
>>>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>>>>>  Icon name: computer-vm
>>>>>>>>Chassis: vm
>>>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>>>>Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>>>> Virtualization: vmware
>>>>>>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-31 Thread Dmitry Isakbayev
The software ran with all of the options turned off over the weekend
without any problems.
I will try to collect the debug info for you.  I have re-enabled the
three options, but have yet to see the problem recur.


On Sat, Dec 29, 2018 at 6:46 PM Raghavendra Gowdappa 
wrote:

> Thanks Dmitry. Can you provide the following debug info I asked earlier:
>
> * strace -ff -v ... of java application
> * dump of the I/O traffic seen by the mountpoint (use --dump-fuse while
> mounting).
>
> regards,
> Raghavendra
>
> On Sat, Dec 29, 2018 at 2:08 AM Dmitry Isakbayev 
> wrote:
>
>> These 3 options seem to trigger both problems (reading zip files and
>> renaming files).
>>
>> Options Reconfigured:
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.quick-read: off
>> performance.parallel-readdir: off
>> *performance.readdir-ahead: on*
>> *performance.write-behind: on*
>> *performance.read-ahead: on*
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>>
>>
>> On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev 
>> wrote:
>>
>>> Turning a single option on at a time still worked fine.  I will keep
>>> trying.
>>>
>>> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
>>> messages.  Do you suppose these issues are triggered by the new
>>> environment, or did they not exist in 4.1.5?
>>>
>>> [root@node1 ~]# glusterfs --version
>>> glusterfs 4.1.5
>>>
>>> On AWS using
>>> [root@node1 ~]# hostnamectl
>>>Static hostname: node1
>>>  Icon name: computer-vm
>>>Chassis: vm
>>> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>>>Boot ID: 52bb159a0aa94043a40e7c7651967bd9
>>> Virtualization: kvm
>>>   Operating System: CentOS Linux 7 (Core)
>>>CPE OS Name: cpe:/o:centos:centos:7
>>> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>>>   Architecture: x86-64
>>>
>>>
>>>
>>>
>>> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
>>>> wrote:
>>>>
>>>>> Ok. I will try different options.
>>>>>
>>>>> This system is scheduled to go into production soon.  What version
>>>>> would you recommend to roll back to?
>>>>>
>>>>
>>>> These are long-standing issues, so rolling back may not make them go
>>>> away. Instead, if the performance is agreeable to you, please keep
>>>> these xlators off in production.
>>>>
>>>>
>>>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>>>>> rgowd...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
>>>>>> wrote:
>>>>>>
>>>>>>> Raghavendra,
>>>>>>>
>>>>>>> Thanks for the suggestion.
>>>>>>>
>>>>>>>
>>>>>>> I am using
>>>>>>>
>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>>>>> glusterfs 5.0
>>>>>>>
>>>>>>> On
>>>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>>>>  Icon name: computer-vm
>>>>>>>Chassis: vm
>>>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>>>Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>>>> Virtualization: vmware
>>>>>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>>>   Architecture: x86-64
>>>>>>>
>>>>>>>
>>>>>>> I have configured the following options
>>>>>>>
>>>>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>>>>> Volume Name: gv0
>>>>>>> Type: Replicate
>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-28 Thread Dmitry Isakbayev
These 3 options seem to trigger both problems (reading zip files and
renaming files).

Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
*performance.readdir-ahead: on*
*performance.write-behind: on*
*performance.read-ahead: on*
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
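
The effective state of all performance xlators can be verified in one shot;
a small sketch, assuming volume gv0:

$ gluster volume get gv0 all | grep '^performance\.'
# prints every performance.* option with its currently active value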


On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev  wrote:

> Turning a single option on at a time still worked fine.  I will keep
> trying.
>
> We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
> messages.  Do you suppose these issues are triggered by the new
> environment, or did they not exist in 4.1.5?
>
> [root@node1 ~]# glusterfs --version
> glusterfs 4.1.5
>
> On AWS using
> [root@node1 ~]# hostnamectl
>Static hostname: node1
>  Icon name: computer-vm
>Chassis: vm
> Machine ID: b30d0f2110ac3807b210c19ede3ce88f
>Boot ID: 52bb159a0aa94043a40e7c7651967bd9
> Virtualization: kvm
>   Operating System: CentOS Linux 7 (Core)
>CPE OS Name: cpe:/o:centos:centos:7
> Kernel: Linux 3.10.0-862.3.2.el7.x86_64
>   Architecture: x86-64
>
>
>
>
> On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
>> wrote:
>>
>>> Ok. I will try different options.
>>>
>>> This system is scheduled to go into production soon.  What version would
>>> you recommend to roll back to?
>>>
>>
>> These are long-standing issues, so rolling back may not make them go
>> away. Instead, if the performance is agreeable to you, please keep
>> these xlators off in production.
>>
>>
>>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
>>>> wrote:
>>>>
>>>>> Raghavendra,
>>>>>
>>>>> Thanks for the suggestion.
>>>>>
>>>>>
>>>>> I am using
>>>>>
>>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>>> glusterfs 5.0
>>>>>
>>>>> On
>>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>>  Icon name: computer-vm
>>>>>Chassis: vm
>>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>>Boot ID: eed98992fdda4c88bdd459a89101766b
>>>>> Virtualization: vmware
>>>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>>   Architecture: x86-64
>>>>>
>>>>>
>>>>> I have configured the following options
>>>>>
>>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>>> Volume Name: gv0
>>>>> Type: Replicate
>>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x 3 = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>>> Options Reconfigured:
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> performance.quick-read: off
>>>>> performance.parallel-readdir: off
>>>>> performance.readdir-ahead: off
>>>>> performance.write-behind: off
>>>>> performance.read-ahead: off
>>>>> performance.client-io-threads: off
>>>>> nfs.disable: on
>>>>> transport.address-family: inet
>>>>>
>>>>> I don't know if it is related, but I am seeing a lot of
>>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>>>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
>>>>> operation failed [No such device or address]
>>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to 
>>>>> dispatch handler

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-28 Thread Dmitry Isakbayev
Turning a single option on at a time still worked fine.  I will keep trying.
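
The one-at-a-time toggling can be scripted; a sketch, assuming volume gv0 and
with run_workload_test standing in for the actual application run:

for opt in performance.readdir-ahead performance.write-behind \
           performance.read-ahead performance.quick-read; do
    gluster volume set gv0 "$opt" on
    run_workload_test || echo "failure with $opt on"   # placeholder test
    gluster volume set gv0 "$opt" off
done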

We had used 4.1.5 on KVM/CentOS7.5 at AWS without these issues or log
messages.  Do you suppose these issues are triggered by the new environment,
or did they not exist in 4.1.5?

[root@node1 ~]# glusterfs --version
glusterfs 4.1.5

On AWS using
[root@node1 ~]# hostnamectl
   Static hostname: node1
 Icon name: computer-vm
   Chassis: vm
Machine ID: b30d0f2110ac3807b210c19ede3ce88f
   Boot ID: 52bb159a0aa94043a40e7c7651967bd9
Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
   CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-862.3.2.el7.x86_64
  Architecture: x86-64




On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev 
> wrote:
>
>> Ok. I will try different options.
>>
>> This system is scheduled to go into production soon.  What version would
>> you recommend to roll back to?
>>
>
> These are long-standing issues, so rolling back may not make them go
> away. Instead, if the performance is agreeable to you, please keep these
> xlators off in production.
>
>
>> On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
>>> wrote:
>>>
>>>> Raghavendra,
>>>>
>>>> Thanks for the suggestion.
>>>>
>>>>
>>>> I am using
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster --version
>>>> glusterfs 5.0
>>>>
>>>> On
>>>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>>>  Icon name: computer-vm
>>>>Chassis: vm
>>>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>>>Boot ID: eed98992fdda4c88bdd459a89101766b
>>>> Virtualization: vmware
>>>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>>>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>>>   Architecture: x86-64
>>>>
>>>>
>>>> I have configured the following options
>>>>
>>>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>>>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>>>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>>>> Options Reconfigured:
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.quick-read: off
>>>> performance.parallel-readdir: off
>>>> performance.readdir-ahead: off
>>>> performance.write-behind: off
>>>> performance.read-ahead: off
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>>
>>>> I don't know if it is related, but I am seeing a lot of
>>>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>>>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
>>>> operation failed [No such device or address]
>>>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>>> handler
>>>>
>>>
>>> These messages were introduced by patch [1]. To the best of my knowledge
>>> they are benign. We'll be sending a patch to fix these messages though.
>>>
>>> +Mohit Agrawal +Milind Changire: can you try to identify why we are
>>> seeing these messages? If possible, please send a patch to fix this.
>>>
>>> [1]
>>> https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>>>
>>>
>>>> And java.io exceptions when trying to rename files.
>>>>
>>>
>>> When you see the errors, is it possible to collect,
>>> * strace of the java application (strace -ff -v ...)
>>> * fuse-dump of the glusterfs mount (use option --dump-fuse while
>>> mounting)?
>>>
>>> I also need another favour from you.

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-28 Thread Dmitry Isakbayev
Ok. I will try different options.

This system is scheduled to go into production soon.  What version would
you recommend to roll back to?

On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev 
> wrote:
>
>> Raghavendra,
>>
>> Thanks for the suggestion.
>>
>>
>> I am using
>>
>> [root@jl-fanexoss1p glusterfs]# gluster --version
>> glusterfs 5.0
>>
>> On
>> [root@jl-fanexoss1p glusterfs]# hostnamectl
>>  Icon name: computer-vm
>>Chassis: vm
>> Machine ID: e44b8478ef7a467d98363614f4e50535
>>Boot ID: eed98992fdda4c88bdd459a89101766b
>> Virtualization: vmware
>>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
>> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>>   Architecture: x86-64
>>
>>
>> I have configured the following options
>>
>> [root@jl-fanexoss1p glusterfs]# gluster volume info
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
>> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
>> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
>> Options Reconfigured:
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.quick-read: off
>> performance.parallel-readdir: off
>> performance.readdir-ahead: off
>> performance.write-behind: off
>> performance.read-ahead: off
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>>
>> I don't know if it is related, but I am seeing a lot of
>> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
>> operation failed [No such device or address]
>> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler
>>
>
> These messages were introduced by patch [1]. To the best of my knowledge
> they are benign. We'll be sending a patch to fix these messages though.
>
> +Mohit Agrawal +Milind Changire: can you try to identify why we are
> seeing these messages? If possible, please send a patch to fix this.
>
> [1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5
>
>
>> And java.io exceptions when trying to rename files.
>>
>
> When you see the errors, is it possible to collect,
> * strace of the java application (strace -ff -v ...)
> * fuse-dump of the glusterfs mount (use option --dump-fuse while mounting)?
>
> I also need another favour from you. By trial and error, can you point out
> which of the many performance xlators you've turned off is causing the
> issue?
>
> The above two data-points will help us to fix the problem.
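
A sketch of how those two data points could be captured (output paths and
the mount point are placeholders; the volfile server is taken from the brick
list earlier in the thread):

# trace the java application, following forked children
# (with -ff, strace writes one file per process: /tmp/app.strace.<pid>)
$ strace -ff -v -o /tmp/app.strace java -jar <application>.jar
# mount the volume with a FUSE traffic dump enabled
$ glusterfs --volfile-server=jl-fanexoss1p.cspire.net --volfile-id=gv0 \
      --dump-fuse=/tmp/gv0.fusedump /mnt/gv0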
>
>
>> Thank You,
>> Dmitry
>>
>>
>> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa 
>> wrote:
>>
>>> What version of glusterfs are you using? It might be either
>>> * a stale metadata issue, or
>>> * an inconsistent ctime issue.
>>>
>>> Can you try turning off all performance xlators? If the issue is the
>>> first one, that should help.
>>>
>>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev 
>>> wrote:
>>>
>>>> Attempted to set `performance.read-ahead off` according to
>>>> https://jira.apache.org/jira/browse/AMQ-7041
>>>> That did not help.
>>>>
>>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev 
>>>> wrote:
>>>>
>>>>> The core file generated by the JVM suggests that it happens because the
>>>>> file is changing while it is being read -
>>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>>> The application reads in the zipfile and goes through the zip entries,
>>>>> then reloads the file and goes through the zip entries again.  It does
>>>>> so 3 times.  The application never crashes on the 1st cycle but
>>>>> sometimes crashes on the 2nd or 3rd cycle.
>>>>> The zip file is generated about 20 seconds prior to it being used and
>>>>> is not updated or even used by any other application.  I have never seen
>>>>> this problem on a plain file system.
>>>>>
>>>>> I would appreciate any suggestions on how to go debugging this issue.
>>>>> I can change the source code of the java application.
>>>>>
>>>>> Regards,
>>>>> Dmitry
>>>>>
>>>>>
>>>
>>>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-27 Thread Dmitry Isakbayev
Raghavendra,

So far so good.  No problems with reading zip files or renaming files.  I
will check again tomorrow.

I am still seeing these in the logs, however.
[2018-12-28 01:01:17.301203] W [MSGID: 114031]
[client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 12-gv0-client-0: remote
operation failed [No such device or address]
[2018-12-28 01:01:20.218775] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 12-epoll: Failed to
dispatch handler
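
A rough way to gauge how often these messages recur (the log path is an
assumption based on the default FUSE client log location):

$ grep -c 'client4_0_seek_cbk' /var/log/glusterfs/mnt-gv0.log
$ grep -c 'Failed to dispatch handler' /var/log/glusterfs/mnt-gv0.log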

Regards,
Dmitry


On Thu, Dec 27, 2018 at 4:43 PM Dmitry Isakbayev  wrote:

> Raghavendra,
>
> Thanks for the suggestion.
>
>
> I am using
>
> [root@jl-fanexoss1p glusterfs]# gluster --version
> glusterfs 5.0
>
> On
> [root@jl-fanexoss1p glusterfs]# hostnamectl
>  Icon name: computer-vm
>Chassis: vm
> Machine ID: e44b8478ef7a467d98363614f4e50535
>Boot ID: eed98992fdda4c88bdd459a89101766b
> Virtualization: vmware
>   Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
>CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
> Kernel: Linux 3.10.0-862.14.4.el7.x86_64
>   Architecture: x86-64
>
>
> I have configured the following options
>
> [root@jl-fanexoss1p glusterfs]# gluster volume info
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
> Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
> Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
> Options Reconfigured:
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.parallel-readdir: off
> performance.readdir-ahead: off
> performance.write-behind: off
> performance.read-ahead: off
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> I don't know if it is related, but I am seeing a lot of
> [2018-12-27 20:19:23.776080] W [MSGID: 114031]
> [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
> operation failed [No such device or address]
> [2018-12-27 20:19:47.735190] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler
>
> And java.io exceptions when trying to rename files.
>
> Thank You,
> Dmitry
>
>
> On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa 
> wrote:
>
>> What version of glusterfs are you using? It might be either
>> * a stale metadata issue, or
>> * an inconsistent ctime issue.
>>
>> Can you try turning off all performance xlators? If the issue is the
>> first one, that should help.
>>
>> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev 
>> wrote:
>>
>>> Attempted to set `performance.read-ahead off` according to
>>> https://jira.apache.org/jira/browse/AMQ-7041
>>> That did not help.
>>>
>>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev 
>>> wrote:
>>>
>>>> The core file generated by the JVM suggests that it happens because the
>>>> file is changing while it is being read -
>>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>>> The application reads in the zipfile and goes through the zip entries,
>>>> then reloads the file and goes through the zip entries again.  It does
>>>> so 3 times.  The application never crashes on the 1st cycle but
>>>> sometimes crashes on the 2nd or 3rd cycle.
>>>> The zip file is generated about 20 seconds prior to it being used and
>>>> is not updated or even used by any other application.  I have never seen
>>>> this problem on a plain file system.
>>>>
>>>> I would appreciate any suggestions on how to go debugging this issue.
>>>> I can change the source code of the java application.
>>>>
>>>> Regards,
>>>> Dmitry
>>>>
>>>>
>>
>>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-27 Thread Dmitry Isakbayev
Raghavendra,

Thanks for the suggestion.


I am using

[root@jl-fanexoss1p glusterfs]# gluster --version
glusterfs 5.0

On
[root@jl-fanexoss1p glusterfs]# hostnamectl
 Icon name: computer-vm
   Chassis: vm
Machine ID: e44b8478ef7a467d98363614f4e50535
   Boot ID: eed98992fdda4c88bdd459a89101766b
Virtualization: vmware
  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
   CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
Kernel: Linux 3.10.0-862.14.4.el7.x86_64
  Architecture: x86-64


I have configured the following options

[root@jl-fanexoss1p glusterfs]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
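
Each of the performance options above is toggled with gluster volume set;
for example (a sketch matching the configuration shown):

$ gluster volume set gv0 performance.read-ahead off
$ gluster volume set gv0 performance.write-behind off
# confirm the reconfigured options took effect
$ gluster volume info gv0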

I don't know if it is related, but I am seeing a lot of
[2018-12-27 20:19:23.776080] W [MSGID: 114031]
[client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote
operation failed [No such device or address]
[2018-12-27 20:19:47.735190] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
handler

And java.io exceptions when trying to rename files.

Thank You,
Dmitry


On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa 
wrote:

> What version of glusterfs are you using? It might be either
> * a stale metadata issue, or
> * an inconsistent ctime issue.
>
> Can you try turning off all performance xlators? If the issue is the
> first one, that should help.
>
> On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev 
> wrote:
>
>> Attempted to set `performance.read-ahead off` according to
>> https://jira.apache.org/jira/browse/AMQ-7041
>> That did not help.
>>
>> On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev 
>> wrote:
>>
>>> The core file generated by the JVM suggests that it happens because the
>>> file is changing while it is being read -
>>> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
>>> The application reads in the zipfile and goes through the zip entries,
>>> then reloads the file and goes through the zip entries again.  It does so
>>> 3 times.  The application never crashes on the 1st cycle but sometimes
>>> crashes on the 2nd or 3rd cycle.
>>> The zip file is generated about 20 seconds prior to it being used and is
>>> not updated or even used by any other application.  I have never seen this
>>> problem on a plain file system.
>>>
>>> I would appreciate any suggestions on how to go debugging this issue.  I
>>> can change the source code of the java application.
>>>
>>> Regards,
>>> Dmitry
>>>
>>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] java application crashes while reading a zip file

2018-12-27 Thread Dmitry Isakbayev
Attempted to set `performance.read-ahead off` according to
https://jira.apache.org/jira/browse/AMQ-7041
That did not help.
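
For reference, the option is toggled like this (the volume name gv0 is taken
from later messages in this thread):

$ gluster volume set gv0 performance.read-ahead off
# verify the active value
$ gluster volume get gv0 performance.read-ahead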

On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev  wrote:

> The core file generated by the JVM suggests that it happens because the
> file is changing while it is being read -
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
> The application reads in the zipfile and goes through the zip entries,
> then reloads the file and goes through the zip entries again.  It does so
> 3 times.  The application never crashes on the 1st cycle but sometimes
> crashes on the 2nd or 3rd cycle.
> The zip file is generated about 20 seconds prior to it being used and is
> not updated or even used by any other application.  I have never seen this
> problem on a plain file system.
>
> I would appreciate any suggestions on how to go debugging this issue.  I
> can change the source code of the java application.
>
> Regards,
> Dmitry
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] java application crashes while reading a zip file

2018-12-24 Thread Dmitry Isakbayev
The core file generated by the JVM suggests that it happens because the file
is changing while it is being read -
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557.
The application reads in the zipfile and goes through the zip entries, then
reloads the file and goes through the zip entries again.  It does so 3 times.
The application never crashes on the 1st cycle but sometimes crashes on the
2nd or 3rd cycle.
The zip file is generated about 20 seconds prior to it being used and is
not updated or even used by any other application.  I have never seen this
problem on a plain file system.
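
In case it helps others reproduce this, a rough shell equivalent of the
access pattern (hypothetical; assumes the volume is mounted at /mnt/gv0 and
unzip is installed):

for i in 1 2 3; do
    # enumerate all entries, roughly what each read cycle does
    unzip -l /mnt/gv0/path/to/archive.zip > /dev/null || echo "cycle $i failed"
    sleep 1
done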

I would appreciate any suggestions on how to go debugging this issue.  I
can change the source code of the java application.

Regards,
Dmitry
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users