Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-03 Thread Andras Pataki
I've tested the 12.2.1 fuse client - and unfortunately it also
reproduces the problem.  Looking at the code that accesses the file
system, it appears that multiple processes on multiple nodes write to
the same file concurrently, but to different byte ranges of it.
Unfortunately the problem only shows up some hours into the run, so I
can't really run the MDS or fuse at a very high debug level for that
long.  That said, I could run fuse at a higher debug level on just the
nodes in question, if that would help.
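For reference, the access pattern described - several processes, each
writing its own disjoint byte range of one shared file - can be modelled
with a short local sketch.  This is only an illustration of the pattern,
not the actual job: the range size, process count, and file are made up,
and the real workload spans multiple CephFS client nodes rather than one
machine.

```python
import multiprocessing
import os
import tempfile

CHUNK = 4096  # hypothetical per-writer range size


def writer(path, idx):
    # Each worker opens the shared file independently (as separate
    # processes on separate nodes would) and writes only its own
    # disjoint byte range via pwrite, never overlapping another writer.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, bytes([idx % 256]) * CHUNK, idx * CHUNK)
    finally:
        os.close(fd)


def run(path, nprocs=4):
    # Pre-size the file so every writer's range already exists.
    with open(path, "wb") as f:
        f.truncate(nprocs * CHUNK)
    procs = [multiprocessing.Process(target=writer, args=(path, i))
             for i in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    run(path)
    print("wrote %d bytes" % os.path.getsize(path))
```

On a CephFS mount this is exactly the kind of pattern where each
writer's client needs write caps on the shared inode from the MDS.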


Andras


On 11/03/2017 12:29 AM, Gregory Farnum wrote:

Either ought to work fine.

On Thu, Nov 2, 2017 at 4:58 PM Andras Pataki wrote:


I'm planning to test the newer ceph-fuse tomorrow.  Would it be
better to stay with the Jewel 10.2.10 client, or would the 12.2.1
Luminous client be better (even though the back-end is Jewel for now)?


Andras



On 11/02/2017 05:54 PM, Gregory Farnum wrote:

Have you tested on the new ceph-fuse? This does sound vaguely
familiar and is an issue I'd generally expect to have the fix
backported for, once it was identified.

On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki wrote:

We've been running into a strange problem with Ceph using ceph-fuse
and the filesystem.  All the back end nodes are on 10.2.10, the
fuse clients are on 10.2.7.

After some hours of runs, some processes get stuck waiting for fuse
like:

[root@worker1144 ~]# cat /proc/58193/stack
[] wait_answer_interruptible+0x91/0xe0 [fuse]
[] __fuse_request_send+0x253/0x2c0 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_send_write+0xd6/0x110 [fuse]
[] fuse_perform_write+0x2f5/0x5a0 [fuse]
[] fuse_file_aio_write+0x2a1/0x340 [fuse]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0x

The cluster is healthy (all OSDs up, no slow requests, etc.).
More details of my investigation efforts are in the bug report I
just submitted:
http://tracker.ceph.com/issues/22008

It looks like the fuse client is asking for some caps that it
never thinks it receives from the MDS, so the thread waiting for
those caps on behalf of the writing client never wakes up.  The
restart of the MDS fixes the problem (since ceph-fuse
re-negotiates caps).

Any ideas/suggestions?

Andras
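[Editor's note: stuck writers like the one in the stack trace above can
be located across a node without knowing PIDs in advance, by scanning
kernel stacks for the FUSE wait frames.  A sketch follows; reading
/proc/<pid>/stack generally requires root, and the frame names are taken
from the trace above.]

```python
import glob
import os


def fuse_stuck_tasks(proc_root="/proc"):
    # Return (pid, stack) for every task whose kernel stack contains
    # one of the FUSE wait frames seen in the hung-writer trace above.
    markers = ("wait_answer_interruptible", "__fuse_request_send")
    hits = []
    for stack_path in glob.glob(os.path.join(proc_root, "[0-9]*", "stack")):
        try:
            with open(stack_path) as f:
                stack = f.read()
        except OSError:
            continue  # task exited, or we are not running as root
        if any(m in stack for m in markers):
            pid = int(os.path.basename(os.path.dirname(stack_path)))
            hits.append((pid, stack))
    return hits


if __name__ == "__main__":
    for pid, stack in fuse_stuck_tasks():
        print("PID %d blocked in fuse:\n%s" % (pid, stack))
```

Anything this reports while the cluster itself is healthy is a
candidate for the cap-wait described above.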

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com







Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Gregory Farnum
Either ought to work fine.



Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Andras Pataki
I'm planning to test the newer ceph-fuse tomorrow.  Would it be better 
to stay with the Jewel 10.2.10 client, or would the 12.2.1 Luminous 
client be better (even though the back-end is Jewel for now)?


Andras




Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Gregory Farnum
Have you tested on the new ceph-fuse? This does sound vaguely familiar and
is an issue I'd generally expect to have the fix backported for, once it
was identified.
