Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-19 Thread Yan, Zheng
the other availability zones: several types of Intel, AMD 63xx
> but not AMD 62xx processors.
>
> 5./ Talking with my awesome colleague Sean, he remembered some discussions
> about applications segfaulting on AMD processors when compiled on an Intel
> processor with the AVX2 extension. Actually, I compiled ceph 10.2.2 on an
> Intel processor with AVX2, but ceph 9.2.0 was compiled several months ago on
> an Intel processor without AVX2. The reason for the change is simply that we
> upgraded our infrastructure.
>
> 6./ Then, we compared the cpu flags between AMD 63xx and AMD 62xx. If you look
> carefully, 63xx has 'fma f16c tbm bmi1' and 62xx has 'svm'. According to my
> colleague, fma and f16c are both AMD extensions which make AMD more
> compatible with Intel's AVX extension.
>
> 63xx
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm
> rep_good extd_apicid unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1
> sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy
> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm bmi1
>
> 62xx
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm
> rep_good extd_apicid unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2
> x2apic popcnt aes xsave avx hypervisor lahf_lm cmp_legacy svm cr8_legacy abm
> sse4a misalignsse 3dnowprefetch osvw xop fma4
>
>
> All of the previous arguments may explain why we can use 9.2.0 on AMD 62xx,
> and why 10.2.2 works on AMD 63xx but not on AMD 62xx.
>
> So, we are hoping that compiling 10.2.2 on an Intel processor without the
> AVX extensions will solve our problem.
>
> Does this make sense?
>
> I have a different theory. ObjectCacher::flush() checks
> "bh->last_write <= cutoff" to decide if it should write the buffer head,
> but ObjectCacher::bh_write_adjacencies() checks "bh->last_write <
> cutoff" (cutoff is the clock time at which ObjectCacher::flush() starts
> executing). If there is only one dirty buffer head and its last_write
> is equal to cutoff, the segfault happens. Due to hardware
> limitations, the AMD 62xx CPU may be unable to provide a high-precision
> clock. This explains why the segfault happens only on AMD 62xx. The code
> that causes the segfault was introduced in the Jewel release, so ceph-fuse
> 9.2.0 does not have this problem.
>
>
> Regards
> Yan, Zheng
>
>
>
>
> The compilation takes a while but I will update the issue once I have
> finished this last experiment (in the next few days)
>
> Cheers
> Goncalo
>
>
>
> On 07/12/2016 09:45 PM, Goncalo Borges wrote:
>
> Hi All...
>
> Thank you for continuing to follow this already very long thread.
>
> Pat and Greg are correct in their assumption regarding the 10 GB virtual
> memory footprint I see for the ceph-fuse process in our cluster with 12-core
> (24 with hyperthreading) machines and 96 GB of RAM. The source is glibc >=
> 2.10. I can reduce / tune per-thread virtual memory usage by setting
> MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit
> machines) before mounting the filesystem with ceph-fuse. So, there is no
> memory leak in ceph-fuse :-)
>
> The bad news is that, while reading the glibc malloc arena explanation, it
> became obvious that the virtual memory footprint scales with the number of
> cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
> 12 cores (24 with hyperthreading) could not / would not be the same in
> the VMs where I get the segfault, since they have only 4 cores.
>
> So, at this point, I know that:
> a./ The segfault always appears in a set of VMs with 16 GB of RAM and 4
> cores.
> b./ The segfault does not appear in a set of VMs (in principle identical to
> the 16 GB ones) but with 16 cores and 64 GB of RAM.
> c./ The segfault does not appear in a physical cluster with machines with
> 96 GB of RAM and 12 cores (24 with hyperthreading),
> and I am not so sure anymore that this is memory related.
>
> For further debugging, I've updated
> http://tracker.ceph.com/issues/16610
> with a summary of my findings plus some log files:
>   - The gdb.txt I get after running
>   $ gdb /path/to/ceph-fuse core.
>   (gdb) set pag off
>   (gdb) set log on
>   (gdb) thread apply all bt
>   (gdb) thread apply all bt full
>   as advised by Brad
> - The debug.out (gzipped) I get after running ceph-fuse in debug mode with
> 'debug client 20' and 'debug objectcacher = 20'
>
> Cheers
> Goncalo
> 
> From: Gregory Farnum [gfar...@redhat.c

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-18 Thread Goncalo Borges
apicid unfair_spinlock pni pclmulqdq ssse3 fma cx16 sse4_1
sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm bmi1

62xx
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm
rep_good extd_apicid unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2
x2apic popcnt aes xsave avx hypervisor lahf_lm cmp_legacy svm cr8_legacy abm
sse4a misalignsse 3dnowprefetch osvw xop fma4


All of the previous arguments may explain why we can use 9.2.0 on AMD 62xx,
and why 10.2.2 works on AMD 63xx but not on AMD 62xx.

So, we are hoping that compiling 10.2.2 on an Intel processor without the
AVX extensions will solve our problem.

Does this make sense?

I have a different theory. ObjectCacher::flush() checks
"bh->last_write <= cutoff" to decide if it should write the buffer head,
but ObjectCacher::bh_write_adjacencies() checks "bh->last_write <
cutoff" (cutoff is the clock time at which ObjectCacher::flush() starts
executing). If there is only one dirty buffer head and its last_write
is equal to cutoff, the segfault happens. Due to hardware
limitations, the AMD 62xx CPU may be unable to provide a high-precision
clock. This explains why the segfault happens only on AMD 62xx. The code
that causes the segfault was introduced in the Jewel release, so ceph-fuse
9.2.0 does not have this problem.


Regards
Yan, Zheng





The compilation takes a while but I will update the issue once I have
finished this last experiment (in the next few days)

Cheers
Goncalo



On 07/12/2016 09:45 PM, Goncalo Borges wrote:

Hi All...

Thank you for continuing to follow this already very long thread.

Pat and Greg are correct in their assumption regarding the 10 GB virtual
memory footprint I see for the ceph-fuse process in our cluster with 12-core
(24 with hyperthreading) machines and 96 GB of RAM. The source is glibc >=
2.10. I can reduce / tune per-thread virtual memory usage by setting
MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit
machines) before mounting the filesystem with ceph-fuse. So, there is no
memory leak in ceph-fuse :-)

The bad news is that, while reading the glibc malloc arena explanation, it
became obvious that the virtual memory footprint scales with the number of
cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
12 cores (24 with hyperthreading) could not / would not be the same in
the VMs where I get the segfault, since they have only 4 cores.

So, at this point, I know that:
a./ The segfault always appears in a set of VMs with 16 GB of RAM and 4
cores.
b./ The segfault does not appear in a set of VMs (in principle identical to
the 16 GB ones) but with 16 cores and 64 GB of RAM.
c./ The segfault does not appear in a physical cluster with machines with
96 GB of RAM and 12 cores (24 with hyperthreading),
and I am not so sure anymore that this is memory related.

For further debugging, I've updated
http://tracker.ceph.com/issues/16610
with a summary of my findings plus some log files:
   - The gdb.txt I get after running
   $ gdb /path/to/ceph-fuse core.
   (gdb) set pag off
   (gdb) set log on
   (gdb) thread apply all bt
   (gdb) thread apply all bt full
   as advised by Brad
- The debug.out (gzipped) I get after running ceph-fuse in debug mode with
'debug client 20' and 'debug objectcacher = 20'

Cheers
Goncalo

From: Gregory Farnum [gfar...@redhat.com]
Sent: 12 July 2016 03:07
To: Goncalo Borges
Cc: John Spray; ceph-users
Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

Oh, is this one of your custom-built packages? Are they using
tcmalloc? That difference between VSZ and RSS looks like a glibc
malloc problem.
-Greg

On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
<goncalo.bor...@sydney.edu.au>  wrote:

Hi John...

Thank you for replying.

Here are the results of the tests you asked for, but I do not see anything abnormal.
Actually, your suggestions made me see that:

1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
consumption; probably little enough that it doesn't break ceph-fuse on
our machines with less memory.

2) I see a tremendous number of  ceph-fuse threads launched (around 160).

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
157

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
eng





The compilation takes a while but I will update the issue once I have
finished this last experiment (in the next few days)

Cheers
Goncalo



On 07/12/2016 09:45 PM, Goncalo Borges wrote:

Hi All...

Thank you for continuing to follow this already very long thread.

Pat and Greg are correct in their assumption regarding the 10 GB virtual
memory footprint I see for the ceph-fuse process in our cluster with 12-core
(24 with hyperthreading) machines and 96 GB of RAM. The source is glibc >=
2.10. I can reduce / tune per-thread virtual memory usage by setting
MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit
machines) before mounting the filesystem with ceph-fuse. So, there is no
memory leak in ceph-fuse :-)

The bad news is that, while reading the glibc malloc arena explanation, it
became obvious that the virtual memory footprint scales with the number of
cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
12 cores (24 with hyperthreading) could not / would not be the same in
the VMs where I get the segfault, since they have only 4 cores.

So, at this point, I know that:
a./ The segfault always appears in a set of VMs with 16 GB of RAM and 4
cores.
b./ The segfault does not appear in a set of VMs (in principle identical to
the 16 GB ones) but with 16 cores and 64 GB of RAM.
c./ The segfault does not appear in a physical cluster with machines with
96 GB of RAM and 12 cores (24 with hyperthreading),
and I am not so sure anymore that this is memory related.

For further debugging, I've updated
http://tracker.ceph.com/issues/16610
with a summary of my findings plus some log files:
   - The gdb.txt I get after running
   $ gdb /path/to/ceph-fuse core.
   (gdb) set pag off
   (gdb) set log on
   (gdb) thread apply all bt
   (gdb) thread apply all bt full
   as advised by Brad
- The debug.out (gzipped) I get after running ceph-fuse in debug mode with
'debug client 20' and 'debug objectcacher = 20'

Cheers
Goncalo

From: Gregory Farnum [gfar...@redhat.com]
Sent: 12 July 2016 03:07
To: Goncalo Borges
Cc: John Spray; ceph-users
Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

Oh, is this one of your custom-built packages? Are they using
tcmalloc? That difference between VSZ and RSS looks like a glibc
malloc problem.
-Greg

On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
<goncalo.bor...@sydney.edu.au> wrote:

Hi John...

Thank you for replying.

Here are the results of the tests you asked for, but I do not see anything abnormal.
Actually, your suggestions made me see that:

1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
consumption; probably little enough that it doesn't break ceph-fuse on
our machines with less memory.

2) I see a tremendous number of  ceph-fuse threads launched (around 160).

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
157

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0


I do not see a way to actually limit the number of ceph-fuse threads
launched  or to limit the max vm size each thread should take.

Do you know how to limit those?

Cheers

Goncalo




1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
leaking

I have launched ceph-fuse with valgrind in the cluster where there is
sufficient memory available, and therefore, there is no object cacher
segfault.

 $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
X.X.X.8:6789 -r /cephfs /coepp/cephfs

This is the output which I get once I unmount the file system after user
application execution

# cat valgrind-ceph-fuse-10.2.2.txt
==12123== Memcheck, a memory error detector
==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==12123== Command: ceph-fuse --id mount_user -k
/etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r /cephfs
/coepp/cephfs
==12123== Parent PID: 11992
==12123==
==12123==
==12123== HEAP SUMMARY:
==12123== in use at exit: 29,129 bytes in 397 blocks
==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030 bytes
a

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
On Fri, Jul 15, 2016 at 11:19:12AM +0800, Yan, Zheng wrote:
> On Fri, Jul 15, 2016 at 9:35 AM, Goncalo Borges
> <goncalo.bor...@sydney.edu.au> wrote:
> > So, we are hoping that compiling 10.2.2 on an Intel processor without the
> > AVX extensions will solve our problem.
> >
> > Does this make sense?
> 
> I have a different theory. ObjectCacher::flush() checks
> "bh->last_write <= cutoff" to decide if it should write the buffer head,
> but ObjectCacher::bh_write_adjacencies() checks "bh->last_write <
> cutoff" (cutoff is the clock time at which ObjectCacher::flush() starts
> executing). If there is only one dirty buffer head and its last_write
> is equal to cutoff, the segfault happens. Due to hardware
> limitations, the AMD 62xx CPU may be unable to provide a high-precision
> clock. This explains why the segfault happens only on AMD 62xx. The code
> that causes the segfault was introduced in the Jewel release, so ceph-fuse
> 9.2.0 does not have this problem.

Hmmm... this also makes a lot of sense.

I guess trying with your patch on all the CPUs mentioned should prove it one
way or the other.
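
While at it, one cheap thing to compare across the 62xx and 63xx hosts might be
which clock source the guests are actually using, since that bears directly on
the clock-precision part of the theory. These are just the standard Linux
sysfs/procfs locations, nothing ceph-specific:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
$ grep -m1 -o constant_tsc /proc/cpuinfo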

-- 
Cheers,
Brad

> 
> 
> Regards
> Yan, Zheng
> 
> 
> 
> 
> >
> > The compilation takes a while but I will update the issue once I have
> > finished this last experiment (in the next few days)
> >
> > Cheers
> > Goncalo
> >
> >
> >
> > On 07/12/2016 09:45 PM, Goncalo Borges wrote:
> >
> > Hi All...
> >
> > Thank you for continuing to follow this already very long thread.
> >
> > Pat and Greg are correct in their assumption regarding the 10 GB virtual
> > memory footprint I see for the ceph-fuse process in our cluster with 12-core
> > (24 with hyperthreading) machines and 96 GB of RAM. The source is glibc >=
> > 2.10. I can reduce / tune per-thread virtual memory usage by setting
> > MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit
> > machines) before mounting the filesystem with ceph-fuse. So, there is no
> > memory leak in ceph-fuse :-)
> >
> > The bad news is that, while reading the glibc malloc arena explanation, it
> > became obvious that the virtual memory footprint scales with the number of
> > cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
> > 12 cores (24 with hyperthreading) could not / would not be the same in
> > the VMs where I get the segfault, since they have only 4 cores.
> >
> > So, at this point, I know that:
> > a./ The segfault always appears in a set of VMs with 16 GB of RAM and 4
> > cores.
> > b./ The segfault does not appear in a set of VMs (in principle identical to
> > the 16 GB ones) but with 16 cores and 64 GB of RAM.
> > c./ The segfault does not appear in a physical cluster with machines with
> > 96 GB of RAM and 12 cores (24 with hyperthreading),
> > and I am not so sure anymore that this is memory related.
> >
> > For further debugging, I've updated
> > http://tracker.ceph.com/issues/16610
> > with a summary of my findings plus some log files:
> >   - The gdb.txt I get after running
> >   $ gdb /path/to/ceph-fuse core.
> >   (gdb) set pag off
> >   (gdb) set log on
> >   (gdb) thread apply all bt
> >   (gdb) thread apply all bt full
> >   as advised by Brad
> > - The debug.out (gzipped) I get after running ceph-fuse in debug mode with
> > 'debug client 20' and 'debug objectcacher = 20'
> >
> > Cheers
> > Goncalo
> > 
> > From: Gregory Farnum [gfar...@redhat.com]
> > Sent: 12 July 2016 03:07
> > To: Goncalo Borges
> > Cc: John Spray; ceph-users
> > Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)
> >
> > Oh, is this one of your custom-built packages? Are they using
> > tcmalloc? That difference between VSZ and RSS looks like a glibc
> > malloc problem.
> > -Greg
> >
> > On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
> > <goncalo.bor...@sydney.edu.au> wrote:
> >
> > Hi John...
> >
> > Thank you for replying.
> >
> > Here are the results of the tests you asked for, but I do not see anything abnormal.
> > Actually, your suggestions made me see that:
> >
> > 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> > consumption; probably little enough that it doesn't break ceph-fuse on
> > our machines with less memory.
> >
> > 2) I see a tremendous number of  ceph-fuse threads launched (around 160).
> >
> > # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Yan, Zheng
issue once I have
> finished this last experiment (in the next few days)
>
> Cheers
> Goncalo
>
>
>
> On 07/12/2016 09:45 PM, Goncalo Borges wrote:
>
> Hi All...
>
> Thank you for continuing to follow this already very long thread.
>
> Pat and Greg are correct in their assumption regarding the 10 GB virtual
> memory footprint I see for the ceph-fuse process in our cluster with 12-core
> (24 with hyperthreading) machines and 96 GB of RAM. The source is glibc >=
> 2.10. I can reduce / tune per-thread virtual memory usage by setting
> MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit
> machines) before mounting the filesystem with ceph-fuse. So, there is no
> memory leak in ceph-fuse :-)
>
> The bad news is that, while reading the glibc malloc arena explanation, it
> became obvious that the virtual memory footprint scales with the number of
> cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
> 12 cores (24 with hyperthreading) could not / would not be the same in
> the VMs where I get the segfault, since they have only 4 cores.
>
> So, at this point, I know that:
> a./ The segfault always appears in a set of VMs with 16 GB of RAM and 4
> cores.
> b./ The segfault does not appear in a set of VMs (in principle identical to
> the 16 GB ones) but with 16 cores and 64 GB of RAM.
> c./ The segfault does not appear in a physical cluster with machines with
> 96 GB of RAM and 12 cores (24 with hyperthreading),
> and I am not so sure anymore that this is memory related.
>
> For further debugging, I've updated
> http://tracker.ceph.com/issues/16610
> with a summary of my findings plus some log files:
>   - The gdb.txt I get after running
>   $ gdb /path/to/ceph-fuse core.
>   (gdb) set pag off
>   (gdb) set log on
>   (gdb) thread apply all bt
>   (gdb) thread apply all bt full
>   as advised by Brad
> - The debug.out (gzipped) I get after running ceph-fuse in debug mode with
> 'debug client 20' and 'debug objectcacher = 20'
>
> Cheers
> Goncalo
> 
> From: Gregory Farnum [gfar...@redhat.com]
> Sent: 12 July 2016 03:07
> To: Goncalo Borges
> Cc: John Spray; ceph-users
> Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)
>
> Oh, is this one of your custom-built packages? Are they using
> tcmalloc? That difference between VSZ and RSS looks like a glibc
> malloc problem.
> -Greg
>
> On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
> <goncalo.bor...@sydney.edu.au> wrote:
>
> Hi John...
>
> Thank you for replying.
>
> Here are the results of the tests you asked for, but I do not see anything abnormal.
> Actually, your suggestions made me see that:
>
> 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> consumption; probably little enough that it doesn't break ceph-fuse on
> our machines with less memory.
>
> 2) I see a tremendous number of  ceph-fuse threads launched (around 160).
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> 157
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
> COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
> ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
> ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0
>
>
> I do not see a way to actually limit the number of ceph-fuse threads
> launched  or to limit the max vm size each thread should take.
>
> Do you know how to limit those?
>
> Cheers
>
> Goncalo
>
>
>
>
> 1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
> leaking
>
> I have launched ceph-fuse with valgrind in the cluster where there is
> sufficient memory available, and therefore, there is no object cacher
> segfault.
>
> $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
> ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
> X.X.X.8:6789 -r /cephfs /coepp/cephfs
>
> This is the output which I get once I unmount the file system after user
> application execution
>
> # cat valgrind-ceph-fuse-10.2.2.txt
> ==12123== Memcheck, a memory

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
e for ceph-fuse process in our cluster with 12 core 
> > (24 with hyperthreading) machines and 96 GB of RAM. The source is
> > glibc >= 2.10. I can reduce / tune per-thread virtual memory usage by setting
> > MALLOC_ARENA_MAX = 4 (the default limit is 8 arenas per core on 64-bit machines)
> > before mounting the filesystem with ceph-fuse. So, there is no memory leak in ceph-fuse :-)
> > 
> > The bad news is that, while reading the glibc malloc arena explanation, it
> > became obvious that the virtual memory footprint scales with the number of
> > cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
> > 12 cores (24 with hyperthreading) could not / would not be the same
> > in the VMs where I get the segfault, since they have only 4 cores.
> > 
> > So, at this point, I know that:
> > a./ The segfault always appears in a set of VMs with 16 GB of RAM and
> > 4 cores.
> > b./ The segfault does not appear in a set of VMs (in principle identical
> > to the 16 GB ones) but with 16 cores and 64 GB of RAM.
> > c./ The segfault does not appear in a physical cluster with machines with
> > 96 GB of RAM and 12 cores (24 with hyperthreading),
> > and I am not so sure anymore that this is memory related.
> > 
> > For further debugging, I've updated
> > http://tracker.ceph.com/issues/16610
> > with a summary of my findings plus some log files:
> >- The gdb.txt I get after running
> >$ gdb /path/to/ceph-fuse core.
> >(gdb) set pag off
> >(gdb) set log on
> >    (gdb) thread apply all bt
> >(gdb) thread apply all bt full
> >as advised by Brad
> > - The debug.out (gzipped) I get after running ceph-fuse in debug mode with 
> > 'debug client 20' and 'debug objectcacher = 20'
> > 
> > Cheers
> > Goncalo
> > 
> > From: Gregory Farnum [gfar...@redhat.com]
> > Sent: 12 July 2016 03:07
> > To: Goncalo Borges
> > Cc: John Spray; ceph-users
> > Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)
> > 
> > Oh, is this one of your custom-built packages? Are they using
> > tcmalloc? That difference between VSZ and RSS looks like a glibc
> > malloc problem.
> > -Greg
> > 
> > On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
> > <goncalo.bor...@sydney.edu.au> wrote:
> > > Hi John...
> > > 
> > > Thank you for replying.
> > > 
> > > Here are the results of the tests you asked for, but I do not see anything
> > > abnormal.
> > > Actually, your suggestions made me see that:
> > > 
> > > 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> > > consumption; probably little enough that it doesn't break ceph-fuse on
> > > our machines with less memory.
> > > 
> > > 2) I see a tremendous number of  ceph-fuse threads launched (around 160).
> > > 
> > > # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> > > 157
> > > 
> > > # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
> > > COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
> > > ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
> > > ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
> > > ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0
> > > 
> > > 
> > > I do not see a way to actually limit the number of ceph-fuse threads
> > > launched  or to limit the max vm size each thread should take.
> > > 
> > > Do you know how to limit those?
> > > 
> > > Cheers
> > > 
> > > Goncalo
> > > 
> > > 
> > > 
> > > 
> > > 1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
> > > leaking
> > > 
> > > I have launched ceph-fuse with valgrind in the cluster where there is
> > > sufficient memory available, and therefore, there is no object cacher
> > > segfault.
> > >

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
, I know that:
a./ The segfault always appears in a set of VMs with 16 GB of RAM and
4 cores.
b./ The segfault does not appear in a set of VMs (in principle identical to
the 16 GB ones) but with 16 cores and 64 GB of RAM.
c./ The segfault does not appear in a physical cluster with machines with
96 GB of RAM and 12 cores (24 with hyperthreading),
and I am not so sure anymore that this is memory related.

For further debugging, I've updated
http://tracker.ceph.com/issues/16610
with a summary of my findings plus some log files:
   - The gdb.txt I get after running
   $ gdb /path/to/ceph-fuse core.
   (gdb) set pag off
   (gdb) set log on
   (gdb) thread apply all bt
   (gdb) thread apply all bt full
   as advised by Brad
- The debug.out (gzipped) I get after running ceph-fuse in debug mode with 
'debug client 20' and 'debug objectcacher = 20'

Cheers
Goncalo

From: Gregory Farnum [gfar...@redhat.com]
Sent: 12 July 2016 03:07
To: Goncalo Borges
Cc: John Spray; ceph-users
Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

Oh, is this one of your custom-built packages? Are they using
tcmalloc? That difference between VSZ and RSS looks like a glibc
malloc problem.
-Greg

On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
<goncalo.bor...@sydney.edu.au> wrote:

Hi John...

Thank you for replying.

Here are the results of the tests you asked for, but I do not see anything abnormal.
Actually, your suggestions made me see that:

1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
consumption; probably little enough that it doesn't break ceph-fuse on
our machines with less memory.

2) I see a tremendous number of  ceph-fuse threads launched (around 160).

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
157

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0


I do not see a way to actually limit the number of ceph-fuse threads
launched  or to limit the max vm size each thread should take.

Do you know how to limit those?

Cheers

Goncalo




1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
leaking

I have launched ceph-fuse with valgrind in the cluster where there is
sufficient memory available, and therefore, there is no object cacher
segfault.

 $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
X.X.X.8:6789 -r /cephfs /coepp/cephfs

This is the output which I get once I unmount the file system after user
application execution

# cat valgrind-ceph-fuse-10.2.2.txt
==12123== Memcheck, a memory error detector
==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==12123== Command: ceph-fuse --id mount_user -k
/etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r /cephfs
/coepp/cephfs
==12123== Parent PID: 11992
==12123==
==12123==
==12123== HEAP SUMMARY:
==12123== in use at exit: 29,129 bytes in 397 blocks
==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030 bytes
allocated
==12123==
==12123== LEAK SUMMARY:
==12123==definitely lost: 16 bytes in 1 blocks
==12123==indirectly lost: 0 bytes in 0 blocks
==12123==  possibly lost: 11,705 bytes in 273 blocks
==12123==still reachable: 17,408 bytes in 123 blocks
==12123== suppressed: 0 bytes in 0 blocks
==12123== Rerun with --leak-check=full to see details of leaked memory
==12123==
==12123== For counts of detected and suppressed errors, rerun with: -v
==12123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)
==12126==
==12126== HEAP SUMMARY:
==12126== in use at exit: 9,641 bytes in 73 blocks
==12126==   total heap usage: 31,363,579 allocs, 31,363,506 frees,
41,389,143,617 bytes allocated
==12126==
==12126== LEAK SUMMARY:
==12126==definitely lost: 28 bytes in 1 blocks
==12126==indirectly lost: 0 bytes in 0 blocks
==12126==  possibly lost: 0 bytes in 0 blocks
==12126==still reachable: 9,613 bytes in 72 blocks
==12126== suppressed: 0 bytes in 0 blocks
==12126== Rerun with --leak-check=full to see details of leaked memory
==12126==
==12126=

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-12 Thread Goncalo Borges
Hi All...

Thank you for continuing to follow this already very long thread.

Pat and Greg are correct in their assumption regarding the 10 GB virtual memory
footprint I see for the ceph-fuse process in our cluster with 12-core (24 with
hyperthreading) machines and 96 GB of RAM. The source is glibc >= 2.10. I can
reduce / tune per-thread virtual memory usage by setting MALLOC_ARENA_MAX = 4
(the default limit is 8 arenas per core on 64-bit machines) before mounting the
filesystem with ceph-fuse. So, there is no memory leak in ceph-fuse :-)
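
In case it helps anyone else: by "setting MALLOC_ARENA_MAX before mounting" I
simply mean exporting the variable in the environment of the shell that
launches ceph-fuse, roughly like this (the mount options below are just the
ones from our setup):

# export MALLOC_ARENA_MAX=4
# ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
X.X.X.8:6789 -r /cephfs /coepp/cephfs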

The bad news is that, while reading the glibc malloc arena explanation, it
became obvious that the virtual memory footprint scales with the number of
cores. Therefore the 10 GB of virtual memory I was seeing on the machines with
12 cores (24 with hyperthreading) could not / would not be the same in the
VMs where I get the segfault, since they have only 4 cores.

So, at this point, I know that:
a./ The segfault always appears in a set of VMs with 16 GB of RAM and
4 cores.
b./ The segfault does not appear in a set of VMs (in principle identical to
the 16 GB ones) but with 16 cores and 64 GB of RAM.
c./ The segfault does not appear in a physical cluster with machines with
96 GB of RAM and 12 cores (24 with hyperthreading),
and I am not so sure anymore that this is memory related.

For further debugging, I've updated 
   http://tracker.ceph.com/issues/16610
with a summary of my findings plus some log files:
  - The gdb.txt I get after running 
  $ gdb /path/to/ceph-fuse core.
  (gdb) set pag off
  (gdb) set log on
  (gdb) thread apply all bt
  (gdb) thread apply all bt full
  as advised by Brad
- The debug.out (gzipped) I get after running ceph-fuse in debug mode with 
'debug client 20' and 'debug objectcacher = 20'
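
For reference, the equivalent of those two debug settings can, I believe, also
be passed straight on the ceph-fuse command line (ceph generally accepts config
options as --option=value arguments), e.g. something like:

# ceph-fuse --id mount_user --debug-client=20 --debug-objectcacher=20 -k
/etc/ceph/ceph.client.mount_user.keyring -m X.X.X.8:6789 -r /cephfs /coepp/cephfs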

Cheers
Goncalo

From: Gregory Farnum [gfar...@redhat.com]
Sent: 12 July 2016 03:07
To: Goncalo Borges
Cc: John Spray; ceph-users
Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

Oh, is this one of your custom-built packages? Are they using
tcmalloc? That difference between VSZ and RSS looks like a glibc
malloc problem.
-Greg

On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
<goncalo.bor...@sydney.edu.au> wrote:
> Hi John...
>
> Thank you for replying.
>
> Here are the results of the tests you asked for, but I do not see anything abnormal.
> Actually, your suggestions made me see that:
>
> 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> consumption; probably little enough that it doesn't break ceph-fuse on
> our machines with less memory.
>
> 2) I see a tremendous number of  ceph-fuse threads launched (around 160).
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> 157
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
> COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
> ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
> ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0
>
>
> I do not see a way to actually limit the number of ceph-fuse threads
> launched  or to limit the max vm size each thread should take.
>
> Do you know how to limit those?
>
> Cheers
>
> Goncalo
>
>
>
>
> 1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
> leaking
>
> I have launched ceph-fuse with valgrind in the cluster where there is
> sufficient memory available, and therefore, there is no object cacher
> segfault.
>
> $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
> ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
> X.X.X.8:6789 -r /cephfs /coepp/cephfs
>
> This is the output which I get once I unmount the file system after user
> application execution
>
> # cat valgrind-ceph-fuse-10.2.2.txt
> ==12123== Memcheck, a memory error detector
> ==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> ==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> ==12123== Command: ceph-fuse --id mount_user -k
> /etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r /cephfs
> /coepp/cephfs
> ==12123== Parent PID: 11992
> ==12123==
> ==12123==
> ==12123== HEAP SUMMARY:
> ==1

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Yan, Zheng
On Tue, Jul 12, 2016 at 1:07 AM, Gregory Farnum  wrote:
> Oh, is this one of your custom-built packages? Are they using
> tcmalloc? That difference between VSZ and RSS looks like a glibc
> malloc problem.
> -Greg
>

ceph-fuse at http://download.ceph.com/rpm-jewel/el7/x86_64/ is not
linked against libtcmalloc either; opened issue
http://tracker.ceph.com/issues/16655
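
For anyone who wants to check their own build, plain ldd shows whether a
ceph-fuse binary is linked against tcmalloc:

$ ldd $(which ceph-fuse) | grep -i tcmalloc

(no output means it is falling back to the glibc allocator)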

Yan, Zheng


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Gregory Farnum
Oh, is this one of your custom-built packages? Are they using
tcmalloc? That difference between VSZ and RSS looks like a glibc
malloc problem.
-Greg

On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges
 wrote:
> Hi John...
>
> Thank you for replying.
>
> Here are the results of the tests you asked for, but I do not see anything abnormal.
> Actually, your suggestions made me see that:
>
> 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> consumption; probably little enough that it doesn't break ceph-fuse on
> our machines with less memory.
>
> 2) I see a tremendous number of  ceph-fuse threads launched (around 160).
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> 157
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
> COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
> ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
> ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0
>
>
> I do not see a way to actually limit the number of ceph-fuse threads
> launched  or to limit the max vm size each thread should take.
>
> Do you know how to limit those?
>
> Cheers
>
> Goncalo
>
>
>
>
> 1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
> leaking
>
> I have launched ceph-fuse with valgrind in the cluster where there is
> sufficient memory available, and therefore, there is no object cacher
> segfault.
>
> $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
> ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
> X.X.X.8:6789 -r /cephfs /coepp/cephfs
>
> This is the output which I get once I unmount the file system after user
> application execution
>
> # cat valgrind-ceph-fuse-10.2.2.txt
> ==12123== Memcheck, a memory error detector
> ==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> ==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> ==12123== Command: ceph-fuse --id mount_user -k
> /etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r /cephfs
> /coepp/cephfs
> ==12123== Parent PID: 11992
> ==12123==
> ==12123==
> ==12123== HEAP SUMMARY:
> ==12123== in use at exit: 29,129 bytes in 397 blocks
> ==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030 bytes
> allocated
> ==12123==
> ==12123== LEAK SUMMARY:
> ==12123==definitely lost: 16 bytes in 1 blocks
> ==12123==indirectly lost: 0 bytes in 0 blocks
> ==12123==  possibly lost: 11,705 bytes in 273 blocks
> ==12123==still reachable: 17,408 bytes in 123 blocks
> ==12123== suppressed: 0 bytes in 0 blocks
> ==12123== Rerun with --leak-check=full to see details of leaked memory
> ==12123==
> ==12123== For counts of detected and suppressed errors, rerun with: -v
> ==12123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)
> ==12126==
> ==12126== HEAP SUMMARY:
> ==12126== in use at exit: 9,641 bytes in 73 blocks
> ==12126==   total heap usage: 31,363,579 allocs, 31,363,506 frees,
> 41,389,143,617 bytes allocated
> ==12126==
> ==12126== LEAK SUMMARY:
> ==12126==definitely lost: 28 bytes in 1 blocks
> ==12126==indirectly lost: 0 bytes in 0 blocks
> ==12126==  possibly lost: 0 bytes in 0 blocks
> ==12126==still reachable: 9,613 bytes in 72 blocks
> ==12126== suppressed: 0 bytes in 0 blocks
> ==12126== Rerun with --leak-check=full to see details of leaked memory
> ==12126==
> ==12126== For counts of detected and suppressed errors, rerun with: -v
> ==12126== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 9)
>
> --- * ---
>
> 2.>  Inspect inode count (ceph daemon  status) to see if it's
> obeying its limit
>
> This is the output I get once ceph-fuse is mounted but no user application
> is running
>
> # ceph daemon /var/run/ceph/ceph-client.mount_user.asok status
> {
> "metadata": {
> "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
> "ceph_version": "ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
> "entity_id": "mount_user",
> "hostname": "",
> "mount_point": "\/coepp\/cephfs",
> "root": "\/cephfs"
> },
> "dentry_count": 0,
> "dentry_pinned_count": 0,
> "inode_count": 2,
> "mds_epoch": 817,
> "osd_epoch": 1005,
> "osd_epoch_barrier": 0
> }
>
>
> This is 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Patrick Donnelly
Hi Goncalo,

On Fri, Jul 8, 2016 at 3:01 AM, Goncalo Borges
 wrote:
> 5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of
> virtual memory when there are no applications using the filesystem.
>
>  7152 root  20   0 1108m  12m 5496 S  0.0  0.0   0:00.04 ceph-fuse
>
> When I only have one instance of the user application running, ceph-fuse (in
> 10.2.2) slowly rises with time up to 10 GB of memory usage.
>
> If I submit a large number of user applications simultaneously, ceph-fuse
> goes very fast to ~10GB.
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 18563 root  20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00 ceph-fuse
>  4343 root  20   0 3131m 237m  12m S  0.0  0.5  28:24.56 dsm_om_connsvcd
>  5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46 python
> 31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python
> 20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29 python
> 20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python
>  4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70 python
>  1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72 python
> 20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python
> 20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python
> 28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python
> 20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09 python
> 20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python
> 20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python

I've seen this type of thing before. It could be glibc's malloc arenas
for threads. See:

https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

I would guess there are 20 cores on this machine*?

* 20 = 10GB/(8*64MB)

If the cause here is glibc arenas, I don't think we need to do
anything special. The virtual memory is not actually being used due to
Linux overcommit.
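
As a rough back-of-the-envelope check on any of these hosts: with the defaults
(8 arenas per core, and a 64MB per-arena mapping on 64-bit glibc), the expected
virtual-memory ceiling from arenas alone is about cores * 8 * 64MB, e.g.:

$ echo "$(( $(nproc) * 8 * 64 )) MB"

If that lands near the VSZ you are seeing, arenas are almost certainly the
explanation.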

> 6./ On the machines where the user had the segfault, we have 16 GB of RAM
> and 1GB of SWAP
>
> Mem:  16334244k total,  3590100k used, 12744144k free,   221364k buffers
> Swap:  1572860k total,10512k used,  1562348k free,  2937276k cached

But do we know that ceph-fuse is using 10G VM on those machines (the
core count may be different)?

> 7./ I think what is happening is that once the user submits his sets of
> jobs, the memory usage goes to the very limit on this type of machine, and the
> rise is actually so fast that ceph-fuse segfaults before the OOM killer can
> kill it.

It's possible, but we have no evidence yet that ceph-fuse is using up
all the memory on those machines, right?

-- 
Patrick Donnelly


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread John Spray
On Mon, Jul 11, 2016 at 8:04 AM, Goncalo Borges
 wrote:
> Hi John...
>
> Thank you for replying.
>
> Here are the results of the tests you asked for, but I do not see anything abnormal.

Thanks for running through that.  Yes, nothing in the output struck me
as unreasonable either :-/

> Actually, your suggestions made me see that:
>
> 1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
> consumption; probably little enough that it doesn't break ceph-fuse on
> our machines with less memory.
>
> 2) I see a tremendous number of  ceph-fuse threads launched (around 160).

Unless you're using the async messenger, Ceph creates threads for each
OSD connection, so it's normal to have a significant number of threads
(e.g. if you had about 80 OSDs that would explain your thread count).
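
An easy sanity check is to compare the thread count against the number of OSDs
in the cluster, e.g.:

$ ceph osd stat
$ ps -T -p <ceph-fuse pid> | wc -l

If the two track each other (roughly a couple of messenger threads per OSD
connection, plus a handful of others), the thread count is expected rather
than a sign of anything leaking.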

John

> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
> 157
>
> # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head -n 10
> COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
> ceph-fuse --id mount_user - 1  3230  3230 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3231 9935240 339780  0.6 0.1
> ceph-fuse --id mount_user - 1  3230  3232 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3233 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3234 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3235 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3236 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3237 9935240 339780  0.6 0.0
> ceph-fuse --id mount_user - 1  3230  3238 9935240 339780  0.6 0.0
>
>
> I do not see a way to actually limit the number of ceph-fuse threads
> launched  or to limit the max vm size each thread should take.
>
> Do you know how to limit those?
>
> Cheers
>
> Goncalo
>
>
>
>
> 1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's
> leaking
>
> I have launched ceph-fuse with valgrind in the cluster where there is
> sufficient memory available, and therefore, there is no object cacher
> segfault.
>
> $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt --tool=memcheck
> ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring -m
> X.X.X.8:6789 -r /cephfs /coepp/cephfs
>
> This is the output which I get once I unmount the file system after user
> application execution
>
> # cat valgrind-ceph-fuse-10.2.2.txt
> ==12123== Memcheck, a memory error detector
> ==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> ==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> ==12123== Command: ceph-fuse --id mount_user -k
> /etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r /cephfs
> /coepp/cephfs
> ==12123== Parent PID: 11992
> ==12123==
> ==12123==
> ==12123== HEAP SUMMARY:
> ==12123== in use at exit: 29,129 bytes in 397 blocks
> ==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030 bytes
> allocated
> ==12123==
> ==12123== LEAK SUMMARY:
> ==12123==definitely lost: 16 bytes in 1 blocks
> ==12123==indirectly lost: 0 bytes in 0 blocks
> ==12123==  possibly lost: 11,705 bytes in 273 blocks
> ==12123==still reachable: 17,408 bytes in 123 blocks
> ==12123== suppressed: 0 bytes in 0 blocks
> ==12123== Rerun with --leak-check=full to see details of leaked memory
> ==12123==
> ==12123== For counts of detected and suppressed errors, rerun with: -v
> ==12123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)
> ==12126==
> ==12126== HEAP SUMMARY:
> ==12126== in use at exit: 9,641 bytes in 73 blocks
> ==12126==   total heap usage: 31,363,579 allocs, 31,363,506 frees,
> 41,389,143,617 bytes allocated
> ==12126==
> ==12126== LEAK SUMMARY:
> ==12126==definitely lost: 28 bytes in 1 blocks
> ==12126==indirectly lost: 0 bytes in 0 blocks
> ==12126==  possibly lost: 0 bytes in 0 blocks
> ==12126==still reachable: 9,613 bytes in 72 blocks
> ==12126== suppressed: 0 bytes in 0 blocks
> ==12126== Rerun with --leak-check=full to see details of leaked memory
> ==12126==
> ==12126== For counts of detected and suppressed errors, rerun with: -v
> ==12126== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 9)
>
> --- * ---
>
> 2.>  Inspect inode count (ceph daemon  status) to see if it's
> obeying its limit
>
> This is the output I get once ceph-fuse is mounted but no user application
> is running
>
> # ceph daemon /var/run/ceph/ceph-client.mount_user.asok status
> {
> "metadata": {
> "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
> "ceph_version": "ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
> "entity_id": "mount_user",
> "hostname": "",
> "mount_point": "\/coepp\/cephfs",
> "root": "\/cephfs"
> },
> "dentry_count": 0,
> 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges



On 07/11/2016 05:04 PM, Goncalo Borges wrote:


Hi John...

Thank you for replying.

Here are the results of the tests you asked for, but I do not see anything
abnormal. Actually, your suggestions made me see that:


1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
consumption; probably little enough that it doesn't break ceph-fuse on
our machines with less memory.


2) I see a tremendous number of  ceph-fuse threads launched (around 160).

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
157

# ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu |
head -n 10
COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
ceph-fuse --id mount_user - 1  3230  3230 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3231 9935240 339780 0.6 0.1
ceph-fuse --id mount_user - 1  3230  3232 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3233 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3234 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3235 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3236 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3237 9935240 339780 0.6 0.0
ceph-fuse --id mount_user - 1  3230  3238 9935240 339780 0.6 0.0


I do not see a way to actually limit the number of ceph-fuse threads 
launched  or to limit the max vm size each thread should take.




By the way, mounting ceph-fuse with the flag that requests disabling fuse
multithreading doesn't seem to work for me


   ceph-fuse --id mount_user -k 
/etc/ceph/ceph.client.mount_user.keyring -m XXX:6789 -s -r /cephfs 
/coepp/cephfs &


Once the user application fills the machine, I just see the number of
threads increasing up to ~160






# ps -T -p 21426 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | grep 
ceph-fuse | wc -l

24

# ps -T -p 21426 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | grep 
ceph-fuse | wc -l

28


# ps -T -p 21426 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | grep 
ceph-fuse | wc -l

30


# ps -T -p 21426 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | grep 
ceph-fuse | wc -l

50

(...)

Cheers
G.




Do you know how to limit those?

Cheers

Goncalo




1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's 
leaking


I have launched ceph-fuse with valgrind in the cluster where there is 
sufficient memory available, and therefore, there is no object cacher 
segfault.


$ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt 
--tool=memcheck ceph-fuse --id mount_user -k 
/etc/ceph/ceph.client.mount_user.keyring -m X.X.X.8:6789 -r /cephfs 
/coepp/cephfs


This is the output which I get once I unmount the file system after 
user application execution


# cat valgrind-ceph-fuse-10.2.2.txt
==12123== Memcheck, a memory error detector
==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward
et al.
==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for
copyright info
==12123== Command: ceph-fuse --id mount_user -k
/etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r
/cephfs /coepp/cephfs
==12123== Parent PID: 11992
==12123==
==12123==
==12123== HEAP SUMMARY:
==12123== in use at exit: 29,129 bytes in 397 blocks
==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030
bytes allocated
==12123==
==12123== LEAK SUMMARY:
==12123==definitely lost: 16 bytes in 1 blocks
==12123==indirectly lost: 0 bytes in 0 blocks
==12123==  possibly lost: 11,705 bytes in 273 blocks
==12123==still reachable: 17,408 bytes in 123 blocks
==12123== suppressed: 0 bytes in 0 blocks
==12123== Rerun with --leak-check=full to see details of leaked memory
==12123==
==12123== For counts of detected and suppressed errors, rerun with: -v
==12123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8
from 6)
==12126==
==12126== HEAP SUMMARY:
==12126== in use at exit: 9,641 bytes in 73 blocks
==12126==   total heap usage: 31,363,579 allocs, 31,363,506 frees,
41,389,143,617 bytes allocated
==12126==
==12126== LEAK SUMMARY:
==12126==definitely lost: 28 bytes in 1 blocks
==12126==indirectly lost: 0 bytes in 0 blocks
==12126==  possibly lost: 0 bytes in 0 blocks
==12126==still reachable: 9,613 bytes in 72 blocks
==12126== suppressed: 0 bytes in 0 blocks
==12126== Rerun with --leak-check=full to see details of leaked memory
==12126==
==12126== For counts of detected and suppressed errors, rerun with: -v
==12126== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17
from 9)

--- * ---

2.>  Inspect inode count (ceph daemon  status) to see if 
it's obeying its limit


This is the output I get once ceph-fuse is mounted but no user 
application is running


# ceph daemon 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges

Hi John...

Thank you for replying.

Here are the results of the tests you asked for, but I do not see anything
abnormal. Actually, your suggestions made me see that:


1) ceph-fuse 9.2.0 presents the same behaviour but with less memory
consumption; probably little enough that it doesn't break ceph-fuse
on our machines with less memory.


2) I see a tremendous number of  ceph-fuse threads launched (around 160).

   # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | wc -l
   157

   # ps -T -p 3230 -o command,ppid,pid,spid,vsize,rss,%mem,%cpu | head
   -n 10
   COMMAND  PPID   PID  SPIDVSZ   RSS %MEM %CPU
   ceph-fuse --id mount_user - 1  3230  3230 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3231 9935240 339780 0.6 0.1
   ceph-fuse --id mount_user - 1  3230  3232 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3233 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3234 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3235 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3236 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3237 9935240 339780 0.6 0.0
   ceph-fuse --id mount_user - 1  3230  3238 9935240 339780 0.6 0.0


I do not see a way to actually limit the number of ceph-fuse threads 
launched  or to limit the max vm size each thread should take.


Do you know how to limit those?

Cheers

Goncalo




1.> Try running ceph-fuse with valgrind --tool=memcheck to see if it's 
leaking


I have launched ceph-fuse with valgrind in the cluster where there is 
sufficient memory available, and therefore, there is no object cacher 
segfault.


$ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2.txt 
--tool=memcheck ceph-fuse --id mount_user -k 
/etc/ceph/ceph.client.mount_user.keyring -m X.X.X.8:6789 -r /cephfs 
/coepp/cephfs


This is the output which I get once I unmount the file system after user 
application execution


   # cat valgrind-ceph-fuse-10.2.2.txt
   ==12123== Memcheck, a memory error detector
   ==12123== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward
   et al.
   ==12123== Using Valgrind-3.8.1 and LibVEX; rerun with -h for
   copyright info
   ==12123== Command: ceph-fuse --id mount_user -k
   /etc/ceph/ceph.client.mount_user.keyring -m 192.231.127.8:6789 -r
   /cephfs /coepp/cephfs
   ==12123== Parent PID: 11992
   ==12123==
   ==12123==
   ==12123== HEAP SUMMARY:
   ==12123== in use at exit: 29,129 bytes in 397 blocks
   ==12123==   total heap usage: 14,824 allocs, 14,427 frees, 648,030
   bytes allocated
   ==12123==
   ==12123== LEAK SUMMARY:
   ==12123==definitely lost: 16 bytes in 1 blocks
   ==12123==indirectly lost: 0 bytes in 0 blocks
   ==12123==  possibly lost: 11,705 bytes in 273 blocks
   ==12123==still reachable: 17,408 bytes in 123 blocks
   ==12123== suppressed: 0 bytes in 0 blocks
   ==12123== Rerun with --leak-check=full to see details of leaked memory
   ==12123==
   ==12123== For counts of detected and suppressed errors, rerun with: -v
   ==12123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 6)
   ==12126==
   ==12126== HEAP SUMMARY:
   ==12126== in use at exit: 9,641 bytes in 73 blocks
   ==12126==   total heap usage: 31,363,579 allocs, 31,363,506 frees,
   41,389,143,617 bytes allocated
   ==12126==
   ==12126== LEAK SUMMARY:
   ==12126==definitely lost: 28 bytes in 1 blocks
   ==12126==indirectly lost: 0 bytes in 0 blocks
   ==12126==  possibly lost: 0 bytes in 0 blocks
   ==12126==still reachable: 9,613 bytes in 72 blocks
   ==12126== suppressed: 0 bytes in 0 blocks
   ==12126== Rerun with --leak-check=full to see details of leaked memory
   ==12126==
   ==12126== For counts of detected and suppressed errors, rerun with: -v
   ==12126== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17
   from 9)
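
Valgrind itself suggests rerunning with --leak-check=full for details of the
lost blocks. A sketch of that rerun, with the same arguments as above (the
extra option only changes how much detail ends up in the log file):

   $ valgrind --log-file=/tmp/valgrind-ceph-fuse-10.2.2-full.txt \
         --tool=memcheck --leak-check=full \
         ceph-fuse --id mount_user -k /etc/ceph/ceph.client.mount_user.keyring \
         -m X.X.X.8:6789 -r /cephfs /coepp/cephfs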

--- * ---

2.>  Inspect inode count (ceph daemon  status) to see if 
it's obeying its limit


This is the output I get once ceph-fuse is mounted but no user 
application is running


# ceph daemon /var/run/ceph/ceph-client.mount_user.asok status
{
"metadata": {
"ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
"ceph_version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)",

"entity_id": "mount_user",
"hostname": "",
"mount_point": "\/coepp\/cephfs",
"root": "\/cephfs"
},
"dentry_count": 0,
"dentry_pinned_count": 0,
"inode_count": 2,
"mds_epoch": 817,
"osd_epoch": 1005,
"osd_epoch_barrier": 0
}
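
To watch whether inode_count stays near its configured limit while the jobs
run, a simple loop over the admin socket is enough. A sketch, assuming the
same asok path as above (the 60 s interval is arbitrary):

# while sleep 60; do date; \
      ceph daemon /var/run/ceph/ceph-client.mount_user.asok status | \
      grep -E '"(inode_count|dentry_count|dentry_pinned_count)"'; done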


This is after ceph-fuse has already reached 10 GB of virtual memory, while user 
applications are hammering the filesystem.


# ceph daemon /var/run/ceph/ceph-client.mount_user.asok status
{
"metadata": {
"ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
  

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread John Spray
On Fri, Jul 8, 2016 at 8:01 AM, Goncalo Borges
 wrote:
> Hi Brad, Patrick, All...
>
> I think I've understood this second problem. In summary, it is memory
> related.
>
> This is how I found the source of the problem:
>
> 1./ I copied and adapted the user application to run in another cluster of
> ours. The idea was for me to understand the application and run it myself to
> collect logs and so on...
>
> 2./ Once I submitted it to this other cluster, everything went fine. I was
> hammering cephfs from multiple nodes without problems. This pointed to
> something different between the two clusters.
>
> 3./ I started to look more closely at the segmentation fault message, and
> assuming that the names of the methods and functions do mean something, the
> log seems related to issues on the management of objects in cache. This
> pointed to a memory related problem.
>
> 4./ On the cluster where the application ran successfully, machines have
> 48GB of RAM and 96GB of SWAP (don't know why we have such a large SWAP size,
> it is a legacy setup).
>
> # top
> top - 00:34:01 up 23 days, 22:21,  1 user,  load average: 12.06, 12.12,
> 10.40
> Tasks: 683 total,  13 running, 670 sleeping,   0 stopped,   0 zombie
> Cpu(s): 49.7%us,  0.6%sy,  0.0%ni, 49.7%id,  0.1%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:  49409308k total, 29692548k used, 19716760k free,   433064k buffers
> Swap: 98301948k total,0k used, 98301948k free, 26742484k cached
>
> 5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of
> virtual memory when there are no applications using the filesystem.
>
>  7152 root  20   0 1108m  12m 5496 S  0.0  0.0   0:00.04 ceph-fuse
>
> When I only have one instance of the user application running, ceph-fuse (in
> 10.2.2) slowly rises with time up to 10 GB of memory usage.
>
> if I submit a large number of user applications simultaneously, ceph-fuse
> goes very fast to ~10GB.
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 18563 root  20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00 ceph-fuse
>  4343 root  20   0 3131m 237m  12m S  0.0  0.5  28:24.56 dsm_om_connsvcd
>  5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46 python
> 31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python
> 20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29 python
> 20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python
>  4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70 python
>  1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72 python
> 20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python
> 20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python
> 28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python
> 20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09 python
> 20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python
> 20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python
>
> 6./ On the machines where the user had the segfault, we have 16 GB of RAM
> and 1GB of SWAP
>
> Mem:  16334244k total,  3590100k used, 12744144k free,   221364k buffers
> Swap:  1572860k total,10512k used,  1562348k free,  2937276k cached
>
> 7./ I think what is happening is that once the user submits his sets of
> jobs, the memory usage goes right up to the limit on this type of machine, and the
> rise is actually so fast that ceph-fuse segfaults before the OOM killer can
> kill it.
>
> 8./ We have run the user application in the same type of machines but with
> 64 GB of RAM and 1GB of SWAP, and everything goes fine also here.
>
>
> So, in conclusion, our second problem (besides the locks issue, which was fixed by
> Pat's patch) is the memory usage profile of ceph-fuse in 10.2.2, which seems to
> be very different from what it was in ceph-fuse 9.2.0.
>
> Are there any ideas on how we can limit the virtual memory usage of ceph-fuse
> in 10.2.2?

The fuse client is designed to limit its cache sizes:
client_cache_size (default 16384) inodes of cached metadata
client_oc_size (default 200MB) bytes of cached data
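
Both are plain client config options, so they can be lowered in the [client]
section of ceph.conf on the client node before mounting. A sketch, with
purely illustrative values (roughly half the defaults):

[client]
    client cache size = 8192       # inodes of cached metadata
    client oc size = 104857600     # bytes of cached data (~100MB)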

We do run the fuse client with valgrind during testing, so if it is
showing memory leaks in normal usage on your system then that's news.

The top output you've posted seems to show that ceph-fuse only
actually has 328MB resident though?

If you can reproduce the memory growth, then it would be good to:
 * Try running ceph-fuse with valgrind --tool=memcheck to see if it's leaking
 * Inspect inode count (ceph daemon  status) to see if
it's obeying its limit
 * Enable objectcacher debug (debug objectcacher = 10) and look at the
output (from the "trim" lines) to see if it's obeying its limit (see the
sketch after this list)
 * See if fuse_disable_pagecache setting makes a difference
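
A sketch of those last two checks on a live client, assuming the admin
socket is at /var/run/ceph/ceph-client.mount_user.asok and the client is
logging to /var/log/ceph/ceph-client.mount_user.log (set "log file" in
ceph.conf if it is not already):

# ceph daemon /var/run/ceph/ceph-client.mount_user.asok config set debug_objectcacher 10
# grep trim /var/log/ceph/ceph-client.mount_user.log | tail

For the pagecache check, add "fuse disable pagecache = true" to the [client]
section of ceph.conf and remount.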

Also, is the version of fuse the same on the nodes running 9.2.0 vs.
the nodes running 10.2.2?

John

> Cheers
> Goncalo
>
>
>
> On 07/08/2016 09:54 AM, Brad Hubbard wrote:
>
> Hi Goncalo,
>
> If possible it would be great 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread Goncalo Borges

Hi Brad, Patrick, All...

I think I've understood this second problem. In summary, it is memory 
related.


This is how I found the source of the problem:

   1./ I copied and adapted the user application to run in another
   cluster of ours. The idea was for me to understand the application
   and run it myself to collect logs and so on...

   2./ Once I submitted it to this other cluster, everything went fine. I
   was hammering cephfs from multiple nodes without problems. This
   pointed to something different between the two clusters.

   3./ I started to look more closely at the segmentation fault message,
   and assuming that the names of the methods and functions do mean
   something, the log seems related to issues on the management of
   objects in cache. This pointed to a memory related problem.

   4./ On the cluster where the application ran successfully, machines
   have 48GB of RAM and 96GB of SWAP (don't know why we have such a
   large SWAP size, it is a legacy setup).

   # top
   top - 00:34:01 up 23 days, 22:21,  1 user,  load average: 12.06,
   12.12, 10.40
   Tasks: 683 total,  13 running, 670 sleeping,   0 stopped,   0 zombie
   Cpu(s): 49.7%us,  0.6%sy,  0.0%ni, 49.7%id,  0.1%wa,  0.0%hi,
   0.0%si,  0.0%st
   Mem:  49409308k total, 29692548k used, 19716760k free, 433064k
   buffers
   Swap: 98301948k total,0k used, 98301948k free, 26742484k
   cached

   5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB
   of virtual memory when there are no applications using the filesystem.

 7152 root  20   0 1108m  12m 5496 S  0.0  0.0   0:00.04
   ceph-fuse

   When I only have one instance of the user application running,
   ceph-fuse (in 10.2.2) slowly rises with time up to 10 GB of memory
   usage.

   if I submit a large number of user applications simultaneously,
   ceph-fuse goes very fast to ~10GB.

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+ COMMAND
   18563 root  20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00
   ceph-fuse
 4343 root  20   0 3131m 237m  12m S  0.0  0.5  28:24.56
   dsm_om_connsvcd
 5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46
   python
   31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python
   20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29
   python
   20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python
 4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70
   python
 1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72
   python
   20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python
   20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python
   28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python
   20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09
   python
   20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python
   20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python

   6./ On the machines where the user had the segfault, we have 16 GB
   of RAM and 1GB of SWAP

   Mem:  16334244k total,  3590100k used, 12744144k free, 221364k
   buffers
   Swap:  1572860k total,10512k used,  1562348k free, 2937276k
   cached

   7./ I think what is happening is that once the user submits his sets
   of jobs, the memory usage goes right up to the limit on this type of
   machine, and the rise is actually so fast that ceph-fuse segfaults
   before the OOM killer can kill it.

   8./ We have run the user application in the same type of machines
   but with 64 GB of RAM and 1GB of SWAP, and everything goes fine also
   here.


So, in conclusion, our second problem (besides the locks issue, which was fixed 
by Pat's patch) is the memory usage profile of ceph-fuse in 10.2.2, which 
seems to be very different from what it was in ceph-fuse 9.2.0.


Are there any ideas on how we can limit the virtual memory usage of 
ceph-fuse in 10.2.2?


Cheers
Goncalo



On 07/08/2016 09:54 AM, Brad Hubbard wrote:

Hi Goncalo,

If possible it would be great if you could capture a core file for this with
full debugging symbols (preferably glibc debuginfo as well). How you do
that will depend on the ceph version and your OS but we can offer help
if required I'm sure.

Once you have the core do the following.

$ gdb /path/to/ceph-fuse core.
(gdb) set pag off
(gdb) set log on
(gdb) thread apply all bt
(gdb) thread apply all bt full

Then quit gdb and you should find a file called gdb.txt in your
working directory.
If you could attach that file to http://tracker.ceph.com/issues/16610

Cheers,
Brad

On Fri, Jul 8, 2016 at 12:06 AM, Patrick Donnelly  wrote:

On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
 wrote:

Unfortunately, the other user application breaks ceph-fuse again (It is a
completely 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Brad Hubbard
Hi Goncalo,

If possible it would be great if you could capture a core file for this with
full debugging symbols (preferably glibc debuginfo as well). How you do
that will depend on the ceph version and your OS but we can offer help
if required I'm sure.

Once you have the core do the following.

$ gdb /path/to/ceph-fuse core.
(gdb) set pag off
(gdb) set log on
(gdb) thread apply all bt
(gdb) thread apply all bt full

Then quit gdb and you should find a file called gdb.txt in your
working directory.
If you could attach that file to http://tracker.ceph.com/issues/16610
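
The same capture can also be scripted non-interactively; a sketch, assuming
the binary path and core file name below:

$ gdb -batch -ex 'set pagination off' \
      -ex 'thread apply all bt' -ex 'thread apply all bt full' \
      /path/to/ceph-fuse core.<pid> > gdb.txt 2>&1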

Cheers,
Brad

On Fri, Jul 8, 2016 at 12:06 AM, Patrick Donnelly  wrote:
> On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
>  wrote:
>> Unfortunately, the other user application breaks ceph-fuse again (It is a
>> completely different application than in my previous test).
>>
>> We have tested it in 4 machines with 4 cores. The user is submitting 16
>> single core jobs which are all writing different output files (one per job)
>> to a common dir in cephfs. The first 4 jobs run happily and never break
>> ceph-fuse. But the remaining 12 jobs, running in the remaining 3 machines,
>> trigger a segmentation fault, which is completely different from the other
>> case.
>>
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> 1: (()+0x297fe2) [0x7f54402b7fe2]
>> 2: (()+0xf7e0) [0x7f543ecf77e0]
>> 3: (ObjectCacher::bh_write_scattered(std::list> std::allocator >&)+0x36) [0x7f5440268086]
>> 4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*,
>> std::chrono::time_point> std::chrono::duration > >, long*,
>> int*)+0x22c) [0x7f5440268a3c]
>> 5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
>> 6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
>> 7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
>> 8: (()+0x7aa1) [0x7f543ecefaa1]
>>  9: (clone()+0x6d) [0x7f543df6893d]
>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
>> interpret this.
>
> This one looks like a very different problem. I've created an issue
> here: http://tracker.ceph.com/issues/16610
>
> Thanks for the report and debug log!
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Patrick Donnelly
On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
 wrote:
> Unfortunately, the other user application breaks ceph-fuse again (It is a
> completely different application than in my previous test).
>
> We have tested it in 4 machines with 4 cores. The user is submitting 16
> single core jobs which are all writing different output files (one per job)
> to a common dir in cephfs. The first 4 jobs run happily and never break
> ceph-fuse. But the remaining 12 jobs, running in the remaining 3 machines,
> trigger a segmentation fault, which is completely different from the other
> case.
>
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 1: (()+0x297fe2) [0x7f54402b7fe2]
> 2: (()+0xf7e0) [0x7f543ecf77e0]
> 3: (ObjectCacher::bh_write_scattered(std::list std::allocator >&)+0x36) [0x7f5440268086]
> 4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*,
> std::chrono::time_point std::chrono::duration > >, long*,
> int*)+0x22c) [0x7f5440268a3c]
> 5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
> 6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
> 7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
> 8: (()+0x7aa1) [0x7f543ecefaa1]
>  9: (clone()+0x6d) [0x7f543df6893d]
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.

This one looks like a very different problem. I've created an issue
here: http://tracker.ceph.com/issues/16610

Thanks for the report and debug log!

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Goncalo Borges
My previous email did not go through because of its size. Here goes a 
new attempt:


Cheers
Goncalo

--- * ---

Hi Patrick, Brad...

Unfortunately, the other user application breaks ceph-fuse again (It is 
a completely different application than in my previous test).


We have tested it in 4 machines with 4 cores. The user is submitting 16 
single core jobs which are all writing different output files (one per 
job) to a common dir in cephfs. The first 4 jobs run happily and never 
break ceph-fuse. But the remaining 12 jobs, running in the remaining 3 
machines, trigger a segmentation fault, which is completely different 
from the other case.


ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x297fe2) [0x7f54402b7fe2]
2: (()+0xf7e0) [0x7f543ecf77e0]
3: 
(ObjectCacher::bh_write_scattered(std::list >&)+0x36) [0x7f5440268086]
4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*, 
std::chrono::time_point >, 
long*, int*)+0x22c) [0x7f5440268a3c]

5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
8: (()+0x7aa1) [0x7f543ecefaa1]
 9: (clone()+0x6d) [0x7f543df6893d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed 
to interpret this.
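
A sketch of what that interpretation could look like, assuming ceph-fuse is
installed at /usr/bin/ceph-fuse and that the 0x297fe2 offset from frame 1
falls inside that binary rather than in a shared library:

   $ objdump -rdS /usr/bin/ceph-fuse > ceph-fuse.dis
   $ grep -n -A 5 '297fe2:' ceph-fuse.dis

With debuginfo installed, addr2line -Cfe /usr/bin/ceph-fuse 0x297fe2 should
map the offset straight to a source line.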


The full log (with debug client = 20) for a segfault in the client with IP 
Y.Y.Y.255 is available here:


https://dl.dropboxusercontent.com/u/2946024/nohup.out.2

(for privacy reasons, I've substituted client IPs with Y.Y.Y.(...) and 
ceph infrastructure host IPs with X.X.X.(...))


Well... further help is welcome.

Cheers
Goncalo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Brad Hubbard
On Thu, Jul 7, 2016 at 12:31 AM, Patrick Donnelly  wrote:
>
> The locks were missing in 9.2.0. There were probably instances of the
> segfault unreported/unresolved.

Or even unseen :)

Race conditions are funny things and extremely subtle changes in
timing introduced
by any number of things can affect whether they happen or not. I've
seen races that
only happen on certain CPUs and not others, or that don't happen
unless a particular
flag is on/off during compilation. Difficult to predict.

>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Patrick Donnelly
Hi Goncalo,

On Wed, Jul 6, 2016 at 2:18 AM, Goncalo Borges
 wrote:
> Just to confirm that, after applying the patch and recompiling, we are no
> longer seeing segfaults.
>
> I just tested with a user application which would kill ceph-fuse almost
> instantaneously.  Now it is running for quite some time, reading and
> updating the files that it should.
>
> I should test with other applications which were also triggering the
> ceph-fuse segfault, but for now, it is looking good.

Great, thanks for letting us know it worked.

> Is there a particular reason why in 9.2.0 we were not getting such
> segfaults? I am asking because the patch was simply to introduce two lock
> functions in two specific lines of src/client/Client.cc  which, I imagine,
> were also not there in 9.2.0 (unless there was a big rewrite of
> src/client/Client.cc from 9.2.0 to 10.2.2)

The locks were missing in 9.2.0. There were probably instances of the
segfault unreported/unresolved.

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Goncalo Borges

Hi All...

Just to confirm that, after applying the patch and recompiling, we are 
no longer seeing segfaults.


I just tested with a user application which would kill ceph-fuse almost 
instantaneously.  Now it is running for quite some time, reading and 
updating the files that it should.


I should test with other applications which were also triggering the 
ceph-fuse segfault, but for now, it is looking good.


Thanks Patrick for pointing this out.

Is there a particular reason why in 9.2.0 we were not getting such 
segfaults? I am asking because the patch was simply to introduce two 
lock functions in two specific lines of src/client/Client.cc  which, I 
imagine, were also not there in 9.2.0 (unless there was a big rewrite of 
src/client/Client.cc from 9.2.0 to 10.2.2)


Cheers

Goncalo


On 07/05/2016 02:45 PM, Goncalo Borges wrote:


Hi Brad, Shinobu, Patrick...

Indeed if I run with 'debug client = 20' it seems I get a very similar 
log to what Patrick has in the patch. However it is difficult for me 
to really say if it is exactly the same thing.


One thing I could try is simply to apply the fix in the source code 
and recompile. Is this something safe to do?



Cheers

Goncalo


On 07/05/2016 01:34 PM, Patrick Donnelly wrote:


Hi Goncalo,

I believe this segfault may be the one fixed here:

https://github.com/ceph/ceph/pull/10027

(Sorry for brief top-post. I'm on mobile.)

On Jul 4, 2016 9:16 PM, "Goncalo Borges" 
 wrote:

>
> Dear All...
>
> We have recently migrated all our ceph infrastructure from 9.2.0 to 
10.2.2.

>
> We are currently using ceph-fuse to mount cephfs in a number of 
clients.

>
> ceph-fuse 10.2.2 client is segfaulting in some situations. One of 
the scenarios where ceph-fuse segfaults is when a user submits a 
parallel (mpi) application requesting 4 hosts with 4 cores each (16 
instances in total) . According to the user, each instance has its 
own dedicated inputs and outputs.

>
> Please note that if we go back to ceph-fuse 9.2.0 client everything 
works fine.

>
> The ceph-fuse 10.2.2 client segfault is the following (we were able 
to capture it mounting ceph-fuse in debug mode):

>>
>> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346

>> ceph-fuse[7346]: starting ceph client
>> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 
0x7f6af8c12320 newargc=11

>> ceph-fuse[7346]: starting fuse
>> *** Caught signal (Segmentation fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) 
[0x7f6aedaee035]

>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal 
(Segmentation fault) **

>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) 
[0x7f6aedaee035]

>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.

>>
>>
> The full dump is quite long. Here are the very last bits of it. Let 
me know if you need the full dump.

>>
>> --- begin dump of recent events ---
>>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559 
_getxattr(137c789, "security.capability", 0) = -61
>>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 
ll_write 0x7f6a08028be0 137c78c 20094~34
>>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 
ll_write 0x7f6a08028be0 20094~34 = 34
>>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 
ll_write 0x7f6a100145f0 137c78d 28526~34
>>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 
ll_write 0x7f6a100145f0 28526~34 = 34
>>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559 
ll_forget 137c78c 1
>>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559 
ll_forget 137c789 1
>>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 
ll_write 0x7f6a94006350 137c789 22010~216
>>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 
ll_write 0x7f6a94006350 22010~216 = 216
>>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559 
ll_getxattr 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges

Will do, Brad. From your answer it should be a safe thing to do.

Will report later.

Thanks for the help

Cheers

Goncalo



On 07/05/2016 02:42 PM, Brad Hubbard wrote:

On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly  wrote:

Hi Goncalo,

I believe this segfault may be the one fixed here:

https://github.com/ceph/ceph/pull/10027

Ah, nice one Patrick.

Goncalo, the patch is fairly simple, just the addition of a lock on two lines to
resolve the race. Could you try recompiling with those changes and let
us know how
it goes?

Cheers,
Brad


(Sorry for brief top-post. I'm on mobile.)

On Jul 4, 2016 9:16 PM, "Goncalo Borges" 
wrote:

Dear All...

We have recently migrated all our ceph infrastructure from 9.2.0 to
10.2.2.

We are currently using ceph-fuse to mount cephfs in a number of clients.

ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
application requesting 4 hosts with 4 cores each (16 instances in total) .
According to the user, each instance has its own dedicated inputs and
outputs.

Please note that if we go back to ceph-fuse 9.2.0 client everything works
fine.

The ceph-fuse 10.2.2 client segfault is the following (we were able to
capture it mounting ceph-fuse in debug mode):

2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
ceph-fuse[7346]: starting ceph client
2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 0x7f6af8c12320
newargc=11
ceph-fuse[7346]: starting fuse
*** Caught signal (Segmentation fault) **
  in thread 7f69d7fff700 thread_name:ceph-fuse
  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
  1: (()+0x297ef2) [0x7f6aedbecef2]
  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
[0x7f6aedaee035]
  5: (()+0x199891) [0x7f6aedaee891]
  6: (()+0x15b76) [0x7f6aed50db76]
  7: (()+0x12aa9) [0x7f6aed50aaa9]
  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
  9: (clone()+0x6d) [0x7f6aeb8d193d]
2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal
(Segmentation fault) **
  in thread 7f69d7fff700 thread_name:ceph-fuse

  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
  1: (()+0x297ef2) [0x7f6aedbecef2]
  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
[0x7f6aedaee035]
  5: (()+0x199891) [0x7f6aedaee891]
  6: (()+0x15b76) [0x7f6aed50db76]
  7: (()+0x12aa9) [0x7f6aed50aaa9]
  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
  9: (clone()+0x6d) [0x7f6aeb8d193d]
  NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.



The full dump is quite long. Here are the very last bits of it. Let me
know if you need the full dump.

--- begin dump of recent events ---
  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
_getxattr(137c789, "security.capability", 0) = -61
  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 ll_write
0x7f6a08028be0 137c78c 20094~34
  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 ll_write
0x7f6a08028be0 20094~34 = 34
  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 ll_write
0x7f6a100145f0 137c78d 28526~34
  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 ll_write
0x7f6a100145f0 28526~34 = 34
  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559
ll_forget 137c78c 1
  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559
ll_forget 137c789 1
  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 ll_write
0x7f6a94006350 137c789 22010~216
  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 ll_write
0x7f6a94006350 22010~216 = 216
  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
ll_getxattr 137c78c.head security.capability size 0
  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
_getxattr(137c78c, "security.capability", 0) = -61



   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
_getxattr(137c78a, "security.capability", 0) = -61
   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 ll_write
0x7f6a08042560 137c78b 11900~34
   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 ll_write
0x7f6a08042560 11900~34 = 34
   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
ll_getattr 11e9c80.head
   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
ll_getattr 11e9c80.head = 0
   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559
ll_forget 137c78a 1
   -154> 2016-07-05 10:09:14.043738 7f6a5ebfd700  3 client.464559 ll_write
0x7f6a140d5930 137c78a 18292~34
   -153> 2016-07-05 10:09:14.043759 7f6a5ebfd700  3 client.464559 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges

Hi Brad, Shinobu, Patrick...

Indeed if I run with 'debug client = 20' it seems I get a very similar 
log to what Patrick has in the patch. However it is difficult for me to 
really say if it is exactly the same thing.


One thing I could try is simply to apply the fix in the source code and 
recompile. Is this something safe to do?



Cheers

Goncalo


On 07/05/2016 01:34 PM, Patrick Donnelly wrote:


Hi Goncalo,

I believe this segfault may be the one fixed here:

https://github.com/ceph/ceph/pull/10027

(Sorry for brief top-post. I'm on mobile.)

On Jul 4, 2016 9:16 PM, "Goncalo Borges" > wrote:

>
> Dear All...
>
> We have recently migrated all our ceph infrastructure from 9.2.0 to 
10.2.2.

>
> We are currently using ceph-fuse to mount cephfs in a number of 
clients.

>
> ceph-fuse 10.2.2 client is segfaulting in some situations. One of 
the scenarios where ceph-fuse segfaults is when a user submits a 
parallel (mpi) application requesting 4 hosts with 4 cores each (16 
instances in total) . According to the user, each instance has its own 
dedicated inputs and outputs.

>
> Please note that if we go back to ceph-fuse 9.2.0 client everything 
works fine.

>
> The ceph-fuse 10.2.2 client segfault is the following (we were able 
to capture it mounting ceph-fuse in debug mode):

>>
>> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346

>> ceph-fuse[7346]: starting ceph client
>> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 
0x7f6af8c12320 newargc=11

>> ceph-fuse[7346]: starting fuse
>> *** Caught signal (Segmentation fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) 
[0x7f6aedaee035]

>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal 
(Segmentation fault) **

>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) 
[0x7f6aedaee035]

>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.

>>
>>
> The full dump is quite long. Here are the very last bits of it. Let 
me know if you need the full dump.

>>
>> --- begin dump of recent events ---
>>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559 
_getxattr(137c789, "security.capability", 0) = -61
>>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 
ll_write 0x7f6a08028be0 137c78c 20094~34
>>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 
ll_write 0x7f6a08028be0 20094~34 = 34
>>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 
ll_write 0x7f6a100145f0 137c78d 28526~34
>>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 
ll_write 0x7f6a100145f0 28526~34 = 34
>>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559 
ll_forget 137c78c 1
>>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559 
ll_forget 137c789 1
>>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 
ll_write 0x7f6a94006350 137c789 22010~216
>>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 
ll_write 0x7f6a94006350 22010~216 = 216
>>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559 
ll_getxattr 137c78c.head security.capability size 0
>>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559 
_getxattr(137c78c, "security.capability", 0) = -61

>>
>> 
>>
>>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559 
_getxattr(137c78a, "security.capability", 0) = -61
>>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 
ll_write 0x7f6a08042560 137c78b 11900~34
>>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 
ll_write 0x7f6a08042560 11900~34 = 34
>>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559 
ll_getattr 11e9c80.head
>>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559 
ll_getattr 11e9c80.head = 0
>>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559 
ll_forget 137c78a 1
>>   

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly  wrote:
> Hi Goncalo,
>
> I believe this segfault may be the one fixed here:
>
> https://github.com/ceph/ceph/pull/10027

Ah, nice one Patrick.

Goncalo, the patch is fairly simple, just the addition of a lock on two lines to
resolve the race. Could you try recompiling with those changes and let
us know how
it goes?
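
One way to pick up just that change on top of 10.2.2 is to cherry-pick the
PR onto the release tag and rebuild; a sketch, assuming a fresh git checkout
and whatever build/packaging procedure you normally use for your ceph builds:

$ git clone --branch v10.2.2 https://github.com/ceph/ceph.git
$ cd ceph
$ git fetch origin pull/10027/head
$ git cherry-pick FETCH_HEAD    # assumes the PR is a single commit

...then rebuild and reinstall ceph-fuse as usual for your platform.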

Cheers,
Brad

>
> (Sorry for brief top-post. I'm on mobile.)
>
> On Jul 4, 2016 9:16 PM, "Goncalo Borges" 
> wrote:
>>
>> Dear All...
>>
>> We have recently migrated all our ceph infrastructure from 9.2.0 to
>> 10.2.2.
>>
>> We are currently using ceph-fuse to mount cephfs in a number of clients.
>>
>> ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
>> scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
>> application requesting 4 hosts with 4 cores each (16 instances in total) .
>> According to the user, each instance has its own dedicated inputs and
>> outputs.
>>
>> Please note that if we go back to ceph-fuse 9.2.0 client everything works
>> fine.
>>
>> The ceph-fuse 10.2.2 client segfault is the following (we were able to
>> capture it mounting ceph-fuse in debug mode):
>>>
>>> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
>>> (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
>>> ceph-fuse[7346]: starting ceph client
>>> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 0x7f6af8c12320
>>> newargc=11
>>> ceph-fuse[7346]: starting fuse
>>> *** Caught signal (Segmentation fault) **
>>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
>>> [0x7f6aedaee035]
>>>  5: (()+0x199891) [0x7f6aedaee891]
>>>  6: (()+0x15b76) [0x7f6aed50db76]
>>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>>
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
>>> [0x7f6aedaee035]
>>>  5: (()+0x199891) [0x7f6aedaee891]
>>>  6: (()+0x15b76) [0x7f6aed50db76]
>>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
>>> to interpret this.
>>>
>>>
>> The full dump is quite long. Here are the very last bits of it. Let me
>> know if you need the full dump.
>>>
>>> --- begin dump of recent events ---
>>>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
>>> _getxattr(137c789, "security.capability", 0) = -61
>>>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 ll_write
>>> 0x7f6a08028be0 137c78c 20094~34
>>>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 ll_write
>>> 0x7f6a08028be0 20094~34 = 34
>>>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 ll_write
>>> 0x7f6a100145f0 137c78d 28526~34
>>>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 ll_write
>>> 0x7f6a100145f0 28526~34 = 34
>>>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559
>>> ll_forget 137c78c 1
>>>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559
>>> ll_forget 137c789 1
>>>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 ll_write
>>> 0x7f6a94006350 137c789 22010~216
>>>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 ll_write
>>> 0x7f6a94006350 22010~216 = 216
>>>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
>>> ll_getxattr 137c78c.head security.capability size 0
>>>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
>>> _getxattr(137c78c, "security.capability", 0) = -61
>>>
>>> 
>>>
>>>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
>>> _getxattr(137c78a, "security.capability", 0) = -61
>>>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 ll_write
>>> 0x7f6a08042560 137c78b 11900~34
>>>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 ll_write
>>> 0x7f6a08042560 11900~34 = 34
>>>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
>>> ll_getattr 11e9c80.head
>>>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
>>> ll_getattr 11e9c80.head = 0
>>>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559
>>> ll_forget 137c78a 1
>>>   -154> 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Patrick Donnelly
Hi Goncalo,

I believe this segfault may be the one fixed here:

https://github.com/ceph/ceph/pull/10027

(Sorry for brief top-post. I'm on mobile.)

On Jul 4, 2016 9:16 PM, "Goncalo Borges" 
wrote:
>
> Dear All...
>
> We have recently migrated all our ceph infrastructure from 9.2.0 to
10.2.2.
>
> We are currently using ceph-fuse to mount cephfs in a number of clients.
>
> ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
application requesting 4 hosts with 4 cores each (16 instances in total) .
According to the user, each instance has its own dedicated inputs and
outputs.
>
> Please note that if we go back to ceph-fuse 9.2.0 client everything works
fine.
>
> The ceph-fuse 10.2.2 client segfault is the following (we were able to
capture it mounting ceph-fuse in debug mode):
>>
>> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
>> ceph-fuse[7346]: starting ceph client
>> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv =
0x7f6af8c12320 newargc=11
>> ceph-fuse[7346]: starting fuse
>> *** Caught signal (Segmentation fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
[0x7f6aedaee035]
>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal
(Segmentation fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
[0x7f6aedaee035]
>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
>>
>>
> The full dump is quite long. Here are the very last bits of it. Let me
know if you need the full dump.
>>
>> --- begin dump of recent events ---
>>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
_getxattr(137c789, "security.capability", 0) = -61
>>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559
ll_write 0x7f6a08028be0 137c78c 20094~34
>>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559
ll_write 0x7f6a08028be0 20094~34 = 34
>>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559
ll_write 0x7f6a100145f0 137c78d 28526~34
>>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559
ll_write 0x7f6a100145f0 28526~34 = 34
>>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559
ll_forget 137c78c 1
>>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559
ll_forget 137c789 1
>>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559
ll_write 0x7f6a94006350 137c789 22010~216
>>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559
ll_write 0x7f6a94006350 22010~216 = 216
>>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
ll_getxattr 137c78c.head security.capability size 0
>>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
_getxattr(137c78c, "security.capability", 0) = -61
>>
>> 
>>
>>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
_getxattr(137c78a, "security.capability", 0) = -61
>>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559
ll_write 0x7f6a08042560 137c78b 11900~34
>>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559
ll_write 0x7f6a08042560 11900~34 = 34
>>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
ll_getattr 11e9c80.head
>>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
ll_getattr 11e9c80.head = 0
>>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559
ll_forget 137c78a 1
>>   -154> 2016-07-05 10:09:14.043738 7f6a5ebfd700  3 client.464559
ll_write 0x7f6a140d5930 137c78a 18292~34
>>   -153> 2016-07-05 10:09:14.043759 7f6a5ebfd700  3 client.464559
ll_write 0x7f6a140d5930 18292~34 = 34
>>   -152> 2016-07-05 10:09:14.043767 7f6ac17fb700  3 client.464559
ll_forget 11e9c80 1
>>   -151> 2016-07-05 10:09:14.043784 7f6aa8cf9700  3 client.464559
ll_flush 0x7f6a00049fe0 11e9c80
>>   -150> 2016-07-05 10:09:14.043794 7f6aa8cf9700  3 client.464559
ll_getxattr 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 12:13 PM, Shinobu Kinjo  wrote:
> Can you reproduce with debug client = 20?

In addition to this I would suggest making sure you have debug symbols
in your build
and capturing a core file.

You can do that by setting "ulimit -c unlimited" in the environment
where ceph-fuse is running.

Once you have a core file you can do the following.

$ gdb /path/to/ceph-fuse core.
(gdb) thread apply all bt full

This looks like it might be a race and that might help us identify the
threads involved.
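
A sketch of setting that up before reproducing (the core_pattern line is
optional, needs root, and is system-dependent):

$ ulimit -c unlimited
# echo 'core.%p' > /proc/sys/kernel/core_pattern   # optional: predictable core file names
$ ceph-fuse <usual mount arguments>                # mount from this same shell so the ulimit applies

After the crash, point gdb at the matching core.<pid> as above.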

HTH,
Brad

>
> On Tue, Jul 5, 2016 at 10:16 AM, Goncalo Borges
>  wrote:
>>
>> Dear All...
>>
>> We have recently migrated all our ceph infrastructure from 9.2.0 to
>> 10.2.2.
>>
>> We are currently using ceph-fuse to mount cephfs in a number of clients.
>>
>> ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
>> scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
>> application requesting 4 hosts with 4 cores each (16 instances in total) .
>> According to the user, each instance has its own dedicated inputs and
>> outputs.
>>
>> Please note that if we go back to ceph-fuse 9.2.0 client everything works
>> fine.
>>
>> The ceph-fuse 10.2.2 client segfault is the following (we were able to
>> capture it mounting ceph-fuse in debug mode):
>>
>> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
>> (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
>> ceph-fuse[7346]: starting ceph client
>> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 0x7f6af8c12320
>> newargc=11
>> ceph-fuse[7346]: starting fuse
>> *** Caught signal (Segmentation fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
>> [0x7f6aedaee035]
>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal (Segmentation
>> fault) **
>>  in thread 7f69d7fff700 thread_name:ceph-fuse
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x297ef2) [0x7f6aedbecef2]
>>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
>> [0x7f6aedaee035]
>>  5: (()+0x199891) [0x7f6aedaee891]
>>  6: (()+0x15b76) [0x7f6aed50db76]
>>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
>> to interpret this.
>>
>>
>> The full dump is quite long. Here are the very last bits of it. Let me
>> know if you need the full dump.
>>
>> --- begin dump of recent events ---
>>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
>> _getxattr(137c789, "security.capability", 0) = -61
>>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 ll_write
>> 0x7f6a08028be0 137c78c 20094~34
>>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 ll_write
>> 0x7f6a08028be0 20094~34 = 34
>>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 ll_write
>> 0x7f6a100145f0 137c78d 28526~34
>>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 ll_write
>> 0x7f6a100145f0 28526~34 = 34
>>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559 ll_forget
>> 137c78c 1
>>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559 ll_forget
>> 137c789 1
>>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 ll_write
>> 0x7f6a94006350 137c789 22010~216
>>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 ll_write
>> 0x7f6a94006350 22010~216 = 216
>>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
>> ll_getxattr 137c78c.head security.capability size 0
>>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
>> _getxattr(137c78c, "security.capability", 0) = -61
>>
>> 
>>
>>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
>> _getxattr(137c78a, "security.capability", 0) = -61
>>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 ll_write
>> 0x7f6a08042560 137c78b 11900~34
>>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 ll_write
>> 0x7f6a08042560 11900~34 = 34
>>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
>> ll_getattr 11e9c80.head
>>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
>> ll_getattr 11e9c80.head = 0
>>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559 

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Shinobu Kinjo
Can you reproduce with debug client = 20?
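
A sketch of two ways to get that level: either add it to the [client]
section of ceph.conf on the client and remount,

    [client]
        debug client = 20

or pass it at mount time (the usual config-option-as-flag form should work,
e.g. ceph-fuse --debug-client=20 ...). Expect the client log to grow very
quickly at this level.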

On Tue, Jul 5, 2016 at 10:16 AM, Goncalo Borges <
goncalo.bor...@sydney.edu.au> wrote:

> Dear All...
>
> We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2.
>
> We are currently using ceph-fuse to mount cephfs in a number of clients.
>
> ceph-fuse 10.2.2 client is segfaulting in some situations. One of the
> scenarios where ceph-fuse segfaults is when a user submits a parallel (mpi)
> application requesting 4 hosts with 4 cores each (16 instances in total) .
> According to the user, each instance has its own dedicated inputs and
> outputs.
>
> Please note that if we go back to ceph-fuse 9.2.0 client everything works
> fine.
>
> The ceph-fuse 10.2.2 client segfault is the following (we were able to
> capture it mounting ceph-fuse in debug mode):
>
> 2016-07-04 21:21:00.074087 7f6aed92be40  0 ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-fuse, pid 7346
> ceph-fuse[7346]: starting ceph client
> 2016-07-04 21:21:00.107816 7f6aed92be40 -1 init, newargv = 0x7f6af8c12320
> newargc=11
> ceph-fuse[7346]: starting fuse
> *** Caught signal (Segmentation fault) **
>  in thread 7f69d7fff700 thread_name:ceph-fuse
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x297ef2) [0x7f6aedbecef2]
>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
> [0x7f6aedaee035]
>  5: (()+0x199891) [0x7f6aedaee891]
>  6: (()+0x15b76) [0x7f6aed50db76]
>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>  9: (clone()+0x6d) [0x7f6aeb8d193d]
> 2016-07-05 10:09:14.045131 7f69d7fff700 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f69d7fff700 thread_name:ceph-fuse
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x297ef2) [0x7f6aedbecef2]
>  2: (()+0x3b88c0f7e0) [0x7f6aec64b7e0]
>  3: (Client::get_root_ino()+0x10) [0x7f6aedaf0330]
>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175)
> [0x7f6aedaee035]
>  5: (()+0x199891) [0x7f6aedaee891]
>  6: (()+0x15b76) [0x7f6aed50db76]
>  7: (()+0x12aa9) [0x7f6aed50aaa9]
>  8: (()+0x3b88c07aa1) [0x7f6aec643aa1]
>  9: (clone()+0x6d) [0x7f6aeb8d193d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> The full dump is quite long. Here are the very last bits of it. Let me
> know if you need the full dump.
>
> --- begin dump of recent events ---
>  -> 2016-07-05 10:09:13.956502 7f6a5700  3 client.464559
> _getxattr(137c789, "security.capability", 0) = -61
>  -9998> 2016-07-05 10:09:13.956507 7f6aa96fa700  3 client.464559 ll_write
> 0x7f6a08028be0 137c78c 20094~34
>  -9997> 2016-07-05 10:09:13.956527 7f6aa96fa700  3 client.464559 ll_write
> 0x7f6a08028be0 20094~34 = 34
>  -9996> 2016-07-05 10:09:13.956535 7f69d7fff700  3 client.464559 ll_write
> 0x7f6a100145f0 137c78d 28526~34
>  -9995> 2016-07-05 10:09:13.956553 7f69d7fff700  3 client.464559 ll_write
> 0x7f6a100145f0 28526~34 = 34
>  -9994> 2016-07-05 10:09:13.956561 7f6ac0dfa700  3 client.464559 ll_forget
> 137c78c 1
>  -9993> 2016-07-05 10:09:13.956569 7f6a5700  3 client.464559 ll_forget
> 137c789 1
>  -9992> 2016-07-05 10:09:13.956577 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a94006350 137c789 22010~216
>  -9991> 2016-07-05 10:09:13.956594 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a94006350 22010~216 = 216
>  -9990> 2016-07-05 10:09:13.956603 7f6aa8cf9700  3 client.464559
> ll_getxattr 137c78c.head security.capability size 0
>  -9989> 2016-07-05 10:09:13.956609 7f6aa8cf9700  3 client.464559
> _getxattr(137c78c, "security.capability", 0) = -61
>
> 
>
>   -160> 2016-07-05 10:09:14.043687 7f69d7fff700  3 client.464559
> _getxattr(137c78a, "security.capability", 0) = -61
>   -159> 2016-07-05 10:09:14.043694 7f6ac0dfa700  3 client.464559 ll_write
> 0x7f6a08042560 137c78b 11900~34
>   -158> 2016-07-05 10:09:14.043712 7f6ac0dfa700  3 client.464559 ll_write
> 0x7f6a08042560 11900~34 = 34
>   -157> 2016-07-05 10:09:14.043722 7f6ac17fb700  3 client.464559
> ll_getattr 11e9c80.head
>   -156> 2016-07-05 10:09:14.043727 7f6ac17fb700  3 client.464559
> ll_getattr 11e9c80.head = 0
>   -155> 2016-07-05 10:09:14.043734 7f69d7fff700  3 client.464559 ll_forget
> 137c78a 1
>   -154> 2016-07-05 10:09:14.043738 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a140d5930 137c78a 18292~34
>   -153> 2016-07-05 10:09:14.043759 7f6a5ebfd700  3 client.464559 ll_write
> 0x7f6a140d5930 18292~34 = 34
>   -152> 2016-07-05 10:09:14.043767 7f6ac17fb700  3 client.464559 ll_forget
> 11e9c80 1
>   -151> 2016-07-05 10:09:14.043784 7f6aa8cf9700  3 client.464559 ll_flush
> 0x7f6a00049fe0 11e9c80
>   -150> 2016-07-05 10:09:14.043794 7f6aa8cf9700  3 client.464559
> ll_getxattr 137c78a.head security.capability size 0
>   -149> 2016-07-05 10:09:14.043799 7f6aa8cf9700  3