Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-12-16 Thread jesper
> If a CephFS client receives a cap release request and is able to
> perform it (no processes are accessing the file at the moment), the client
> cleans up its internal state and allows the MDS to release the cap.
> This cleanup also involves removing the file's data from the page cache.
>
> If your MDS was running with too small a cache size, it had to revoke
> caps over and over to stay within that size, and the clients had to
> clean up their caches over and over, too.


Well.. it could just mark it "eligible for future cleanup" - if the client
has no other use for the available memory, then this just thrashes the
local client memory cache for a file that will go back into use a few
minutes later. Based on your description, this is what we have
been seeing.

Bumping MDS memory has pushed our problem away and our setup works fine, but
the above behaviour still seems very suboptimal - of course, if the file
changes, feel free to actively prune it - but otherwise, why bother? It
will simply get no hits in the client LRU cache and be automatically
evicted by the client anyway.

I feel this is interfering with something that has worked well for a few
decades now, but I may just be missing the fine-grained details.


> Hope this helps.

Definitely - thanks.

-- 
Jesper




Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-11-03 Thread Burkhard Linke

Hi,

On 03.11.18 10:31, jes...@krogh.cc wrote:

>> I suspect that the MDS asked the client to trim its cache. Please run the
>> following commands on an idle client.
>
> In the meantime - we migrated to the RH Ceph version and delivered the MDS
> both SSDs and more memory, and the problem went away.
>
> It still puzzles my mind a bit - why is there a connection between the
> "client page cache" and the MDS server performance/etc.? The only
> explanation I can find is that if the MDS cannot cache the metadata, it
> needs to go back and fetch it from the Ceph metadata pool, and then it
> exposes the data as "new" to the clients despite it being unchanged. If
> that is the case, then I would say there is significant room for
> performance optimization here.


CephFS is a distributed system, so there is bookkeeping for every
file in use by any CephFS client. These entities are 'capabilities';
they also implement things like distributed locking.



The MDS has to cache every capability it has assigned to a CephFS
client, in addition to the cache for inode information and other data.
The cache size is limited to control the memory consumption of the MDS
process. If an MDS is running out of cache, it tries to revoke
capabilities assigned to CephFS clients to free some memory for new
capabilities. This revoke process runs asynchronously from the MDS to the
CephFS client, similar to NFS delegations.
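
As a rough illustration, on a kernel client with debugfs mounted the
capabilities currently held by that client can be inspected locally (the
path below is the kernel client's usual debugfs location):

  cat /sys/kernel/debug/ceph/*/caps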



If a CephFS client receives a cap release request and is able to
perform it (no processes are accessing the file at the moment), the client
cleans up its internal state and allows the MDS to release the cap.
This cleanup also involves removing the file's data from the page cache.



If your MDS was running with too small a cache size, it had to revoke
caps over and over to stay within that size, and the clients had to
clean up their caches over and over, too.



You did not mention any details about the MDS settings, especially the 
cache size. I assume you increased the cache size after adding more 
memory, since the problem seems to be solved now.
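
For reference, a minimal sketch of how that limit is typically configured on
Luminous and later (the 40 GiB value is only an example, and mds.$(hostname -s)
assumes the daemon is named after the short hostname):

  # ceph.conf on the MDS host
  [mds]
      mds cache memory limit = 42949672960    # ~40 GiB

  # or adjust the running daemon via its admin socket
  ceph daemon mds.$(hostname -s) config set mds_cache_memory_limit 42949672960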



It actually is not solved, but only mitigated. If your working set size
increases or the number of clients increases, the MDS has to manage more
caps and will have to revoke caps more often. You will probably reach an
equilibrium at some point. The MDS is the most memory-hungry part of
Ceph, and it often catches people by surprise. We had the same problem in
our setup; even worse, the nightly backup also thrashes the MDS cache.



The best way to monitor the MDS is using the 'ceph daemonperf mds.XYZ' 
command on the MDS host. It gives you the current performance counters 
including the inode and caps count. Our MDS is configured with a 40 GB 
cache size and currently has 15 million inodes cached and is managing 
3.1 million capabilities.
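
For example, something along these lines on the MDS host (again assuming the
daemon is named after the short hostname; 'cache status' only exists on
reasonably recent releases):

  ceph daemonperf mds.$(hostname -s)                  # live counters, 1s refresh
  ceph daemon mds.$(hostname -s) perf dump mds_mem    # one-shot inode (ino) and cap counts
  ceph daemon mds.$(hostname -s) cache status         # current cache memory usage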



TL;DR: MDS needs huge amounts of memory for its internal bookkeeping.


Hope this helps.


Regards,

Burkhard





>> If you can reproduce this issue, please send the kernel log to us.
>
> Will do if/when it reappears.



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-11-03 Thread jesper
> I suspect that the MDS asked the client to trim its cache. Please run the
> following commands on an idle client.

In the meantime - we migrated to the RH Ceph version and delivered the MDS
both SSDs and more memory, and the problem went away.

It still puzzles my mind a bit - why is there a connection between the
"client page cache" and the MDS server performance/etc.? The only
explanation I can find is that if the MDS cannot cache the metadata, it
needs to go back and fetch it from the Ceph metadata pool, and then it
exposes the data as "new" to the clients despite it being unchanged. If
that is the case, then I would say there is significant room for
performance optimization here.

> If you can reproduce this issue, please send the kernel log to us.

Will do if/when it reappears.



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-18 Thread Yan, Zheng
On Mon, Oct 15, 2018 at 9:54 PM Dietmar Rieder
 wrote:
>
> On 10/15/18 1:17 PM, jes...@krogh.cc wrote:
> >> On 10/15/18 12:41 PM, Dietmar Rieder wrote:
> >>> No big difference here.
> >>> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64
> >>
> >> ...forgot to mention: all is luminous ceph-12.2.7
> >
> > Thanks for your time in testing; this is very valuable to me in the
> > debugging. Two questions:
> >
> > Did you "sleep 900" in-between the execution?
> > Are you using the kernel client or the fuse client?
> >
> > If I run them "right after each other" .. then I get the same behaviour.
> >
>
> Hi, as I stated I'm using the kernel client, and yes I did the sleep 900
> between the two runs.
>
> ~Dietmar
>
Sorry for the delay

I suspect that the MDS asked the client to trim its cache. Please run the
following commands on an idle client.

time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4
echo module ceph +p > /sys/kernel/debug/dynamic_debug/control
sleep 900
echo module ceph -p > /sys/kernel/debug/dynamic_debug/control
time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4

If you can reproduce this issue, please send the kernel log to us.

Regards
Yan, Zheng




Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 1:17 PM, jes...@krogh.cc wrote:
>> On 10/15/18 12:41 PM, Dietmar Rieder wrote:
>>> No big difference here.
>>> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64
>>
>> ...forgot to mention: all is luminous ceph-12.2.7
> 
> > Thanks for your time in testing; this is very valuable to me in the
> > debugging. Two questions:
> 
> Did you "sleep 900" in-between the execution?
> Are you using the kernel client or the fuse client?
> 
> If I run them "right after each other" .. then I get the same behaviour.
> 

Hi, as I stated I'm using the kernel client, and yes I did the sleep 900
between the two runs.

~Dietmar





Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread jesper
> On 10/15/18 12:41 PM, Dietmar Rieder wrote:
>> No big difference here.
>> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64
>
> ...forgot to mention: all is luminous ceph-12.2.7

Thanks for your time in testing; this is very valuable to me in the
debugging. Two questions:

Did you "sleep 900" in-between the execution?
Are you using the kernel client or the fuse client?

If I run them "right after each other" .. then I get the same behaviour.

-- 
Jesper




Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 12:41 PM, Dietmar Rieder wrote:
> On 10/15/18 12:02 PM, jes...@krogh.cc wrote:
>>>> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>>>> How many cephfs mounts access the file? Is it possible that some
>>>> program opens that file in RW mode (even if it just reads the file)?
>>>
>>>
>>> The nature of the program is that it is "prepped" by one set of commands
>>> and queried by another, thus the RW case is extremely unlikely.
>>> I can change the permission bits to revoke the w-bit for the user; they
>>> don't need it anyway... it is just the same service users that generate
>>> the data and query it today.
>>
>> Just to remove the suspicion of other clients fiddling with the files I did a
>> more structured test. I have 4 x 10GB files from fio-benchmarking, 40GB
>> in total, hosted on:
>>
>> 1) CephFS /ceph/cluster/home/jk
>> 2) NFS /z/home/jk
>>
>> First I read them .. then sleep 900 seconds .. then read again (just with dd)
>>
>> jk@sild12:/ceph/cluster/home/jk$ time  for i in $(seq 0 3); do echo "dd
>> if=test.$i.0 of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time 
>> for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  |
>> parallel -j 4
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s
>>
>> real0m3.449s
>> user0m0.217s
>> sys 0m11.497s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s
>>
>> real5m56.634s
>> user0m0.260s
>> sys 0m16.515s
>> jk@sild12:/ceph/cluster/home/jk$
>>
>>
>> Then NFS:
>>
>> jk@sild12:~$ time  for i in $(seq 0 3); do echo "dd if=test.$i.0
>> of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time  for i in
>> $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  | parallel
>> -j 4
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s
>>
>> real0m2.855s
>> user0m0.185s
>> sys 0m8.888s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s
>>
>> real0m2.980s
>> user0m0.173s
>> sys 0m8.239s
>> jk@sild12:~$
>>
>>
>> Can I ask one of you to run the same "test" (or similar) .. and report back
>> if you can reproduce it?
> 
> Here is my test on an EC (6+3) pool using the cephfs kernel client:
> 
> 7061+1 records in
> 7061+1 records out
> 7404496985 bytes (7.4 GB) copied, 3.62754 s, 2.0 GB/s
> 7450+1 records in
> 7450+1 records out
> 7812246720 bytes (7.8 GB) copied, 4.11908 s, 1.9 GB/s
> 7761+1 records in
> 7761+1 records out
> 8138636188 bytes (8.1 GB) copied, 4.34788 s, 1.9 GB/s
> 8212+1 records in
> 8212+1 records out
> 8611295220 bytes (8.6 GB) copied, 4.53371 s, 1.9 GB/s
> 
> real0m4.936s
> user0m0.275s
> sys 0m16.828s
> 
> 7061+1 records in
> 7061+1 records out
> 7404496985 bytes (7.4 GB) copied, 3.19726 s, 2.3 GB/s
> 7761+1 records in
> 7761+1 records out
> 8138636188 bytes (8.1 GB) copied, 3.31881 s, 2.5 GB/s
> 7450+1 records in
> 7450+1 records out
> 7812246720 bytes (7.8 GB) copied, 3.36354 s, 2.3 GB/s
> 8212+1 records in
> 8212+1 records out
> 8611295220 bytes (8.6 GB) copied, 3.74418 s, 2.3 GB/s
> 
> 
> No big difference here.
> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64

...forgot to mention: all is luminous ceph-12.2.7

~Dietmar





Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 12:02 PM, jes...@krogh.cc wrote:
>>> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>>> How many cephfs mounts access the file? Is it possible that some
>>> program opens that file in RW mode (even if it just reads the file)?
>>
>>
>> The nature of the program is that it is "prepped" by one set of commands
>> and queried by another, thus the RW case is extremely unlikely.
>> I can change the permission bits to revoke the w-bit for the user; they
>> don't need it anyway... it is just the same service users that generate
>> the data and query it today.
> 
> Just to remove the suspicion of other clients fiddling with the files I did a
> more structured test. I have 4 x 10GB files from fio-benchmarking, 40GB
> in total, hosted on:
> 
> 1) CephFS /ceph/cluster/home/jk
> 2) NFS /z/home/jk
> 
> First I read them .. then sleep 900 seconds .. then read again (just with dd)
> 
> jk@sild12:/ceph/cluster/home/jk$ time  for i in $(seq 0 3); do echo "dd
> if=test.$i.0 of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time 
> for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  |
> parallel -j 4
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s
> 
> real0m3.449s
> user0m0.217s
> sys 0m11.497s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s
> 
> real5m56.634s
> user0m0.260s
> sys 0m16.515s
> jk@sild12:/ceph/cluster/home/jk$
> 
> 
> Then NFS:
> 
> jk@sild12:~$ time  for i in $(seq 0 3); do echo "dd if=test.$i.0
> of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time  for i in
> $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  | parallel
> -j 4
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s
> 
> real0m2.855s
> user0m0.185s
> sys 0m8.888s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s
> 
> real0m2.980s
> user0m0.173s
> sys 0m8.239s
> jk@sild12:~$
> 
> 
> Can I ask one of you to run the same "test" (or similar) .. and report back
> if you can reproduce it?

Here is my test on an EC (6+3) pool using the cephfs kernel client:

7061+1 records in
7061+1 records out
7404496985 bytes (7.4 GB) copied, 3.62754 s, 2.0 GB/s
7450+1 records in
7450+1 records out
7812246720 bytes (7.8 GB) copied, 4.11908 s, 1.9 GB/s
7761+1 records in
7761+1 records out
8138636188 bytes (8.1 GB) copied, 4.34788 s, 1.9 GB/s
8212+1 records in
8212+1 records out
8611295220 bytes (8.6 GB) copied, 4.53371 s, 1.9 GB/s

real    0m4.936s
user    0m0.275s
sys     0m16.828s

7061+1 records in
7061+1 records out
7404496985 bytes (7.4 GB) copied, 3.19726 s, 2.3 GB/s
7761+1 records in
7761+1 records out
8138636188 bytes (8.1 GB) copied, 3.31881 s, 2.5 GB/s
7450+1 records in
7450+1 records out
7812246720 bytes (7.8 GB) copied, 3.36354 s, 2.3 GB/s
8212+1 records in
8212+1 records out
8611295220 bytes (8.6 GB) copied, 3.74418 s, 2.3 GB/s


No big difference here.
all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64

HTH
  Dietmar





Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread jesper
>> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>> How many cephfs mounts access the file? Is it possible that some
>> program opens that file in RW mode (even if it just reads the file)?
>
>
The nature of the program is that it is "prepped" by one set of commands
and queried by another, thus the RW case is extremely unlikely.
I can change the permission bits to revoke the w-bit for the user; they
don't need it anyway... it is just the same service users that generate
the data and query it today.

Just to remove the suspicion of other clients fiddling with the files I did a
more structured test. I have 4 x 10GB files from fio-benchmarking, 40GB
in total, hosted on:

1) CephFS /ceph/cluster/home/jk
2) NFS /z/home/jk

First I read them .. then sleep 900 seconds .. then read again (just with dd)

jk@sild12:/ceph/cluster/home/jk$ time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4 ; sleep 900; time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s

real    0m3.449s
user    0m0.217s
sys     0m11.497s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s

real    5m56.634s
user    0m0.260s
sys     0m16.515s
jk@sild12:/ceph/cluster/home/jk$


Then NFS:

jk@sild12:~$ time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4 ; sleep 900; time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s

real    0m2.855s
user    0m0.185s
sys     0m8.888s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s

real    0m2.980s
user    0m0.173s
sys     0m8.239s
jk@sild12:~$


Can I ask one of you to run the same "test" (or similar) .. and report back
if you can reproduce it?

Thoughts/comments/suggestions are highly appreciated. Should I try with
the fuse client?
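
For a quick comparison, something like the following should work for a fuse
mount (assuming /etc/ceph/ceph.conf and a usable keyring are already on the
client; the mount point and path below are just placeholders):

  sudo mkdir -p /mnt/cephfs-fuse
  sudo ceph-fuse /mnt/cephfs-fuse
  cd /mnt/cephfs-fuse/cluster/home/jk
  time for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done | parallel -j 4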

-- 
Jesper



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread jesper
> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
> How many cephfs mounts access the file? Is it possible that some
> program opens that file in RW mode (even if it just reads the file)?


The nature of the program is that it is "prepped" by one set of commands
and queried by another, thus the RW case is extremely unlikely.
I can change the permission bits to revoke the w-bit for the user; they
don't need it anyway... it is just the same service users that generate
the data and query it today.

Can Ceph tell the actual number of clients? ..
We have 55-60 hosts, and most of them mount the directory.
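
For what it's worth, the sessions the MDS currently knows about can be listed
on the MDS host (daemon name assumed to match the short hostname):

  ceph daemon mds.$(hostname -s) session ls   # one entry per client session
  ceph fs status                              # summary incl. client count (Luminous and later)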

-- 
Jesper



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread Yan, Zheng
On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>
> Hi
>
> We have a dataset of ~300 GB on CephFS which is being used for computations
> over and over again .. being refreshed daily or similar.
>
> When hosting it on NFS, after a refresh the files are transferred, but from
> there on they would be sitting in the kernel page cache of the client
> until they are refreshed serverside.
>
> On CephFS it looks "similar" but "different". Where the "steady state"
> operation over NFS would give a client/server traffic of < 1MB/s ..
> CephFS constantly pulls 50-100MB/s over the network.  This has
> implications for the clients, which end up spending unnecessary time waiting
> for IO during execution.
>
> This is in a setting where the CephFS client memory looks like this:
>
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        340G        1.2G         19G        354G
> Swap:          8.8G        430M        8.4G
>
>
> If I just repeatedly run (within a few minutes) something that is using the
> files, then it is fully served out of the client page cache (2GB'ish / s) ..
> but it looks like it is being evicted way faster than in the NFS setting?
>
> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
> type command on a total of 24GB of data in 300'ish files.
>
> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
> time CMD ;
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         16G        312G        1.2G         48G        355G
> Swap:          8.8G        430M        8.4G
>
> real    0m8.997s
> user    0m2.036s
> sys     0m6.915s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        277G        1.2G         82G        354G
> Swap:          8.8G        430M        8.4G
>
> real    3m25.904s
> user    0m2.794s
> sys     0m9.028s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        283G        1.2G         76G        353G
> Swap:          8.8G        430M        8.4G
>
> real    6m18.358s
> user    0m2.847s
> sys     0m10.651s
>
>
> Munin graphs of the system confirm that there has been zero memory
> pressure over the period.
>
> Are there things in the CephFS case that can cause the page cache to be
> invalidated?
> Could less aggressive "read-ahead" play a role?
>
> Other thoughts on what the root cause of the different behaviour could be?
>
> Clients are using a 4.15 kernel.. Anyone aware of newer patches in this area
> that could impact this?
>

How many cephfs mounts access the file? Is it possible that some
program opens that file in RW mode (even if it just reads the file)?

Yan, Zheng


> Jesper
>


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread jesper
> The actual amount of memory used by the VFS cache is available through 'grep
> Cached /proc/meminfo'. slabtop provides information about the caches
> of inodes, dentries, and IO memory buffers (buffer_head).

Thanks, that was also what I got out of it. And that is why I reported "free"
output in the first place, as it also shows available and "cached" memory.

-- 
Jesper



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread Sergey Malinin
The actual amount of memory used by the VFS cache is available through 'grep
Cached /proc/meminfo'. slabtop provides information about the caches of inodes,
dentries, and IO memory buffers (buffer_head).
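
For example (plain Linux tooling, nothing Ceph-specific):

  grep -E '^(MemAvailable|Buffers|Cached)' /proc/meminfo
  slabtop -o -s c | head -20    # one-shot, sorted by cache size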


> On 14.10.2018, at 17:28, jes...@krogh.cc wrote:
> 
>> Try looking in /proc/slabinfo / slabtop during your tests.
> 
> I need a bit of guidance here..  Does the slabinfo cover the VFS page
> cache ? .. I cannot seem to find any traces (sorting by size on
> machines with a huge cache does not really give anything). Perhaps
> I'm holding the screwdriver wrong?
> 
> -- 
> Jesper
> 



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread jesper
> Try looking in /proc/slabinfo / slabtop during your tests.

I need a bit of guidance here..  Does the slabinfo cover the VFS page
cache ? .. I cannot seem to find any traces (sorting by size on
machines with a huge cache does not really give anything). Perhaps
I'm holding the screwdriver wrong?

-- 
Jesper



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread Sergey Malinin
Try looking in /proc/slabinfo / slabtop during your tests.


> On 14.10.2018, at 15:21, jes...@krogh.cc wrote:
> 
> Hi
> 
> We have a dataset of ~300 GB on CephFS which is being used for computations
> over and over again .. being refreshed daily or similar.
>
> When hosting it on NFS, after a refresh the files are transferred, but from
> there on they would be sitting in the kernel page cache of the client
> until they are refreshed serverside.
>
> On CephFS it looks "similar" but "different". Where the "steady state"
> operation over NFS would give a client/server traffic of < 1MB/s ..
> CephFS constantly pulls 50-100MB/s over the network.  This has
> implications for the clients, which end up spending unnecessary time waiting
> for IO during execution.
>
> This is in a setting where the CephFS client memory looks like this:
>
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        340G        1.2G         19G        354G
> Swap:          8.8G        430M        8.4G
>
>
> If I just repeatedly run (within a few minutes) something that is using the
> files, then it is fully served out of the client page cache (2GB'ish / s) ..
> but it looks like it is being evicted way faster than in the NFS setting?
>
> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
> type command on a total of 24GB of data in 300'ish files.
>
> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
> time CMD ;
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         16G        312G        1.2G         48G        355G
> Swap:          8.8G        430M        8.4G
>
> real    0m8.997s
> user    0m2.036s
> sys     0m6.915s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        277G        1.2G         82G        354G
> Swap:          8.8G        430M        8.4G
>
> real    3m25.904s
> user    0m2.794s
> sys     0m9.028s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        283G        1.2G         76G        353G
> Swap:          8.8G        430M        8.4G
>
> real    6m18.358s
> user    0m2.847s
> sys     0m10.651s
>
>
> Munin graphs of the system confirm that there has been zero memory
> pressure over the period.
>
> Are there things in the CephFS case that can cause the page cache to be
> invalidated?
> Could less aggressive "read-ahead" play a role?
>
> Other thoughts on what the root cause of the different behaviour could be?
>
> Clients are using a 4.15 kernel.. Anyone aware of newer patches in this area
> that could impact this?
> 
> Jesper
> 


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread Jesper Krogh
On 14 Oct 2018, at 15.26, John Hearns  wrote:
> 
> This is a general question for the ceph list.
> Should Jesper be looking at these vm tunables?
> vm.dirty_ratio
> vm.dirty_centisecs
> 
> What effect do they have when using Cephfs?

This situation is read-only, thus there is no dirty data in the page cache.
The above should be irrelevant.
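
That can be double-checked on the client with, for example:

  grep -E '^(Dirty|Writeback):' /proc/meminfo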

Jesper




Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread John Hearns
This is a general question for the ceph list.
Should Jesper be looking at these vm tunables?
vm.dirty_ratio
vm.dirty_centisecs

What effect do they have when using Cephfs?

On Sun, 14 Oct 2018 at 14:24, John Hearns  wrote:

> Hej Jesper.
> Sorry I do not have a direct answer to your question.
> When looking at memory usage, I often use this command:
>
> watch cat /proc/meminfo
>
>
>
>
>
>
> On Sun, 14 Oct 2018 at 13:22,  wrote:
>
>> Hi
>>
>> We have a dataset of ~300 GB on CephFS which is being used for computations
>> over and over again .. being refreshed daily or similar.
>>
>> When hosting it on NFS, after a refresh the files are transferred, but from
>> there on they would be sitting in the kernel page cache of the client
>> until they are refreshed serverside.
>>
>> On CephFS it looks "similar" but "different". Where the "steady state"
>> operation over NFS would give a client/server traffic of < 1MB/s ..
>> CephFS constantly pulls 50-100MB/s over the network.  This has
>> implications for the clients, which end up spending unnecessary time waiting
>> for IO during execution.
>>
>> This is in a setting where the CephFS client memory looks like this:
>>
>> $ free -h
>>               total        used        free      shared  buff/cache   available
>> Mem:           377G         17G        340G        1.2G         19G        354G
>> Swap:          8.8G        430M        8.4G
>>
>>
>> If I just repeatedly run (within a few minutes) something that is using the
>> files, then it is fully served out of the client page cache (2GB'ish / s) ..
>> but it looks like it is being evicted way faster than in the NFS setting?
>>
>> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
>> type command on a total of 24GB of data in 300'ish files.
>>
>> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
>> time CMD ;
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:           377G         16G        312G        1.2G         48G        355G
>> Swap:          8.8G        430M        8.4G
>>
>> real    0m8.997s
>> user    0m2.036s
>> sys     0m6.915s
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:           377G         17G        277G        1.2G         82G        354G
>> Swap:          8.8G        430M        8.4G
>>
>> real    3m25.904s
>> user    0m2.794s
>> sys     0m9.028s
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:           377G         17G        283G        1.2G         76G        353G
>> Swap:          8.8G        430M        8.4G
>>
>> real    6m18.358s
>> user    0m2.847s
>> sys     0m10.651s
>>
>>
>> Munin graphs of the system confirm that there has been zero memory
>> pressure over the period.
>>
>> Are there things in the CephFS case that can cause the page cache to be
>> invalidated?
>> Could less aggressive "read-ahead" play a role?
>>
>> Other thoughts on what the root cause of the different behaviour could be?
>>
>> Clients are using a 4.15 kernel.. Anyone aware of newer patches in this area
>> that could impact this?
>>
>> Jesper
>>


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-14 Thread John Hearns
Hej Jesper.
Sorry I do not have a direct answer to your question.
When looking at memory usage, I often use this command:

watch cat /proc/meminfo






On Sun, 14 Oct 2018 at 13:22,  wrote:

> Hi
>
> We have a dataset of ~300 GB on CephFS which is being used for computations
> over and over again .. being refreshed daily or similar.
>
> When hosting it on NFS, after a refresh the files are transferred, but from
> there on they would be sitting in the kernel page cache of the client
> until they are refreshed serverside.
>
> On CephFS it looks "similar" but "different". Where the "steady state"
> operation over NFS would give a client/server traffic of < 1MB/s ..
> CephFS constantly pulls 50-100MB/s over the network.  This has
> implications for the clients, which end up spending unnecessary time waiting
> for IO during execution.
>
> This is in a setting where the CephFS client memory looks like this:
>
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        340G        1.2G         19G        354G
> Swap:          8.8G        430M        8.4G
>
>
> If I just repeatedly run (within a few minutes) something that is using the
> files, then it is fully served out of the client page cache (2GB'ish / s) ..
> but it looks like it is being evicted way faster than in the NFS setting?
>
> This is not scientific .. but the CMD is a cat /file/on/ceph > /dev/null -
> type command on a total of 24GB of data in 300'ish files.
>
> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600;
> time CMD ;
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         16G        312G        1.2G         48G        355G
> Swap:          8.8G        430M        8.4G
>
> real    0m8.997s
> user    0m2.036s
> sys     0m6.915s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        277G        1.2G         82G        354G
> Swap:          8.8G        430M        8.4G
>
> real    3m25.904s
> user    0m2.794s
> sys     0m9.028s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        283G        1.2G         76G        353G
> Swap:          8.8G        430M        8.4G
>
> real    6m18.358s
> user    0m2.847s
> sys     0m10.651s
>
>
> Munin graphs of the system confirm that there has been zero memory
> pressure over the period.
>
> Are there things in the CephFS case that can cause the page cache to be
> invalidated?
> Could less aggressive "read-ahead" play a role?
>
> Other thoughts on what the root cause of the different behaviour could be?
>
> Clients are using a 4.15 kernel.. Anyone aware of newer patches in this area
> that could impact this?
>
> Jesper
>