Re: [ceph-users] cephfs-data-scan

2018-11-03 Thread Rhian Resnick
Sounds like we are going to restart with 20 threads on each storage node.
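For illustration, a minimal sketch of what driving that many workers from one
node could look like - the data pool name is a placeholder and the loop is an
assumption about how the runs would be launched, not something from the
original post:

# 20 scan_extents workers on this node, worker IDs 0..19 out of 20.
# If workers are spread across several nodes, --worker_m must be the total
# worker count and each --worker_n must be unique across all nodes.
for n in $(seq 0 19); do
    cephfs-data-scan scan_extents --worker_n $n --worker_m 20 <data-pool> &
done
wait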

On Sat, Nov 3, 2018 at 8:26 PM Sergey Malinin  wrote:

> scan_extents using 8 threads took 82 hours for my cluster holding 120M
> files on 12 OSDs with 1 Gbps between nodes. I would have gone with a lot
> more threads if I had known it only operated on the data pool and the only
> problem was network latency. If I recall correctly, each worker used up to
> 800 MB of RAM, so beware the OOM killer.
> scan_inodes runs several times faster, but I don’t remember the exact timing.
> In your case I believe scan_extents & scan_inodes can be done in a few
> hours by running the tool on each OSD node, but scan_links will be
> painfully slow due to its single-threaded nature.
> In my case I ended up getting the MDS to start and copied all data to a
> fresh filesystem, ignoring errors.
> On Nov 4, 2018, 02:22 +0300, Rhian Resnick , wrote:
>
> For a 150 TB file system with 40 million files, how many cephfs-data-scan
> threads should be used? And what is the expected run time? (We have 160
> OSDs with 4 TB disks.)


Re: [ceph-users] cephfs-data-scan

2018-11-03 Thread Sergey Malinin
scan_extents using 8 threads took 82 hours for my cluster holding 120M files on 
12 OSDs with 1 Gbps between nodes. I would have gone with a lot more threads if 
I had known it only operated on the data pool and the only problem was network 
latency. If I recall correctly, each worker used up to 800 MB of RAM, so beware 
the OOM killer.
scan_inodes runs several times faster, but I don’t remember the exact timing.
In your case I believe scan_extents & scan_inodes can be done in a few hours by 
running the tool on each OSD node, but scan_links will be painfully slow due to 
its single-threaded nature.
In my case I ended up getting the MDS to start and copied all data to a fresh 
filesystem, ignoring errors.
On Nov 4, 2018, 02:22 +0300, Rhian Resnick , wrote:
> For a 150 TB file system with 40 million files, how many cephfs-data-scan 
> threads should be used? And what is the expected run time? (We have 160 OSDs 
> with 4 TB disks.)


[ceph-users] cephfs-data-scan

2018-11-03 Thread Rhian Resnick
For a 150 TB file system with 40 million files, how many cephfs-data-scan
threads should be used? And what is the expected run time? (We have 160 OSDs
with 4 TB disks.)


[ceph-users] Should OSD write error result in damaged filesystem?

2018-11-03 Thread Bryan Henderson
I had a filesystem rank get damaged when the MDS had an error writing the log
to the OSD.  Is damage expected when a log write fails?

According to log messages, an OSD write failed because the MDS attempted
to write a bigger chunk than the OSD's maximum write size.  I can probably
figure out why that happened and fix it, but OSD write failures can happen for
lots of reasons, and I would have expected the MDS just to discard the recent
filesystem updates, issue a log message, and keep going.  The user had
presumably not been told those updates were committed.


And how do I repair this now?  Is this a job for

  cephfs-journal-tool event recover_dentries
  cephfs-journal-tool journal reset

?

This is Jewel.
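For what it is worth, a hedged sketch of that sequence with a journal backup
taken first - whether it is the right fix depends on what actually got damaged,
so treat it as an illustration rather than a recommendation:

  cephfs-journal-tool journal export backup.bin       # keep a copy before touching anything
  cephfs-journal-tool event recover_dentries summary  # salvage dentries from the journal into the metadata store
  cephfs-journal-tool journal reset                   # then discard the damaged journal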

-- 
Bryan Henderson   San Jose, California


Re: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever

2018-11-03 Thread Pavan Rallabhandi
Not exactly, this feature was supported in Jewel starting 10.2.11, ref 
https://github.com/ceph/ceph/pull/18010

I thought you mentioned you were using Luminous 12.2.4.

From: David Turner 
Date: Friday, November 2, 2018 at 5:21 PM
To: Pavan Rallabhandi 
Cc: ceph-users 
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

That makes so much more sense. It seems like RHCS has had this ability since 
Jewel, while it was only put into the community version as of Mimic. So the 
version I'm running isn't actually capable of changing the backend db. While 
digging into the code I did find a bug with the creation of the rocksdb backend 
created with ceph-kvstore-tool. It doesn't use the ceph defaults or any settings 
in your config file for the db settings. I'm working on testing a modified 
version that should take those settings into account. If the fix does work, it 
will be able to apply to a few other tools as well that can be used to set up 
the omap backend db.
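Roughly, the conversion being discussed looks something like the sketch below.
The paths, the transaction batch size and the exact store-copy arguments are
assumptions - check ceph-kvstore-tool --help on your build before attempting
anything like this, and keep the original omap directory around:

systemctl stop ceph-osd@1
ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-1/current/omap store-copy \
    /var/lib/ceph/osd/ceph-1/current/omap.rocksdb 10000 rocksdb
mv /var/lib/ceph/osd/ceph-1/current/omap /var/lib/ceph/osd/ceph-1/current/omap.leveldb.bak
mv /var/lib/ceph/osd/ceph-1/current/omap.rocksdb /var/lib/ceph/osd/ceph-1/current/omap
# with filestore_omap_backend = rocksdb set for this OSD before restarting it
systemctl start ceph-osd@1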

On Fri, Nov 2, 2018, 4:26 PM Pavan Rallabhandi <prallabha...@walmartlabs.com> wrote:
It was the Red Hat-versioned Jewel. But maybe more relevantly, we are on Ubuntu, 
unlike your case.

From: David Turner <drakonst...@gmail.com>
Date: Friday, November 2, 2018 at 10:24 AM

To: Pavan Rallabhandi <prallabha...@walmartlabs.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

Pavan, which version of Ceph were you using when you changed your backend to 
rocksdb?

On Mon, Oct 1, 2018 at 4:24 PM Pavan Rallabhandi <prallabha...@walmartlabs.com> wrote:
Yeah, I think this is something to do with the CentOS binaries, sorry that I 
couldn’t be of much help here.

Thanks,
-Pavan.

From: David Turner <drakonst...@gmail.com>
Date: Monday, October 1, 2018 at 1:37 PM
To: Pavan Rallabhandi <prallabha...@walmartlabs.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: EXT: Re: [ceph-users] Any backfill in our cluster makes the cluster 
unusable and takes forever

I tried modifying filestore_rocksdb_options by removing 
compression=kNoCompression as well as setting it to 
compression=kSnappyCompression.  Leaving it with kNoCompression or removing it 
results in the same segfault in the previous log.  Setting it to 
kSnappyCompression resulted in [1] this being logged and the OSD just failing 
to start instead of segfaulting.  Is there anything else you would suggest 
trying before I purge this OSD from the cluster?  I'm afraid it might be 
something with the CentOS binaries.

[1] 2018-10-01 17:10:37.134930 7f1415dfcd80  0  set rocksdb option compression 
= kSnappyCompression
2018-10-01 17:10:37.134986 7f1415dfcd80 -1 rocksdb: Invalid argument: 
Compression type Snappy is not linked with the binary.
2018-10-01 17:10:37.135004 7f1415dfcd80 -1 filestore(/var/lib/ceph/osd/ceph-1) 
mount(1723): Error initializing rocksdb :
2018-10-01 17:10:37.135020 7f1415dfcd80 -1 osd.1 0 OSD:init: unable to mount 
object store
2018-10-01 17:10:37.135029 7f1415dfcd80 -1  ** ERROR: osd init failed: 
(1) Operation not permitted
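For reference, the settings being discussed live in ceph.conf roughly like this
- the values are illustrative assumptions, not David's actual configuration:

[osd]
filestore_omap_backend    = rocksdb
filestore_rocksdb_options = "max_background_compactions=8,compression=kNoCompression"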

On Sat, Sep 29, 2018 at 1:57 PM Pavan Rallabhandi wrote:
I looked at one of my test clusters running Jewel on Ubuntu 16.04, and 
interestingly I found this (below) in one of the OSD logs, which is different 
from your OSD boot log, where none of the compression algorithms seem to be 
supported. This hints more at how rocksdb was built for Ceph on CentOS.

2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Compression algorithms 
supported:
2018-09-29 17:38:38.629112 7fbd318d4b00  4 rocksdb: Snappy supported: 1
2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Zlib supported: 1
2018-09-29 17:38:38.629113 7fbd318d4b00  4 rocksdb: Bzip supported: 0
2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: LZ4 supported: 0
2018-09-29 17:38:38.629114 7fbd318d4b00  4 rocksdb: ZSTD supported: 0
2018-09-29 17:38:38.629115 7fbd318d4b00  4 rocksdb: Fast CRC32 supported: 0

On 9/27/18, 2:56 PM, "Pavan Rallabhandi" wrote:

I see Filestore symbols on the stack, so the bluestore config doesn’t come 
into play. And the top frame of the stack hints at a RocksDB issue, and there 
are a whole lot of these too:

“2018-09-17 19:23:06.480258 7f1f3d2a7700  2 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/table/block_based_table_reader.cc:636]
 Cannot find Properties block from file.”

It really seems to be something with RocksDB on CentOS. I still think you 
can try removing “compression=kNoCompression” from the 

[ceph-users] Snapshot cephfs data pool from ceph cmd

2018-11-03 Thread Rhian Resnick
Is it possible to snapshot the CephFS data pool?


Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage

2018-11-03 Thread Rhian Resnick
Having attempted to recover using the journal tool, and having that fail, we
are going to rebuild our metadata using a separate metadata pool.

We have the following procedure we are going to use. The issue I haven't
found yet (likely lack of sleep) is how to replace the original metadata
pool in the cephfs so we can continue to use the default name, and then how
to remove the secondary file system.

# ceph fs

ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph osd pool create recovery 512 replicated replicated_ruleset
ceph fs new recovery-fs recovery cephfs-cold
--allow-dangerous-metadata-overlay
cephfs-data-scan init --force-init --filesystem recovery-fs
--alternate-pool recovery
ceph fs reset recovery-fs --yes-i-really-mean-it


# create structure
cephfs-table-tool recovery-fs:all reset session
cephfs-table-tool recovery-fs:all reset snap
cephfs-table-tool recovery-fs:all reset inode

# build new metadata

# scan_extents

cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3
--worker_m 4 --filesystem cephfs cephfs-cold

# scan inodes
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold

cephfs-data-scan scan_links --filesystem recovery-fs

# need help

Thanks

Rhian

On Fri, Nov 2, 2018 at 9:47 PM Rhian Resnick  wrote:

> I was posting with my office account but I think it is being blocked.
>
> Our cephfs metadata pool went from 1 GB to 1 TB in a matter of hours and,
> after using all the storage on the OSDs, it now reports two damaged ranks.
>
> The cephfs-journal-tool crashes when performing any operations due to
> memory utilization.
>
> We tried a backup, which crashed (we then did a rados cppool to back up our
> metadata).
> I then tried to run a dentry recovery, which failed due to memory usage.
>
> Any recommendations for the next step?
>
> Data from our config and status
>
>
>
>
> Combined logs (after marking things as repaired to see if that would rescue 
> us):
>
>
> Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
> -1 mds.4.purge_queue operator(): Error -108 loading Journaler
> Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
> -1 mds.4.purge_queue operator(): Error -108 loading Journaler
> Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
> (MDS_DAMAGE)
> Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
> (MDS_DAMAGE)
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 
> 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from 
> _is_readable
> Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 
> 1 : Error loading MDS rank 1: (22) Invalid argument
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 
> 7f6dacd69700  0 mds.1.log _replay journaler got error -22, aborting
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 
> 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from 
> _is_readable
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 
> 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: 
> (22) Invalid argument
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 
> 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: 
> (22) Invalid argument
> Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons 
> damaged (MDS_DAMAGE)
> Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons 
> damaged (MDS_DAMAGE)
> Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
> (MDS_DAMAGE)
> Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 
> -1 

Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage

2018-11-03 Thread Rhian Resnick
Morning,


Having attempted to recover using the journal tool, and having that fail, we are 
going to rebuild our metadata using a separate metadata pool.


We have the following procedure we are going to use. The issue I haven't found 
yet (likely lack of sleep) is how to replace the original metadata pool in the 
cephfs so we can continue to use the default name, and then how to remove the 
secondary file system.


# ceph fs

ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph osd pool create recovery 512 replicated replicated_ruleset
ceph fs new recovery-fs recovery cephfs-cold --allow-dangerous-metadata-overlay
cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool 
recovery
ceph fs reset recovery-fs --yes-i-really-mean-it


# create structure
cephfs-table-tool recovery-fs:all reset session
cephfs-table-tool recovery-fs:all reset snap
cephfs-table-tool recovery-fs:all reset inode

# build new metadata

# scan_extents
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3 --worker_m 
4 --filesystem cephfs cephfs-cold

# scan inodes
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold

cephfs-data-scan scan_links --filesystem recovery-fs

# need help

How do we move the new metadata pool to the original filesystem?

How do we remove the new cephfs so the original mounts work?
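For the second question, a hedged sketch of removing the extra filesystem once
everything needed has been copied off it - the MDS to fail is a placeholder,
and this is only an illustration based on the multiple-filesystems docs, not a
tested recipe for this cluster:

ceph fs set recovery-fs cluster_down true
ceph mds fail <mds-daemon-serving-recovery-fs>
ceph fs rm recovery-fs --yes-i-really-mean-it

As far as I know there is no supported way in this release to swap the
recovered metadata pool in under the original filesystem name, which is why
others in this digest copied data out to a fresh filesystem instead.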


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222




From: ceph-users  on behalf of Rhian Resnick 

Sent: Friday, November 2, 2018 9:47 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] cephfs-journal-tool event recover_dentries summary killed 
due to memory usage

I was posting with my office account but I think it is being blocked.

Our cephfs metadata pool went from 1 GB to 1 TB in a matter of hours and, after 
using all the storage on the OSDs, it now reports two damaged ranks.

The cephfs-journal-tool crashes when performing any operations due to memory 
utilization.

We tried a backup, which crashed (we then did a rados cppool to back up our 
metadata).
I then tried to run a dentry recovery, which failed due to memory usage.

Any recommendations for the next step?

Data from our config and status




Combined logs (after marking things as repaired to see if that would rescue us):


Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1 
10.141.255.202:6898/1492854021 1 : Error 
loading MDS rank 1: (22) Invalid argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 
 0 mds.1.log _replay journaler got error -22, aborting
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 

Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-11-03 Thread Burkhard Linke

Hi,

On 03.11.18 10:31, jes...@krogh.cc wrote:

>> I suspect that mds asked client to trim its cache. Please run
>> following commands on an idle client.
>
> In the meantime - we migrated to the RH Ceph version and delivered the MDS
> both SSDs and more memory, and the problem went away.
>
> It still puzzles my mind a bit - why is there a connection between the
> "client page cache" and the MDS server performance, etc.? The only argument
> I can find is that if the MDS cannot cache data, then it needs to go
> back and get metadata from the Ceph metadata pool, and then it exposes
> data as "new" to the clients, despite it being the same. If that is
> the case, then I would say there is significant room for performance
> optimization here.


CephFS is a distributed system, so there is bookkeeping for every 
file in use by any CephFS client. These entities are called 'capabilities'; 
they also implement things like distributed locking.



The MDS has to cache every capability it has assigned to a CephFS 
client, in addition to the cache for inode information and other data. 
The cache size is limited to control the memory consumption of the MDS 
process. If an MDS is running out of cache, it tries to revoke 
capabilities assigned to CephFS clients to free some memory for new 
capabilities. This revoke process runs asynchronously from MDS to CephFS 
client, similar to NFS delegations.



If a CephFS client receives a cap release request and is able to 
honour it (no processes are accessing the file at the moment), the client 
cleans up its internal state and allows the MDS to release the cap. 
This cleanup also involves removing file data from the page cache.



If your MDS was running with a too-small cache size, it had to revoke 
caps over and over to adhere to its cache size, and the clients had to 
clean up their caches over and over, too.



You did not mention any details about the MDS settings, especially the 
cache size. I assume you increased the cache size after adding more 
memory, since the problem seems to be solved now.



It actually is not solved, but only mitigated. If your working set size 
increases or the number of clients increases, the MDS has to manage more 
caps and will have to revoke caps more often. You will probably reach an 
equilibrium at some point. The MDS is the most memory-hungry part of 
Ceph, and it often catches people by surprise. We had the same problem in 
our setup; even worse, the nightly backup also thrashes the MDS cache.



The best way to monitor the MDS is using the 'ceph daemonperf mds.XYZ' 
command on the MDS host. It gives you the current performance counters 
including the inode and caps count. Our MDS is configured with a 40 GB 
cache size and currently has 15 million inodes cached and is managing 
3.1 million capabilities.
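For reference, a minimal sketch of the knobs and counters involved - the limit
value is an illustrative assumption matching the roughly 40 GB cache mentioned
above, and mds.<name> is a placeholder:

[mds]
mds_cache_memory_limit = 42949672960        # ~40 GiB

ceph daemonperf mds.<name>                  # live view of inode and caps counters
ceph daemon mds.<name> perf dump mds_mem    # one-shot dump of the same counters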



TL;DR: MDS needs huge amounts of memory for its internal bookkeeping.


Hope this helps.


Regards,

Burkhard





>> If you can reproduce this issue, please send the kernel log to us.
>
> Will do if/when it reappears.



[ceph-users] CephFS kernel client versions - pg-upmap

2018-11-03 Thread jesper
Hi.

I tried to enable the "new smart balancing" - the backend is on RH Luminous,
clients are Ubuntu with a 4.15 kernel.

As per: http://docs.ceph.com/docs/mimic/rados/operations/upmap/
$ sudo ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 1 connected
client(s) look like firefly (missing 0xe010020); 1 connected
client(s) look like firefly (missing 0xe01); 1 connected
client(s) look like hammer (missing 0xe20); 55 connected
client(s) look like jewel (missing 0x800); add
--yes-i-really-mean-it to do it anyway

OK, so a 4.15 kernel connects as a "hammer" (< 1.0) client?  Is there a
huge gap in upstreaming kernel clients to kernel.org, or what am I
misreading here?

Hammer is 2015'ish - 4.15 is January 2018'ish?

Is kernel client development lagging behind?

Jesper
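A hedged sketch of the usual sequence from the upmap documentation - not from
the original post, and the override should only be used once you are sure the
"hammer"/"jewel"-looking kernel clients really do handle upmap:

ceph features                 # per-client feature bits, to see who looks pre-luminous
ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
ceph balancer mode upmap
ceph balancer on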



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-11-03 Thread jesper
> I suspect that mds asked client to trim its cache. Please run
> following commands on an idle client.

In the meantime - we migrated to the RH Ceph version and delivered the MDS
both SSDs and more memory, and the problem went away.

It still puzzles my mind a bit - why is there a connection between the
"client page cache" and the MDS server performance, etc.? The only argument
I can find is that if the MDS cannot cache data, then it needs to go
back and get metadata from the Ceph metadata pool, and then it exposes
data as "new" to the clients, despite it being the same. If that is
the case, then I would say there is significant room for performance
optimization here.

> If you can reproduce this issue, please send the kernel log to us.

Will do if/when it reappears.



Re: [ceph-users] EC K + M Size

2018-11-03 Thread Janne Johansson
On Sat, 3 Nov 2018 at 09:10, Ashley Merrick wrote:
>
> Hello,
>
> Tried to do some reading online but was unable to find much.
>
> I can imagine a higher K + M size with EC requires more CPU to reassemble the 
> shards into the required object.
>
> But is there any benefit or negative to going with a larger K + M? Obviously 
> there is the size benefit, but technically could it also improve reads due to 
> more OSDs each providing a smaller section of the data required to rebuild the 
> object?
>
> Are there any gotchas that should be known, for example going with a 4+2 vs 10+2?
>

If one host goes down in a 10+2 scenario, then 11 or 12 other
machines need to get involved in order to repair the lost data. This
means that if your cluster has close to 12 hosts, most or all of the
servers get extra work now. I saw an old Yahoo post from long ago that
stated that the primary (whose job it is to piece the shards together)
would only send out 8 requests at any given time, and IF that is still
true, it would make 6+2 somewhat more efficient. Still, EC is seldom
about performance, but rather about saving space while still allowing
1-2-3 drives to die without losing data, by using 1-2-3 checksum pieces.
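For concreteness, the two layouts being compared could be set up roughly like
this with Luminous-or-newer syntax - profile names, PG counts and the failure
domain are illustrative, not from the original post:

ceph osd erasure-code-profile set ec-4-2  k=4  m=2 crush-failure-domain=host
ceph osd erasure-code-profile set ec-10-2 k=10 m=2 crush-failure-domain=host
ceph osd pool create ecpool-4-2 128 128 erasure ec-4-2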


-- 
May the most significant bit of your life be positive.


[ceph-users] EC K + M Size

2018-11-03 Thread Ashley Merrick
Hello,

Tried to do some reading online but was unable to find much.

I can imagine a higher K + M size with EC requires more CPU to reassemble
the shards into the required object.

But is there any benefit or negative to going with a larger K + M? Obviously
there is the size benefit, but technically could it also improve reads due
to more OSDs each providing a smaller section of the data required to
rebuild the object?

Are there any gotchas that should be known, for example going with a 4+2 vs
10+2?

Thanks