[ceph-users] Re: CephFS ghost usage/inodes

2020-01-23 Thread Oskar Malnowicz
Any other ideas?

> On 15.01.2020 at 15:50, Oskar Malnowicz wrote:
> 
> The situation is:
> 
> health: HEALTH_WARN
>   1 pools have many more objects per pg than average
> 
> $ ceph health detail
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
> pool cephfs_data objects per pg (315399) is more than 1227.23 times 
> cluster average (257)
> 
> $ ceph df
> RAW STORAGE:
> CLASS SIZEAVAIL   USEDRAW USED %RAW USED
> hdd   7.8 TiB 7.4 TiB 326 GiB  343 GiB  4.30
> TOTAL 7.8 TiB 7.4 TiB 326 GiB  343 GiB  4.30
> 
> POOLS:
> POOL            ID STORED  OBJECTS USED    %USED MAX AVAIL
> cephfs_data      6 2.2 TiB   2.52M 2.2 TiB 26.44   3.0 TiB
> cephfs_metadata  7 9.7 MiB     379 9.7 MiB     0   3.0 TiB
>   
> The stored value of the "cephfs_data" pool is 2.2 TiB. This must be wrong.
> When I execute "du -sh" from the MDS root "/", I get this usage:
> 
> $ du -sh
> 31G .
> 
> "df -h" shows:
> 
> $ df -h
> Filesystem   Size  Used Avail Use% Mounted on
> ip1,ip2,ip3:/5.2T  2.2T  3.0T  43% /storage/cephfs
> 
> It says that "Used" is 2.2T, but "du" shows 31G.
> 
> The pg_num of the "cephfs_data" pool is currently 8. The autoscaler suggests
> setting this parameter to 512.
> 
> $ ceph osd pool autoscale-status
> POOL            SIZE  TARGET SIZE RATE RAW CAPACITY RATIO  TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
> cephfs_metadata 9994k              2.0  7959G        0.                  1.0  8                 off
> cephfs_data     2221G              2.0  7959G        0.5582              1.0  8      512        off
> 
> After setting pg_num to 512, the situation is:
> 
> $ ceph health detail
> HEALTH_WARN 1 pools have many more objects per pg than average
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
> pool cephfs_data objects per pg (4928) is more than 100.571 times cluster 
> average (49)
> 
> $ ceph df
> RAW STORAGE:
> CLASS SIZEAVAIL   USEDRAW USED %RAW USED
> hdd   7.8 TiB 7.4 TiB 329 GiB  346 GiB  4.34
> TOTAL 7.8 TiB 7.4 TiB 329 GiB  346 GiB  4.34
> 
> POOLS:
> POOL            ID STORED  OBJECTS USED   %USED MAX AVAIL
> cephfs_data      6  30 GiB   2.52M 61 GiB  0.99   3.0 TiB
> cephfs_metadata  7 9.8 MiB     379 20 MiB     0   3.0 TiB
> 
> The "stored" value changed from 2.2TiB to 30GiB !!! This should be the 
> correct usage/size.
> 
> When i execute "du -sh" from the MDS root "/" i get again an usage:
> 
> $ du -sh
> 31G
> 
> and "df -h" shows again
> 
> $ df -h
> Filesystem   Size  Used Avail Use% Mounted on
> ip1,ip2,ip3:/5.2T  2.2T  3.0T  43% /storage/cephfs
> 
> It says that "Used" is 2.2T, but "du" shows 31G.
> 
> Can anybody explain to me what the problem is?
> 
> 
> 
> 
> On 14.01.20 at 11:15, Florian Pritz wrote:
>> Hi,
>> 
>> When we tried putting some load on our test cephfs setup by restoring a
>> backup in artifactory, we eventually ran out of space (around 95% used
>> in `df` = 3.5TB) which caused artifactory to abort the restore and clean
>> up. However, while a simple `find` no longer shows the files, `df` still
>> claims that we have around 2.1TB of data on the cephfs. `df -i` also
>> shows 2.4M used inodes. When using `du -sh` on a top-level mountpoint, I
>> get 31G used, which is data that is still really here and which is
>> expected to be here.
>> 
>> Consequently, we also get the following warning:
>> 
>> 
>>> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>>> pool cephfs_data objects per pg (38711) is more than 231.802 times 
>>> cluster average (167)
>>> 
>> We are running ceph 14.2.5.
>> 
>> We have snapshots enabled on cephfs, but there are currently no active
>> snapshots listed by `ceph daemon mds.$hostname dump snaps --server` (see
>> below). I can't say for sure if we created snapshots during the backup
>> restore.
>> 
>> 
>>> {
>>> "last_snap": 39,
>>> "last_created": 38,
>>> "last_destroyed": 39,
>>> "pending_noop": [],
>>> "snaps": [],
>>> "need_to_purge": {},
>>> "pending_update": [],
>>> "pending_destroy": []
>>> }
>>> 
>> We only have a single CephFS.
>> 
>> We use the pool_namespace xattr for our various directory trees on the
>> cephfs.
>> 
>> `ceph df` shows:
>> 
>> 
>>> POOL        ID STORED  OBJECTS USED    %USED MAX AVAIL
>>> cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 24.97   3.1 TiB
>>> 
>> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>> 
>> 
>>> "num_strays": 0,
>>> "num_strays_delayed": 0,
>>> "num_strays_enqueuing": 0,
>>> "strays_created": 5097138,
>>> "strays_enqueued": 5097138,
>>> 

[ceph-users] Re: CephFS ghost usage/inodes

2020-01-15 Thread Oskar Malnowicz
I think there is something wrong with the cephfs_data pool.
I created a new pool, "cephfs_data2", and copied the data from the
"cephfs_data" pool to the "cephfs_data2" pool using this command:

$ rados cppool cephfs_data cephfs_data2

$ ceph df detail
RAW STORAGE:
    CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
    hdd   7.8 TiB 7.4 TiB 390 GiB  407 GiB  5.11
    TOTAL 7.8 TiB 7.4 TiB 390 GiB  407 GiB  5.11

POOLS:
    POOL            ID STORED  OBJECTS USED   %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY  USED COMPR UNDER COMPR
    cephfs_data      6  30 GiB   2.52M 61 GiB  1.02   2.9 TiB N/A           N/A         2.52M  0 B        0 B
    cephfs_data2    20  30 GiB  11.06k 61 GiB  1.02   2.9 TiB N/A           N/A         11.06k 0 B        0 B
    cephfs_metadata  7 9.8 MiB     379 20 MiB     0   2.9 TiB N/A           N/A         379    0 B        0 B

In the new pool the stored amount is also 30 GiB, but the object count and
dirty count are significantly smaller.

I think the "cephfs_data" pool contains something like "orphaned"
objects. But how can I clean up that pool?
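
One rough sketch of how I could look for such orphans (just an idea, not a
supported cleanup procedure; the temp file names are arbitrary and the
comparison ignores hard links and open-but-unlinked files): CephFS data
objects are named <inode-hex>.<offset-hex>, so the unique prefixes are the
inodes that still own data in the pool, and they can be compared against the
inodes that are actually reachable under the mountpoint.

# inode prefixes that still back objects in the data pool (all namespaces)
$ rados -p cephfs_data ls --all | awk '{print $NF}' | cut -d. -f1 | sort -u > /tmp/pool_inodes
# inodes reachable in the mounted tree, converted to hex
$ find /storage/cephfs -xdev -printf '%i\n' | sort -un | xargs -r -n1 printf '%x\n' | sort -u > /tmp/fs_inodes
# prefixes present in the pool but missing from the tree = orphan candidates
$ comm -23 /tmp/pool_inodes /tmp/fs_inodes | wc -l

I would not delete anything based on that alone; it would only tell me whether
the extra ~2.5M objects belong to inodes that no longer exist in the tree.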

On 14.01.20 at 11:15, Florian Pritz wrote:
> Hi,
>
> When we tried putting some load on our test cephfs setup by restoring a
> backup in artifactory, we eventually ran out of space (around 95% used
> in `df` = 3.5TB) which caused artifactory to abort the restore and clean
> up. However, while a simple `find` no longer shows the files, `df` still
> claims that we have around 2.1TB of data on the cephfs. `df -i` also
> shows 2.4M used inodes. When using `du -sh` on a top-level mountpoint, I
> get 31G used, which is data that is still really here and which is
> expected to be here.
>
> Consequently, we also get the following warning:
>
>> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>> pool cephfs_data objects per pg (38711) is more than 231.802 times 
>> cluster average (167)
> We are running ceph 14.2.5.
>
> We have snapshots enabled on cephfs, but there are currently no active
> snapshots listed by `ceph daemon mds.$hostname dump snaps --server` (see
> below). I can't say for sure if we created snapshots during the backup
> restore.
>
>> {
>> "last_snap": 39,
>> "last_created": 38,
>> "last_destroyed": 39,
>> "pending_noop": [],
>> "snaps": [],
>> "need_to_purge": {},
>> "pending_update": [],
>> "pending_destroy": []
>> }
> We only have a single CephFS.
>
> We use the pool_namespace xattr for our various directory trees on the
> cephfs.
>
> `ceph df` shows:
>
>> POOL        ID STORED  OBJECTS USED    %USED MAX AVAIL
>> cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 24.97   3.1 TiB
> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>
>> "num_strays": 0,
>> "num_strays_delayed": 0,
>> "num_strays_enqueuing": 0,
>> "strays_created": 5097138,
>> "strays_enqueued": 5097138,
>> "strays_reintegrated": 0,
>> "strays_migrated": 0,
> `rados -p cephfs_data df` shows:
>
>> POOL_NAME   USED    OBJECTS CLONES  COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS      RD   WR_OPS     WR USED COMPR UNDER COMPR
>> cephfs_data 2.1 TiB 2477540       0 4955080                  0       0        0 10699626 6.9 TiB 86911076 35 TiB        0 B         0 B
>>
>> total_objects29718
>> total_used   329 GiB
>> total_avail  7.5 TiB
>> total_space  7.8 TiB
> When I combine the usage and the free space shown by `df` we would
> exceed our cluster size. Our test cluster currently has 7.8TB total
> space with a replication size of 2 for all pools. With 2.1TB
> "used" on the cephfs according to `df` + 3.1TB being shows as "free" I
> get 5.2TB total size. This would mean >10TB of data when accounted for
> replication. Clearly this can't fit on a cluster with only 7.8TB of
> capacity.
>
> Do you have any ideas why we see so many objects and so much reported
> usage? Is there any way to fix this without recreating the cephfs?
>
> Florian
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-15 Thread Oskar Malnowicz
The situation is:

health: HEALTH_WARN
  1 pools have many more objects per pg than average

$ ceph health detail
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cephfs_data objects per pg (315399) is more than 1227.23 times
cluster average (257)

$ ceph df
RAW STORAGE:
    CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
    hdd   7.8 TiB 7.4 TiB 326 GiB  343 GiB  4.30
    TOTAL 7.8 TiB 7.4 TiB 326 GiB  343 GiB  4.30

POOLS:
    POOL            ID STORED  OBJECTS USED    %USED MAX AVAIL
    cephfs_data      6 2.2 TiB   2.52M 2.2 TiB 26.44   3.0 TiB
    cephfs_metadata  7 9.7 MiB     379 9.7 MiB     0   3.0 TiB
 
The stored value of the "cephfs_data" pool is 2.2 TiB. This must be
wrong. When I execute "du -sh" from the MDS root "/", I get this usage:

$ du -sh
31G .

"df -h" shows:

$ df -h
Filesystem   Size  Used Avail Use% Mounted on
ip1,ip2,ip3:/    5.2T  2.2T  3.0T  43% /storage/cephfs

It says that "Used" is 2.2T, but "du" shows 31G.

The pg_num of the "cephfs_data" pool is currently 8. The autoscaler suggests
setting this parameter to 512.

$ ceph osd pool autoscale-status
POOL            SIZE  TARGET SIZE RATE RAW CAPACITY RATIO  TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
cephfs_metadata 9994k              2.0  7959G        0.                  1.0  8                 off
cephfs_data     2221G              2.0  7959G        0.5582              1.0  8      512        off
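
(For reference, the change itself was nothing more than a plain pool resize
along the lines of the commands below; as far as I understand, Nautilus walks
pg_num/pgp_num up to the new target on its own, otherwise pgp_num would have
to be raised as well.)

$ ceph osd pool set cephfs_data pg_num 512
$ ceph osd pool get cephfs_data pg_num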

After setting pg_num to 512, the situation is:

$ ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cephfs_data objects per pg (4928) is more than 100.571 times
cluster average (49)

$ ceph df
RAW STORAGE:
    CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
    hdd   7.8 TiB 7.4 TiB 329 GiB  346 GiB  4.34
    TOTAL 7.8 TiB 7.4 TiB 329 GiB  346 GiB  4.34

POOLS:
    POOL            ID STORED  OBJECTS USED   %USED MAX AVAIL
    cephfs_data      6  30 GiB   2.52M 61 GiB  0.99   3.0 TiB
    cephfs_metadata  7 9.8 MiB     379 20 MiB     0   3.0 TiB

The "stored" value changed from 2.2TiB to 30GiB !!! This should be the
correct usage/size.

When i execute "du -sh" from the MDS root "/" i get again an usage:

$ du -sh
31G

and "df -h" shows again

$ df -h
Filesystem   Size  Used Avail Use% Mounted on
ip1,ip2,ip3:/    5.2T  2.2T  3.0T  43% /storage/cephfs

It says that "Used" is 2.2T, but "du" shows 31G.

Can anybody explain to me what the problem is?
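
In case it helps, here is one more cross-check I can collect (only a sketch;
it assumes the client mounted at /storage/cephfs exposes the CephFS
recursive-statistics vxattrs):

# recursive byte/file counts as accounted by the MDS for the root directory
$ getfattr -n ceph.dir.rbytes /storage/cephfs
$ getfattr -n ceph.dir.rfiles /storage/cephfs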




On 14.01.20 at 11:15, Florian Pritz wrote:
> Hi,
>
> When we tried putting some load on our test cephfs setup by restoring a
> backup in artifactory, we eventually ran out of space (around 95% used
> in `df` = 3.5TB) which caused artifactory to abort the restore and clean
> up. However, while a simple `find` no longer shows the files, `df` still
> claims that we have around 2.1TB of data on the cephfs. `df -i` also
> shows 2.4M used inodes. When using `du -sh` on a top-level mountpoint, I
> get 31G used, which is data that is still really here and which is
> expected to be here.
>
> Consequently, we also get the following warning:
>
>> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>> pool cephfs_data objects per pg (38711) is more than 231.802 times 
>> cluster average (167)
> We are running ceph 14.2.5.
>
> We have snapshots enabled on cephfs, but there are currently no active
> snapshots listed by `ceph daemon mds.$hostname dump snaps --server` (see
> below). I can't say for sure if we created snapshots during the backup
> restore.
>
>> {
>> "last_snap": 39,
>> "last_created": 38,
>> "last_destroyed": 39,
>> "pending_noop": [],
>> "snaps": [],
>> "need_to_purge": {},
>> "pending_update": [],
>> "pending_destroy": []
>> }
> We only have a single CephFS.
>
> We use the pool_namespace xattr for our various directory trees on the
> cephfs.
>
> `ceph df` shows:
>
>> POOL        ID STORED  OBJECTS USED    %USED MAX AVAIL
>> cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 24.97   3.1 TiB
> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>
>> "num_strays": 0,
>> "num_strays_delayed": 0,
>> "num_strays_enqueuing": 0,
>> "strays_created": 5097138,
>> "strays_enqueued": 5097138,
>> "strays_reintegrated": 0,
>> "strays_migrated": 0,
> `rados -p cephfs_data df` shows:
>
>> POOL_NAME   USED    OBJECTS CLONES  COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS      RD   WR_OPS     WR USED COMPR UNDER COMPR
>> cephfs_data 2.1 TiB 2477540       0 4955080                  0       0        0 10699626 6.9 TiB 86911076 35 TiB        0 B         0 B
>>

[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
I executed the commands from above again ("Recovery from missing
metadata objects") and now the MDS daemons start.
Still the same situation as before :(

On 14.01.20 at 22:36, Oskar Malnowicz wrote:
> I just restarted the MDS daemons and now they crash during boot.
>
>    -36> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> opening inotable
>    -35> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> opening sessionmap
>    -34> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> opening mds log
>    -33> 2020-01-14 22:33:17.880 7fc9bbeaa700  5 mds.0.log open
> discovering log bounds
>    -32> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> opening purge queue (async)
>    -31> 2020-01-14 22:33:17.880 7fc9bbeaa700  4 mds.0.purge_queue open:
> opening
>    -30> 2020-01-14 22:33:17.880 7fc9bbeaa700  1 mds.0.journaler.pq(ro)
> recover start
>    -29> 2020-01-14 22:33:17.880 7fc9bb6a9700  4 mds.0.journalpointer
> Reading journal pointer '400.'
>    -28> 2020-01-14 22:33:17.880 7fc9bbeaa700  1 mds.0.journaler.pq(ro)
> read_head
>    -27> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> loading open file table (async)
>    -26> 2020-01-14 22:33:17.880 7fc9c58a5700 10 monclient:
> get_auth_request con 0x55aa83436d80 auth_method 0
>    -25> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
> opening snap table
>    -24> 2020-01-14 22:33:17.884 7fc9c58a5700 10 monclient:
> get_auth_request con 0x55aa83437680 auth_method 0
>    -23> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient:
> get_auth_request con 0x55aa83437200 auth_method 0
>    -22> 2020-01-14 22:33:17.884 7fc9bceac700  1 mds.0.journaler.pq(ro)
> _finish_read_head loghead(trim 805306368, expire 807199928, write
> 807199928, stream_format 1).  probing for end of log (from 807199928)...
>    -21> 2020-01-14 22:33:17.884 7fc9bceac700  1 mds.0.journaler.pq(ro)
> probing for end of the log
>    -20> 2020-01-14 22:33:17.884 7fc9bb6a9700  1
> mds.0.journaler.mdlog(ro) recover start
>    -19> 2020-01-14 22:33:17.884 7fc9bb6a9700  1
> mds.0.journaler.mdlog(ro) read_head
>    -18> 2020-01-14 22:33:17.884 7fc9bb6a9700  4 mds.0.log Waiting for
> journal 0x200 to recover...
>    -17> 2020-01-14 22:33:17.884 7fc9c68a7700 10 monclient:
> get_auth_request con 0x55aa83437f80 auth_method 0
>    -16> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient:
> get_auth_request con 0x55aa83438400 auth_method 0
>    -15> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
> mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 98280931328,
> expire 98282151365, write 98282247624, stream_format 1).  probing for
> end of log (from 98282247624)...
>    -14> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
> mds.0.journaler.mdlog(ro) probing for end of the log
>    -13> 2020-01-14 22:33:17.892 7fc9bceac700  1 mds.0.journaler.pq(ro)
> _finish_probe_end write_pos = 807199928 (header had 807199928). recovered.
>    -12> 2020-01-14 22:33:17.892 7fc9bceac700  4 mds.0.purge_queue
> operator(): open complete
>    -11> 2020-01-14 22:33:17.892 7fc9bceac700  1 mds.0.journaler.pq(ro)
> set_writeable
>    -10> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
> mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 98283021535
> (header had 98282247624). recovered.
>     -9> 2020-01-14 22:33:17.892 7fc9bb6a9700  4 mds.0.log Journal 0x200
> recovered.
>     -8> 2020-01-14 22:33:17.892 7fc9bb6a9700  4 mds.0.log Recovered
> journal 0x200 in format 1
>     -7> 2020-01-14 22:33:17.892 7fc9bb6a9700  2 mds.0.13470 Booting: 1:
> loading/discovering base inodes
>     -6> 2020-01-14 22:33:17.892 7fc9bb6a9700  0 mds.0.cache creating
> system inode with ino:0x100
>     -5> 2020-01-14 22:33:17.892 7fc9bb6a9700  0 mds.0.cache creating
> system inode with ino:0x1
>     -4> 2020-01-14 22:33:17.896 7fc9bbeaa700  2 mds.0.13470 Booting: 2:
> replaying mds log
>     -3> 2020-01-14 22:33:17.896 7fc9bbeaa700  2 mds.0.13470 Booting: 2:
> waiting for purge queue recovered
>     -2> 2020-01-14 22:33:17.908 7fc9ba6a7700 -1 log_channel(cluster) log
> [ERR] : ESession.replay sessionmap v 7561128 - 1 > table 0
>     -1> 2020-01-14 22:33:17.912 7fc9ba6a7700 -1
> /build/ceph-14.2.5/src/mds/journal.cc: In function 'virtual void
> ESession::replay(MDSRank*)' thread 7fc9ba6a7700 time 2020-01-14
> 22:33:17.912135
> /build/ceph-14.2.5/src/mds/journal.cc: 1655: FAILED
> ceph_assert(g_conf()->mds_wipe_sessions)
>
> On 14.01.20 at 21:19, Oskar Malnowicz wrote:
>> This was the new state; the results are the same as Florian's.
>>
>> $ time cephfs-data-scan scan_extents cephfs_data
>> cephfs-data-scan scan_extents cephfs_data  1.86s user 1.47s system 21%
>> cpu 15.397 total
>>
>> $ time cephfs-data-scan scan_inodes cephfs_data
>> cephfs-data-scan scan_inodes cephfs_data  2.76s user 2.05s system 26%
>> cpu 17.912 total
>>
>> $ time cephfs-data-scan scan_links
>> cephfs-data-scan scan_links  0.13s user 0.11s system 31% cpu 0.747 total
>>
>> $ time 

[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
I just restarted the MDS daemons and now they crash during boot.

   -36> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
opening inotable
   -35> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
opening sessionmap
   -34> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
opening mds log
   -33> 2020-01-14 22:33:17.880 7fc9bbeaa700  5 mds.0.log open
discovering log bounds
   -32> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
opening purge queue (async)
   -31> 2020-01-14 22:33:17.880 7fc9bbeaa700  4 mds.0.purge_queue open:
opening
   -30> 2020-01-14 22:33:17.880 7fc9bbeaa700  1 mds.0.journaler.pq(ro)
recover start
   -29> 2020-01-14 22:33:17.880 7fc9bb6a9700  4 mds.0.journalpointer
Reading journal pointer '400.'
   -28> 2020-01-14 22:33:17.880 7fc9bbeaa700  1 mds.0.journaler.pq(ro)
read_head
   -27> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
loading open file table (async)
   -26> 2020-01-14 22:33:17.880 7fc9c58a5700 10 monclient:
get_auth_request con 0x55aa83436d80 auth_method 0
   -25> 2020-01-14 22:33:17.880 7fc9bbeaa700  2 mds.0.13470 Booting: 0:
opening snap table
   -24> 2020-01-14 22:33:17.884 7fc9c58a5700 10 monclient:
get_auth_request con 0x55aa83437680 auth_method 0
   -23> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient:
get_auth_request con 0x55aa83437200 auth_method 0
   -22> 2020-01-14 22:33:17.884 7fc9bceac700  1 mds.0.journaler.pq(ro)
_finish_read_head loghead(trim 805306368, expire 807199928, write
807199928, stream_format 1).  probing for end of log (from 807199928)...
   -21> 2020-01-14 22:33:17.884 7fc9bceac700  1 mds.0.journaler.pq(ro)
probing for end of the log
   -20> 2020-01-14 22:33:17.884 7fc9bb6a9700  1
mds.0.journaler.mdlog(ro) recover start
   -19> 2020-01-14 22:33:17.884 7fc9bb6a9700  1
mds.0.journaler.mdlog(ro) read_head
   -18> 2020-01-14 22:33:17.884 7fc9bb6a9700  4 mds.0.log Waiting for
journal 0x200 to recover...
   -17> 2020-01-14 22:33:17.884 7fc9c68a7700 10 monclient:
get_auth_request con 0x55aa83437f80 auth_method 0
   -16> 2020-01-14 22:33:17.884 7fc9c60a6700 10 monclient:
get_auth_request con 0x55aa83438400 auth_method 0
   -15> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 98280931328,
expire 98282151365, write 98282247624, stream_format 1).  probing for
end of log (from 98282247624)...
   -14> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
mds.0.journaler.mdlog(ro) probing for end of the log
   -13> 2020-01-14 22:33:17.892 7fc9bceac700  1 mds.0.journaler.pq(ro)
_finish_probe_end write_pos = 807199928 (header had 807199928). recovered.
   -12> 2020-01-14 22:33:17.892 7fc9bceac700  4 mds.0.purge_queue
operator(): open complete
   -11> 2020-01-14 22:33:17.892 7fc9bceac700  1 mds.0.journaler.pq(ro)
set_writeable
   -10> 2020-01-14 22:33:17.892 7fc9bbeaa700  1
mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 98283021535
(header had 98282247624). recovered.
    -9> 2020-01-14 22:33:17.892 7fc9bb6a9700  4 mds.0.log Journal 0x200
recovered.
    -8> 2020-01-14 22:33:17.892 7fc9bb6a9700  4 mds.0.log Recovered
journal 0x200 in format 1
    -7> 2020-01-14 22:33:17.892 7fc9bb6a9700  2 mds.0.13470 Booting: 1:
loading/discovering base inodes
    -6> 2020-01-14 22:33:17.892 7fc9bb6a9700  0 mds.0.cache creating
system inode with ino:0x100
    -5> 2020-01-14 22:33:17.892 7fc9bb6a9700  0 mds.0.cache creating
system inode with ino:0x1
    -4> 2020-01-14 22:33:17.896 7fc9bbeaa700  2 mds.0.13470 Booting: 2:
replaying mds log
    -3> 2020-01-14 22:33:17.896 7fc9bbeaa700  2 mds.0.13470 Booting: 2:
waiting for purge queue recovered
    -2> 2020-01-14 22:33:17.908 7fc9ba6a7700 -1 log_channel(cluster) log
[ERR] : ESession.replay sessionmap v 7561128 - 1 > table 0
    -1> 2020-01-14 22:33:17.912 7fc9ba6a7700 -1
/build/ceph-14.2.5/src/mds/journal.cc: In function 'virtual void
ESession::replay(MDSRank*)' thread 7fc9ba6a7700 time 2020-01-14
22:33:17.912135
/build/ceph-14.2.5/src/mds/journal.cc: 1655: FAILED
ceph_assert(g_conf()->mds_wipe_sessions)
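
My reading of the assert: the journal still carries ESession entries for
sessionmap v 7561128 while the session table was just reset to 0, so replay
trips over them. Two ideas, both only sketches (please correct me if they are
wrong): reset the journal again so the stale ESession entries are gone, or
temporarily let the MDS ignore them via the mds_wipe_sessions dev option.

$ cephfs-journal-tool --rank=cephfs_test1:0 journal reset
# or, as a last resort (dev option, turn it back off once the MDS is up again):
$ ceph config set mds mds_wipe_sessions true
$ ceph config set mds mds_wipe_sessions false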

On 14.01.20 at 21:19, Oskar Malnowicz wrote:
> This was the new state; the results are the same as Florian's.
>
> $ time cephfs-data-scan scan_extents cephfs_data
> cephfs-data-scan scan_extents cephfs_data  1.86s user 1.47s system 21%
> cpu 15.397 total
>
> $ time cephfs-data-scan scan_inodes cephfs_data
> cephfs-data-scan scan_inodes cephfs_data  2.76s user 2.05s system 26%
> cpu 17.912 total
>
> $ time cephfs-data-scan scan_links
> cephfs-data-scan scan_links  0.13s user 0.11s system 31% cpu 0.747 total
>
> $ time cephfs-data-scan scan_links
> cephfs-data-scan scan_links  0.13s user 0.12s system 33% cpu 0.735 total
>
> $ time cephfs-data-scan cleanup cephfs_data
> cephfs-data-scan cleanup cephfs_data  1.64s user 1.37s system 12% cpu
> 23.922 total
>
> mds / $ du -sh
> 31G
>
> $ df -h
> ip1,ip2,ip3:/  5.2T  2.1T  3.1T  41% /storage/cephfs_test1
>
> $ ceph df detail
> RAW STORAGE:
>     CLASS SIZE  

[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
This was the new state; the results are the same as Florian's.

$ time cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_extents cephfs_data  1.86s user 1.47s system 21%
cpu 15.397 total

$ time cephfs-data-scan scan_inodes cephfs_data
cephfs-data-scan scan_inodes cephfs_data  2.76s user 2.05s system 26%
cpu 17.912 total

$ time cephfs-data-scan scan_links
cephfs-data-scan scan_links  0.13s user 0.11s system 31% cpu 0.747 total

$ time cephfs-data-scan scan_links
cephfs-data-scan scan_links  0.13s user 0.12s system 33% cpu 0.735 total

$ time cephfs-data-scan cleanup cephfs_data
cephfs-data-scan cleanup cephfs_data  1.64s user 1.37s system 12% cpu
23.922 total

mds / $ du -sh
31G

$ df -h
ip1,ip2,ip3:/  5.2T  2.1T  3.1T  41% /storage/cephfs_test1

$ ceph df detail
RAW STORAGE:
    CLASS SIZE    AVAIL   USED    RAW USED %RAW USED
    hdd   7.8 TiB 7.5 TiB 312 GiB  329 GiB  4.14
    TOTAL 7.8 TiB 7.5 TiB 312 GiB  329 GiB  4.14

POOLS:
    POOL            ID STORED  OBJECTS USED    %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
    cephfs_data      6 2.1 TiB   2.48M 2.1 TiB 25.00   3.1 TiB N/A           N/A         2.48M 0 B        0 B
    cephfs_metadata  7 7.3 MiB     379 7.3 MiB     0   3.1 TiB N/A           N/A         379   0 B        0 B


On 14.01.20 at 21:06, Patrick Donnelly wrote:
> I'm asking that you get the new state of the file system tree after
> recovering from the data pool. Florian wrote that before I asked you
> to do this...
>
> How long did it take to run the cephfs-data-scan commands?
>
> On Tue, Jan 14, 2020 at 11:58 AM Oskar Malnowicz
>  wrote:
>> As Florian already wrote, `du -hc` shows a total usage of 31G, but `ceph
>> df` shows a usage of 2.1 TiB.
>>
>> # du -hs
>> 31G
>>
>> # ceph df
>> cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 25.00   3.1 TiB
>>
>> On 14.01.20 at 20:44, Patrick Donnelly wrote:
>>> On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz
>>>  wrote:
I ran these commands, but still the same problems
>>> Which problems?
>>>
 $ cephfs-data-scan scan_extents cephfs_data

 $ cephfs-data-scan scan_inodes cephfs_data

 $ cephfs-data-scan scan_links
 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap  updating last_snap 1
 -> 27

 $ cephfs-data-scan cleanup cephfs_data

Do you have any other ideas?
>>> After you complete this, you should see the deleted files in your file
>>> system tree (if this is indeed the issue). What's the output of `du
>>> -hc`?
>>>
>>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Patrick Donnelly
I'm asking that you get the new state of the file system tree after
recovering from the data pool. Florian wrote that before I asked you
to do this...

How long did it take to run the cephfs-data-scan commands?

On Tue, Jan 14, 2020 at 11:58 AM Oskar Malnowicz
 wrote:
>
> As Florian already wrote, `du -hc` shows a total usage of 31G, but `ceph
> df` shows a usage of 2.1 TiB.
>
> # du -hs
> 31G
>
> # ceph df
> cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 25.00   3.1 TiB
>
> > On 14.01.20 at 20:44, Patrick Donnelly wrote:
> > On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz
> >  wrote:
> >> I ran these commands, but still the same problems
> > Which problems?
> >
> >> $ cephfs-data-scan scan_extents cephfs_data
> >>
> >> $ cephfs-data-scan scan_inodes cephfs_data
> >>
> >> $ cephfs-data-scan scan_links
> >> 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap  updating last_snap 1
> >> -> 27
> >>
> >> $ cephfs-data-scan cleanup cephfs_data
> >>
> >> Do you have any other ideas?
> > After you complete this, you should see the deleted files in your file
> > system tree (if this is indeed the issue). What's the output of `du
> > -hc`?
> >
>
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
As Florian already wrote, `du -hc` shows a total usage of 31G, but `ceph
df` shows a usage of 2.1 TiB.

# du -hs
31G

# ceph df
cephfs_data  6 2.1 TiB   2.48M 2.1 TiB 25.00   3.1 TiB

On 14.01.20 at 20:44, Patrick Donnelly wrote:
> On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz
>  wrote:
>> I ran these commands, but still the same problems
> Which problems?
>
>> $ cephfs-data-scan scan_extents cephfs_data
>>
>> $ cephfs-data-scan scan_inodes cephfs_data
>>
>> $ cephfs-data-scan scan_links
>> 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap  updating last_snap 1
>> -> 27
>>
>> $ cephfs-data-scan cleanup cephfs_data
>>
>> Do you have any other ideas?
> After you complete this, you should see the deleted files in your file
> system tree (if this is indeed the issue). What's the output of `du
> -hc`?
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Patrick Donnelly
On Tue, Jan 14, 2020 at 11:40 AM Oskar Malnowicz
 wrote:
>
> I ran these commands, but still the same problems

Which problems?

> $ cephfs-data-scan scan_extents cephfs_data
>
> $ cephfs-data-scan scan_inodes cephfs_data
>
> $ cephfs-data-scan scan_links
> 2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap  updating last_snap 1
> -> 27
>
> $ cephfs-data-scan cleanup cephfs_data
>
> Do you have any other ideas?

After you complete this, you should see the deleted files in your file
system tree (if this is indeed the issue). What's the output of `du
-hc`?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
I ran these commands, but still the same problems

$ cephfs-data-scan scan_extents cephfs_data

$ cephfs-data-scan scan_inodes cephfs_data

$ cephfs-data-scan scan_links
2020-01-14 20:36:45.110 7ff24200ef80 -1 mds.0.snap  updating last_snap 1
-> 27

$ cephfs-data-scan cleanup cephfs_data
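
(These were single-worker runs. If the scans ever needed to cover a much
bigger pool, they can apparently be parallelised; the sketch below assumes the
--worker_n/--worker_m options from the disaster-recovery docs, with 4 workers
picked arbitrarily. All workers of one phase must finish before the next.)

# run scan_extents across 4 workers, then scan_inodes the same way
for i in 0 1 2 3; do
  cephfs-data-scan scan_extents --worker_n $i --worker_m 4 cephfs_data &
done; wait
for i in 0 1 2 3; do
  cephfs-data-scan scan_inodes --worker_n $i --worker_m 4 cephfs_data &
done; wait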

Do you have any other ideas?

On 14.01.20 at 20:32, Patrick Donnelly wrote:
> On Tue, Jan 14, 2020 at 11:24 AM Oskar Malnowicz
>  wrote:
>> $ ceph daemon mds.who flush journal
>> {
>> "message": "",
>> "return_code": 0
>> }
>>
>>
>> $ cephfs-table-tool 0 reset session
>> {
>> "0": {
>> "data": {},
>> "result": 0
>> }
>> }
>>
>> $ cephfs-table-tool 0 reset snap
>> {
>> "result": 0
>> }
>>
>> $ cephfs-table-tool 0 reset inode
>> {
>> "0": {
>> "data": {},
>> "result": 0
>> }
>> }
>>
>> $ cephfs-journal-tool --rank=cephfs_test1:0 journal reset
>> old journal was 98282151365~92872
>> new journal start will be 98285125632 (2881395 bytes past old end)
>> writing journal head
>> writing EResetJournal entry
>> done
>>
>> $ cephfs-data-scan init
>> Inode 0x0x1 already exists, skipping create.  Use --force-init to
>> overwrite the existing object.
>> Inode 0x0x100 already exists, skipping create.  Use --force-init to
>> overwrite the existing object.
>>
>> Should I run it with the --force-init flag?
> No, that shouldn't be necessary.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Patrick Donnelly
On Tue, Jan 14, 2020 at 11:24 AM Oskar Malnowicz
 wrote:
>
> $ ceph daemon mds.who flush journal
> {
> "message": "",
> "return_code": 0
> }
>
>
> $ cephfs-table-tool 0 reset session
> {
> "0": {
> "data": {},
> "result": 0
> }
> }
>
> $ cephfs-table-tool 0 reset snap
> {
> "result": 0
> }
>
> $ cephfs-table-tool 0 reset inode
> {
> "0": {
> "data": {},
> "result": 0
> }
> }
>
> $ cephfs-journal-tool --rank=cephfs_test1:0 journal reset
> old journal was 98282151365~92872
> new journal start will be 98285125632 (2881395 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> $ cephfs-data-scan init
> Inode 0x0x1 already exists, skipping create.  Use --force-init to
> overwrite the existing object.
> Inode 0x0x100 already exists, skipping create.  Use --force-init to
> overwrite the existing object.
>
> Should I run it with the --force-init flag?

No, that shouldn't be necessary.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
$ ceph daemon mds.who flush journal
{
    "message": "",
    "return_code": 0
}


$ cephfs-table-tool 0 reset session
{
    "0": {
    "data": {},
    "result": 0
    }
}

$ cephfs-table-tool 0 reset snap
{
    "result": 0
}

$ cephfs-table-tool 0 reset inode
{
    "0": {
    "data": {},
    "result": 0
    }
}

$ cephfs-journal-tool --rank=cephfs_test1:0 journal reset
old journal was 98282151365~92872
new journal start will be 98285125632 (2881395 bytes past old end)
writing journal head
writing EResetJournal entry
done

$ cephfs-data-scan init
Inode 0x0x1 already exists, skipping create.  Use --force-init to
overwrite the existing object.
Inode 0x0x100 already exists, skipping create.  Use --force-init to
overwrite the existing object.

Should I run it with the --force-init flag?
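
(For the record, before destructive steps like the table/journal resets above
I also want to keep a copy of the journal around, as the disaster-recovery
docs suggest; roughly like this, with the output file name being arbitrary:)

$ cephfs-journal-tool --rank=cephfs_test1:0 journal export backup.bin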

On 14.01.20 at 18:48, Patrick Donnelly wrote:
> Please try flushing the journal:
>
> ceph daemon mds.foo flush journal
>
> The problem may be caused by this bug: https://tracker.ceph.com/issues/43598
>
> As for what to do next, you would likely need to recover the deleted
> inodes from the data pool so you can retry deleting the files:
> https://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects
>
>
> On Tue, Jan 14, 2020 at 9:30 AM Oskar Malnowicz
>  wrote:
>> Hello Patrick,
>>
>> "purge_queue": {
>> "pq_executing_ops": 0,
>> "pq_executing": 0,
>> "pq_executed": 5097138
>> },
>>
>> We already restarted the MDS daemons, but no change.
>> There are no health warnings other than the one Florian already
>> mentioned.
>>
>> cheers Oskar
>>
>> On 14.01.20 at 17:32, Patrick Donnelly wrote:
>>> On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz
>>>  wrote:
 `ceph daemon mds.$hostname perf dump | grep stray` shows:

> "num_strays": 0,
> "num_strays_delayed": 0,
> "num_strays_enqueuing": 0,
> "strays_created": 5097138,
> "strays_enqueued": 5097138,
> "strays_reintegrated": 0,
> "strays_migrated": 0,
>>> Can you also paste the purge queue ("pq") perf dump?
>>>
>>> It's possible the MDS has hit an ENOSPC condition that caused the MDS
>>> to go read-only. This would prevent the MDS PurgeQueue from cleaning
>>> up. Do you see a health warning that the MDS is in this state? If so,
>>> please try restarting the MDS.
>>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Senior Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Patrick Donnelly
Please try flushing the journal:

ceph daemon mds.foo flush journal

The problem may be caused by this bug: https://tracker.ceph.com/issues/43598

As for what to do next, you would likely need to recover the deleted
inodes from the data pool so you can retry deleting the files:
https://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects


On Tue, Jan 14, 2020 at 9:30 AM Oskar Malnowicz
 wrote:
>
> Hello Patrick,
>
> "purge_queue": {
> "pq_executing_ops": 0,
> "pq_executing": 0,
> "pq_executed": 5097138
> },
>
> We already restarted the MDS daemons, but no change.
> There are no health warnings other than the one Florian already
> mentioned.
>
> cheers Oskar
>
> On 14.01.20 at 17:32, Patrick Donnelly wrote:
> > On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz
> >  wrote:
> >> `ceph daemon mds.$hostname perf dump | grep stray` shows:
> >>
> >>> "num_strays": 0,
> >>> "num_strays_delayed": 0,
> >>> "num_strays_enqueuing": 0,
> >>> "strays_created": 5097138,
> >>> "strays_enqueued": 5097138,
> >>> "strays_reintegrated": 0,
> >>> "strays_migrated": 0,
> > Can you also paste the purge queue ("pq") perf dump?
> >
> > It's possible the MDS has hit an ENOSPC condition that caused the MDS
> > to go read-only. This would prevent the MDS PurgeQueue from cleaning
> > up. Do you see a health warning that the MDS is in this state? If so,
> > please try restarting the MDS.
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Oskar Malnowicz
Hello Patrick,

    "purge_queue": {
    "pq_executing_ops": 0,
    "pq_executing": 0,
    "pq_executed": 5097138
    },

We already restarted the MDS daemons, but no change.
There are no health warnings other than the one Florian already
mentioned.

cheers Oskar

On 14.01.20 at 17:32, Patrick Donnelly wrote:
> On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz
>  wrote:
>> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>>
>>> "num_strays": 0,
>>> "num_strays_delayed": 0,
>>> "num_strays_enqueuing": 0,
>>> "strays_created": 5097138,
>>> "strays_enqueued": 5097138,
>>> "strays_reintegrated": 0,
>>> "strays_migrated": 0,
> Can you also paste the purge queue ("pq") perf dump?
>
> It's possible the MDS has hit an ENOSPC condition that caused the MDS
> to go read-only. This would prevent the MDS PurgeQueue from cleaning
> up. Do you see a health warning that the MDS is in this state? If so,
> please try restarting the MDS.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS ghost usage/inodes

2020-01-14 Thread Patrick Donnelly
On Tue, Jan 14, 2020 at 5:15 AM Florian Pritz
 wrote:
> `ceph daemon mds.$hostname perf dump | grep stray` shows:
>
> > "num_strays": 0,
> > "num_strays_delayed": 0,
> > "num_strays_enqueuing": 0,
> > "strays_created": 5097138,
> > "strays_enqueued": 5097138,
> > "strays_reintegrated": 0,
> > "strays_migrated": 0,

Can you also paste the purge queue ("pq") perf dump?
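
Something like this should dump just that section (assuming the admin socket
name matches the MDS id):

$ ceph daemon mds.$hostname perf dump purge_queue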

It's possible the MDS has hit an ENOSPC condition that caused the MDS
to go read-only. This would prevent the MDS PurgeQueue from cleaning
up. Do you see a health warning that the MDS is in this state? If so,
please try restarting the MDS.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io