Re: [ceph-users] disk usage reported incorrectly

2019-07-17 Thread Igor Fedotov

Fix is on its way too...

See https://github.com/ceph/ceph/pull/28978

On 7/17/2019 8:55 PM, Paul Mezzanini wrote:

Oh my.  That's going to hurt with 788 OSDs.   Time for some creative shell 
scripts and stepping through the nodes.  I'll report back.


Re: [ceph-users] disk usage reported incorrectly

2019-07-17 Thread Paul Mezzanini
Oh my.  That's going to hurt with 788 OSDs.   Time for some creative shell 
scripts and stepping through the nodes.  I'll report back.
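
Something like the following per-node pass is what I have in mind -- a 
sketch only, assuming stock data paths and systemd unit names, and 
assuming the repair Igor means is ceph-objectstore-tool's `--op repair`; 
verify that before pointing it at real OSDs:

    import glob
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Every OSD data directory on this host, assuming the stock layout.
    for path in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
        osd_id = path.rsplit("-", 1)[1]
        run(["systemctl", "stop", f"ceph-osd@{osd_id}"])   # tool needs the OSD down
        run(["ceph-objectstore-tool", "--data-path", path, "--op", "repair"])
        run(["systemctl", "start", f"ceph-osd@{osd_id}"])  # bring it back before the next one

Letting the cluster settle back to HEALTH_OK between nodes should keep 
the risk down.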


Re: [ceph-users] disk usage reported incorrectly

2019-07-17 Thread Igor Fedotov

Forgot to provide a workaround...

If that's the case then you need to repair each OSD with the 
corresponding command in ceph-objectstore-tool...
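
For a single OSD that boils down to something like this (a sketch: the 
stock data path and the `--op repair` name are assumptions to verify 
against your ceph-objectstore-tool version, and the OSD daemon must be 
stopped first):

    import subprocess

    osd_id = 12  # hypothetical example OSD; stop ceph-osd@12 before running this
    # Assumed stock data path and repair op -- verify both for your deployment.
    subprocess.run(
        ["ceph-objectstore-tool",
         "--data-path", f"/var/lib/ceph/osd/ceph-{osd_id}",
         "--op", "repair"],
        check=True,
    )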


Thanks,

Igor.


On 7/17/2019 6:29 PM, Paul Mezzanini wrote:

Sometime after our upgrade to Nautilus our disk usage statistics went off the 
rails wrong.  I can't tell you exactly when it broke but I know that after the 
initial upgrade it worked at least for a bit.

Correct numbers should be something similar to: (These are copy/pasted from the 
autoscale-status report)

POOLSIZE
cephfs_metadata 327.1G
cold-ec98.36T
ceph-bulk-3r142.6T
cephfs_data31890G
ceph-hot-2r5276G
kgcoe-cinder103.2T
rbd   3098


Instead, we now show:

POOL SIZE
cephfs_metadata362.9G (correct)
cold-ec607.2G(wrong)
ceph-bulk-3r5186G (wrong)
cephfs_data1654G (wrong)
ceph-hot-2r5884G (correct I think)
kgcoe-cinder5761G   (wrong)
rbd128.0k


`ceph fs status` reports similar numbers.  cold-ec, ceph-hot-2r and cephfs_data 
are all cephfs data pools and cephfs_metadata is unsurprisingly, cephfs 
metadata.  The remaining pools are all used for rbd.


Interestingly, the `ceph df` outpool for raw storage feels correct for each 
drive class while the pool usage is wrong:

RAW STORAGE:
 CLASS SIZEAVAIL   USEDRAW USED %RAW USED
 hdd   6.3 PiB 5.2 PiB 1.1 PiB  1.1 PiB 17.08
 nvme  175 TiB 161 TiB  14 TiB   14 TiB  7.82
 nvme-meta  14 TiB  11 TiB 2.2 TiB  2.5 TiB 18.45
 TOTAL 6.5 PiB 5.4 PiB 1.1 PiB  1.1 PiB 16.84
  
POOLS:

 POOLID STORED  OBJECTS USED%USED 
MAX AVAIL
 kgcoe-cinder24 1.9 TiB  29.49M 5.6 TiB  0.32   
582 TiB
 ceph-bulk-3r32 1.7 TiB  88.28M 5.1 TiB  0.29   
582 TiB
 cephfs_data 35 518 GiB 135.68M 1.6 TiB  0.09   
582 TiB
 cephfs_metadata 36 363 GiB   5.63M 363 GiB  3.35   
3.4 TiB
 rbd 37   931 B   5 128 KiB 0   
582 TiB
 ceph-hot-2r 50 5.7 TiB  18.63M 5.7 TiB  3.72   
 74 TiB
 cold-ec 51 417 GiB 105.23M 607 GiB  0.02   
2.1 PiB


Everything is on "ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) 
nautilus (stable)" and kernel 5.0.21 or 5.0.9.  I'm actually doing the patching now 
to pull the ceph cluster up to 5.0.21, same as the clients.  I'm not really sure where to 
dig into this one.  Everything is working fine except disk usage reporting.  This also 
completely blows up the autoscaler.

I feel like the question is obvious but I'll state it anyway.  How do I get 
this issue resolved?

Thanks
-paul

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk usage reported incorrectly

2019-07-17 Thread Igor Fedotov

Hi Paul,

there was a recent post from Sage titled "Pool stats issue with 
upgrades to nautilus".


Perhaps that's the case when you add a new OSD or repair an existing one...


Thanks,

Igor




[ceph-users] disk usage reported incorrectly

2019-07-17 Thread Paul Mezzanini
Sometime after our upgrade to Nautilus, our disk usage statistics went 
badly off the rails.  I can't tell you exactly when it broke, but I know 
it worked for at least a while after the initial upgrade.

Correct numbers should be something similar to the following 
(copy/pasted from the autoscale-status report):

POOL              SIZE
cephfs_metadata   327.1G
cold-ec           98.36T
ceph-bulk-3r      142.6T
cephfs_data       31890G
ceph-hot-2r       5276G
kgcoe-cinder      103.2T
rbd               3098


Instead, we now show:

POOL              SIZE
cephfs_metadata   362.9G   (correct)
cold-ec           607.2G   (wrong)
ceph-bulk-3r      5186G    (wrong)
cephfs_data       1654G    (wrong)
ceph-hot-2r       5884G    (correct, I think)
kgcoe-cinder      5761G    (wrong)
rbd               128.0k


`ceph fs status` reports similar numbers.  cold-ec, ceph-hot-2r and 
cephfs_data are all cephfs data pools, and cephfs_metadata is, 
unsurprisingly, cephfs metadata.  The remaining pools are all used for rbd.


Interestingly, the `ceph df` output for raw storage feels correct for 
each drive class while the pool usage is wrong:

RAW STORAGE:
    CLASS       SIZE      AVAIL     USED      RAW USED   %RAW USED
    hdd         6.3 PiB   5.2 PiB   1.1 PiB   1.1 PiB    17.08
    nvme        175 TiB   161 TiB   14 TiB    14 TiB     7.82
    nvme-meta   14 TiB    11 TiB    2.2 TiB   2.5 TiB    18.45
    TOTAL       6.5 PiB   5.4 PiB   1.1 PiB   1.1 PiB    16.84

POOLS:
    POOL              ID   STORED    OBJECTS   USED      %USED   MAX AVAIL
    kgcoe-cinder      24   1.9 TiB   29.49M    5.6 TiB   0.32    582 TiB
    ceph-bulk-3r      32   1.7 TiB   88.28M    5.1 TiB   0.29    582 TiB
    cephfs_data       35   518 GiB   135.68M   1.6 TiB   0.09    582 TiB
    cephfs_metadata   36   363 GiB   5.63M     363 GiB   3.35    3.4 TiB
    rbd               37   931 B     5         128 KiB   0       582 TiB
    ceph-hot-2r       50   5.7 TiB   18.63M    5.7 TiB   3.72    74 TiB
    cold-ec           51   417 GiB   105.23M   607 GiB   0.02    2.1 PiB
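
For watching the skew over time, the same report is available as JSON; 
here is a sketch, assuming the `stored` and `bytes_used` field names I 
believe Nautilus emits in `ceph df --format json` (adjust if your 
schema differs):

    import json
    import subprocess

    # Pull the same report `ceph df` prints, but machine-readable.
    out = subprocess.run(["ceph", "df", "--format", "json"],
                         capture_output=True, text=True, check=True)
    report = json.loads(out.stdout)

    for pool in report["pools"]:
        stats = pool["stats"]
        stored = stats.get("stored", 0)    # logical bytes clients wrote
        used = stats.get("bytes_used", 0)  # raw bytes after replication/EC
        ratio = used / stored if stored else 0.0
        print(f"{pool['name']:<16} stored={stored:>16,} used={used:>16,} "
              f"raw/logical={ratio:.2f}")

That makes it easy to diff the per-pool numbers as OSDs get repaired or 
re-added.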


Everything is on "ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)" and kernel 
5.0.21 or 5.0.9; I'm patching now to pull the ceph cluster up to 5.0.21, 
the same as the clients.  I'm not really sure where to start digging on 
this one.  Everything works fine except disk usage reporting, which also 
completely blows up the autoscaler.
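
Until the stats are trustworthy it's probably safest to park the 
autoscaler per pool; a sketch, assuming the Nautilus per-pool 
`pg_autoscale_mode` option:

    import subprocess

    # Park the autoscaler on every pool so bad stats can't drive pg_num
    # changes; flip back to "on" once the usage numbers look sane again.
    pools = ["cephfs_metadata", "cold-ec", "ceph-bulk-3r", "cephfs_data",
             "ceph-hot-2r", "kgcoe-cinder", "rbd"]
    for pool in pools:
        subprocess.run(["ceph", "osd", "pool", "set", pool,
                        "pg_autoscale_mode", "warn"], check=True)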

I feel like the question is obvious but I'll state it anyway.  How do I get 
this issue resolved? 

Thanks
-paul

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com