Re: [ceph-users] osds with different disk sizes may killing performance (Yao Zongyou)

2018-04-12 Thread Ronny Aasen

On 13. april 2018 05:32, Chad William Seys wrote:

Hello,
   I think your observations suggest that, to a first approximation, 
filling drives with bytes to the same absolute level is better for 
performance than filling drives to the same percentage full. Assuming 
random distribution of PGs, this would cause the smallest drives to be 
as active as the largest drives.
   E.g. if every drive had 1TB of data, each would be equally likely to 
contain the PG of interest.
   Of course, as more data was added the smallest drives could not hold 
more and the larger drives would become more active, but at least the smaller 
drives would be as active as possible.


But in this case you would have a steep drop-off in performance: when 
you reach the fill level where the small drives no longer accept new data, 
you suddenly hit a performance cliff where only your larger disks 
are doing new writes, and only the larger disks serve reads of the new data.



It is also easier to make the logical connection while you are 
installing new nodes/disks than a year later, when your cluster just 
happens to reach that fill level.


It would also be an easier job to balance disks between nodes when you 
are adding OSDs anyway and the new ones are mostly empty, rather than 
when your small OSDs are full and your large disks already hold significant 
data.
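
For what it's worth, keeping an eye on per-OSD fill levels while adding the 
new disks is a one-liner; the Luminous output includes %USE, VAR and PG count:

ceph osd df        # per-OSD utilisation and PG count
ceph osd df tree   # the same, grouped by host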




kind regards
Ronny Aasen


Re: [ceph-users] osds with different disk sizes may killing performance (Yao Zongyou)

2018-04-12 Thread Chad William Seys

Hello,
  I think your observations suggest that, to a first approximation, 
filling drives with bytes to the same absolute level is better for 
performance than filling drives to the same percentage full. Assuming 
random distribution of PGs, this would cause the smallest drives to be 
as active as the largest drives.
  E.g. if every drive had 1TB of data, each would be equally likely to 
contain the PG of interest.
  Of course, as more data was added the smallest drives could not hold 
more and the larger drives would become more active, but at least the smaller 
drives would be as active as possible.


Thanks!
Chad.



[ceph-users] ceph-mgr balancer getting started

2018-04-12 Thread Reed Dier
Hi ceph-users,

I am trying to figure out how to get the ceph balancer to do its magic, as I 
currently have a pretty unbalanced distribution across OSDs, both SSD and 
HDD.

Cluster is 12.2.4 on Ubuntu 16.04.
All OSDs have been migrated to bluestore.

Specifically, the HDDs are the main driver for running the balancer, as I 
have a near-full HDD.

> ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
>  4   hdd 7.28450  1.0 7459G 4543G  2916G 60.91 0.91 126
> 21   hdd 7.28450  1.0 7459G 4626G  2833G 62.02 0.92 130
>  0   hdd 7.28450  1.0 7459G 4869G  2589G 65.28 0.97 133
>  5   hdd 7.28450  1.0 7459G 4866G  2592G 65.24 0.97 136
> 14   hdd 7.28450  1.0 7459G 4829G  2629G 64.75 0.96 138
>  8   hdd 7.28450  1.0 7459G 4829G  2629G 64.75 0.96 139
>  7   hdd 7.28450  1.0 7459G 4959G  2499G 66.49 0.99 141
> 23   hdd 7.28450  1.0 7459G 5159G  2299G 69.17 1.03 142
>  2   hdd 7.28450  1.0 7459G 5042G  2416G 67.60 1.01 144
>  1   hdd 7.28450  1.0 7459G 5292G  2167G 70.95 1.06 145
> 10   hdd 7.28450  1.0 7459G 5441G  2018G 72.94 1.09 146
> 19   hdd 7.28450  1.0 7459G 5125G  2333G 68.72 1.02 146
>  9   hdd 7.28450  1.0 7459G 5123G  2335G 68.69 1.02 146
> 18   hdd 7.28450  1.0 7459G 5187G  2271G 69.54 1.04 149
> 22   hdd 7.28450  1.0 7459G 5369G  2089G 71.98 1.07 150
> 12   hdd 7.28450  1.0 7459G 5375G  2083G 72.07 1.07 152
> 17   hdd 7.28450  1.0 7459G 5498G  1961G 73.71 1.10 152
> 11   hdd 7.28450  1.0 7459G 5621G  1838G 75.36 1.12 154
> 15   hdd 7.28450  1.0 7459G 5576G  1882G 74.76 1.11 154
> 20   hdd 7.28450  1.0 7459G 5797G  1661G 77.72 1.16 158
>  6   hdd 7.28450  1.0 7459G 5951G  1508G 79.78 1.19 164
>  3   hdd 7.28450  1.0 7459G 5960G  1499G 79.90 1.19 166
> 16   hdd 7.28450  1.0 7459G 6161G  1297G 82.60 1.23 169
> 13   hdd 7.28450  1.0 7459G 6678G   780G 89.54 1.33 184

I sorted this by PGs, and you can see that the PG count tracks actual disk 
usage pretty closely. Since the balancer attempts to distribute PGs more evenly, 
I should get a more even distribution of usage.
Hopefully that passes the sanity check.
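
For reference, a minimal sketch of the Luminous (12.2.x) balancer workflow looks 
roughly like this; the plan name is a placeholder and the mode choice depends on 
your clients:

ceph mgr module enable balancer
ceph balancer mode crush-compat     # or 'upmap' after 'ceph osd set-require-min-compat-client luminous'
ceph balancer eval                  # score the current PG distribution (lower is better)
ceph balancer optimize myplan
ceph balancer eval myplan           # score the proposed plan before applying it
ceph balancer execute myplan        # one-shot; 'ceph balancer on' enables continuous balancing

As a stop-gap for the near-full HDD, ceph osd reweight-by-utilization can also 
shift a few PGs off the fullest OSDs.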

> ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
> 49   ssd 1.76109  1.0 1803G  882G   920G 48.96 0.73 205
> 72   ssd 1.76109  1.0 1803G  926G   876G 51.38 0.77 217
> 30   ssd 1.76109  1.0 1803G  950G   852G 52.73 0.79 222
> 48   ssd 1.76109  1.0 1803G  961G   842G 53.29 0.79 225
> 54   ssd 1.76109  1.0 1803G  980G   823G 54.36 0.81 230
> 63   ssd 1.76109  1.0 1803G  985G   818G 54.62 0.81 230
> 35   ssd 1.76109  1.0 1803G  997G   806G 55.30 0.82 233
> 45   ssd 1.76109  1.0 1803G 1002G   801G 55.58 0.83 234
> 67   ssd 1.76109  1.0 1803G 1004G   799G 55.69 0.83 234
> 42   ssd 1.76109  1.0 1803G 1006G   796G 55.84 0.83 235
> 52   ssd 1.76109  1.0 1803G 1009G   793G 56.00 0.83 238
> 61   ssd 1.76109  1.0 1803G 1014G   789G 56.24 0.84 238
> 68   ssd 1.76109  1.0 1803G 1021G   782G 56.62 0.84 238
> 32   ssd 1.76109  1.0 1803G 1021G   781G 56.67 0.84 240
> 65   ssd 1.76109  1.0 1803G 1024G   778G 56.83 0.85 240
> 26   ssd 1.76109  1.0 1803G 1022G   780G 56.72 0.84 241
> 59   ssd 1.76109  1.0 1803G 1031G   771G 57.20 0.85 241
> 47   ssd 1.76109  1.0 1803G 1035G   767G 57.42 0.86 242
> 37   ssd 1.76109  1.0 1803G 1036G   767G 57.46 0.86 243
> 28   ssd 1.76109  1.0 1803G 1043G   760G 57.85 0.86 245
> 40   ssd 1.76109  1.0 1803G 1047G   755G 58.10 0.87 245
> 41   ssd 1.76109  1.0 1803G 1046G   756G 58.06 0.86 245
> 62   ssd 1.76109  1.0 1803G 1050G   752G 58.25 0.87 245
> 39   ssd 1.76109  1.0 1803G 1051G   751G 58.30 0.87 246
> 56   ssd 1.76109  1.0 1803G 1050G   752G 58.27 0.87 246
> 70   ssd 1.76109  1.0 1803G 1041G   761G 57.75 0.86 246
> 73   ssd 1.76109  1.0 1803G 1057G   746G 58.63 0.87 247
> 44   ssd 1.76109  1.0 1803G 1056G   746G 58.58 0.87 248
> 38   ssd 1.76109  1.0 1803G 1059G   743G 58.75 0.87 249
> 51   ssd 1.76109  1.0 1803G 1063G   739G 58.99 0.88 249
> 33   ssd 1.76109  1.0 1803G 1067G   736G 59.18 0.88 250
> 36   ssd 1.76109  1.0 1803G 1071G   731G 59.41 0.88 251
> 55   ssd 1.76109  1.0 1803G 1066G   737G 59.11 0.88 251
> 27   ssd 1.76109  1.0 1803G 1078G   724G 59.81 0.89 252
> 31   ssd 1.76109  1.0 1803G 1079G   724G 59.84 0.89 252
> 69   ssd 1.76109  1.0 1803G 1075G   727G 59.63 0.89 252
> 46   ssd 1.76109  1.0 1803G 1082G   721G 60.00 0.89 253
> 58   ssd 1.76109  1.0 1803G 1081G   721G 59.98 0.89 253
> 66   ssd 1.76109  1.0 1803G 1081G   722G 59.96 0.89 253
> 34   ssd 1.76109  1.0 1803G 1091G   712G 60.52 0.90 255
> 43   ssd 1.76109  1.0 1803G 1089G   713G 60.42 0.90 256
> 64   ssd 1.76109  1.0 1803G 1097G   705G 60.87 0.91 257
> 24   ssd 1.76109  1.0 1803G 1113G   690G 61.72 0.92 260
> 25   ssd 1.76109  1.0 1803G 1146G   656G 63.58 0.95 269
> 29   ssd 1.76109  1.0 1803G 1146G   

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-12 Thread Patrick Donnelly
On Thu, Apr 12, 2018 at 5:05 AM, Mark Schouten  wrote:
> On Wed, 2018-04-11 at 17:10 -0700, Patrick Donnelly wrote:
>> No longer recommended. See:
>> http://docs.ceph.com/docs/master/cephfs/upgrading/#upgrading-the-mds-
>> cluster
>
> Shouldn't docs.ceph.com/docs/luminous/cephfs/upgrading include that
> too?

The backport is in-progress: https://github.com/ceph/ceph/pull/21352

-- 
Patrick Donnelly


[ceph-users] ceph version 12.2.4 - slow requests missing from health details

2018-04-12 Thread Steven Vacaroaia
Hi,

I am still struggling with my performance issue, and I noticed that "ceph
health detail" does not provide details about where the slow requests are.

Some other people have noticed this as well
( https://www.spinics.net/lists/ceph-users/msg43574.html )

What am I missing, and/or how/where do I find the OSD with issues?

Thanks
Steven
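
For what it's worth, a rough sketch of how the offending OSDs can usually be 
tracked down on 12.2.x; the OSD id and log path below are examples only:

ceph osd perf                                      # per-OSD commit/apply latency; outliers are suspects
ceph daemon osd.12 dump_ops_in_flight              # run on the OSD's host, via the admin socket
ceph daemon osd.12 dump_historic_ops               # recently completed (including slow) operations
grep 'slow request' /var/log/ceph/ceph-osd.12.log  # the OSD log still names its slow requests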


Re: [ceph-users] rbd-nbd not resizing even after kernel tweaks

2018-04-12 Thread Alex Gorbachev
On Thu, Apr 12, 2018 at 7:57 AM, Jason Dillaman  wrote:
> If you run "partprobe" after you resize in your second example, is the
> change visible in "parted"?

No, partprobe does not help:

root@lumd1:~# parted /dev/nbd2 p
Model: Unknown (unknown)
Disk /dev/nbd2: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End SizeFile system  Flags
 1  0.00B  2147MB  2147MB  xfs

root@lumd1:~# partprobe
root@lumd1:~# parted /dev/nbd2 p
Model: Unknown (unknown)
Disk /dev/nbd2: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End SizeFile system  Flags
 1  0.00B  2147MB  2147MB  xfs
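
In case it helps narrow this down, a few hedged checks of what the kernel 
currently believes the size to be (device and mount point taken from the 
example above):

blockdev --getsize64 /dev/nbd2   # size as the block layer reports it
cat /sys/block/nbd2/size         # same value, in 512-byte sectors
blockdev --rereadpt /dev/nbd2    # ask the kernel to re-read the size (may refuse while mounted)
xfs_growfs /mnt                  # once the new size is visible, grow the mounted XFS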



>
> On Wed, Apr 11, 2018 at 11:01 PM, Alex Gorbachev  
> wrote:
>> On Wed, Apr 11, 2018 at 2:13 PM, Jason Dillaman  wrote:
>>> I've tested the patch on both 4.14.0 and 4.16.0 and it appears to
>>> function correctly for me. parted can see the newly added free-space
>>> after resizing the RBD image and our stress tests once again pass
>>> successfully. Do you have any additional details on the issues you are
>>> seeing?
>>
>> I recompiled again with 4.14-24 and tested, the resize shows up OK
>> when the filesystem is not mounted.  dmesg shows also the "detected
>> capacity change" message.  However, if I create a filesystem and mount
>> it, the capacity change is no longer detected.  Steps as follows:
>>
>> root@lumd1:~# rbd create -s 1024 --image-format 2 matte/n4
>> root@lumd1:~# rbd-nbd map matte/n4
>> /dev/nbd2
>> root@lumd1:~# mkfs.xfs /dev/nbd2
>> meta-data=/dev/nbd2  isize=512agcount=4, agsize=65536 blks
>>  =   sectsz=512   attr=2, projid32bit=1
>>  =   crc=1finobt=1, sparse=0
>> data =   bsize=4096   blocks=262144, imaxpct=25
>>  =   sunit=0  swidth=0 blks
>> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
>> log  =internal log   bsize=4096   blocks=2560, version=2
>>  =   sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none   extsz=4096   blocks=0, rtextents=0
>> root@lumd1:~# parted /dev/nbd2 p
>> Model: Unknown (unknown)
>> Disk /dev/nbd2: 1074MB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: loop
>> Disk Flags:
>>
>> Number  Start  End SizeFile system  Flags
>>  1  0.00B  1074MB  1074MB  xfs
>>
>> root@lumd1:~# rbd resize --pool matte --image n4 --size 2048
>> Resizing image: 100% complete...done.
>> root@lumd1:~# parted /dev/nbd2 p
>> Model: Unknown (unknown)
>> Disk /dev/nbd2: 2147MB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: loop
>> Disk Flags:
>>
>> Number  Start  End SizeFile system  Flags
>>  1  0.00B  2147MB  2147MB  xfs
>>
>> -- All is well so far, now let's mount the fs
>>
>> root@lumd1:~# mount /dev/nbd2 /mnt
>> root@lumd1:~# rbd resize --pool matte --image n4 --size 3072
>> Resizing image: 100% complete...done.
>> root@lumd1:~# parted /dev/nbd2 p
>> Model: Unknown (unknown)
>> Disk /dev/nbd2: 2147MB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: loop
>> Disk Flags:
>>
>> Number  Start  End SizeFile system  Flags
>>  1  0.00B  2147MB  2147MB  xfs
>>
>> -- Now the change is not detected
>>
>>
>>>
>>> On Wed, Apr 11, 2018 at 12:06 PM, Jason Dillaman  
>>> wrote:
 I'll give it a try locally and see if I can figure it out. Note that
 this commit [1] also dropped the call to "bd_set_size" within
 "nbd_size_update", which seems suspicious to me at initial glance.

 [1] 
 https://github.com/torvalds/linux/commit/29eaadc0364943b6352e8994158febcb699c9f9b#diff-bc9273bcb259fef182ae607a1d06a142L180

 On Wed, Apr 11, 2018 at 11:09 AM, Alex Gorbachev 
  wrote:
>> On Wed, Apr 11, 2018 at 10:27 AM, Alex Gorbachev 
>>  wrote:
>>> On Wed, Apr 11, 2018 at 2:43 AM, Mykola Golub  
>>> wrote:
 On Tue, Apr 10, 2018 at 11:14:58PM -0400, Alex Gorbachev wrote:

> So Josef fixed the one issue that enables e.g. lsblk and sysfs size to
> reflect the correct size on change.  However, partprobe and parted
> still do not detect the change; a complete unmap and remap of the rbd-nbd
> device and a remount of the filesystem are required.

 Does your rbd-nbd include this fix [1], targeted for v12.2.3?

 [1] http://tracker.ceph.com/issues/22172
>>>
>>> It should, the rbd-nbd version is 12.2.4
>>>
>>> root@lumd1:~# rbd-nbd -v
>>> ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous 
>>> (stable)
>>> ___
>
> On Wed, Apr 11, 2018 at 10:39 AM, Jason 

Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Steve Taylor
I can't comment directly on the relation XFS fragmentation has to Bluestore, 
but I had a similar issue probably 2-3 years ago where XFS fragmentation was 
causing a significant degradation in cluster performance. The use case was RBDs 
with lots of snapshots created and deleted at regular intervals. XFS got pretty 
severely fragmented and the cluster slowed down quickly.

The solution I found was to set the XFS allocsize to match the RBD object size 
via osd_mount_options_xfs. Of course I also had to defragment XFS to clear up 
the existing fragmentation, but that was fairly painless. XFS fragmentation 
hasn't been an issue since. That solution isn't as applicable in an object 
store use case where the object size is more variable, but increasing the XFS 
allocsize could still help.
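
For reference, a hedged sketch of that approach; the device path and OSD mount 
point are placeholders, and allocsize here matches the default 4 MB RBD object size:

xfs_db -c frag -r /dev/sdb1         # report the current fragmentation factor
xfs_fsr /var/lib/ceph/osd/ceph-12   # online defragmentation of a mounted OSD filesystem

# ceph.conf, so FileStore OSDs mount with the larger allocation size
[osd]
osd_mount_options_xfs = rw,noatime,inode64,allocsize=4M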

As far as Bluestore goes, I haven't deployed it in production yet, but I would 
expect that manipulating bluestore_min_alloc_size in a similar fashion would 
yield similar benefits. Of course you are then wasting some disk space for 
every object that ends up being smaller than that allocation size in both 
cases. That's the trade-off.
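
The BlueStore equivalent is fixed when the OSD is created, so a sketch would be 
something like the following; the value is only an example and has no effect on 
existing OSDs:

# ceph.conf before deploying the new BlueStore OSDs
[osd]
bluestore_min_alloc_size_hdd = 131072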






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |






On Thu, 2018-04-12 at 04:13 +0200, Marc Roos wrote:


Is that not obvious? The 8TB disk is handling twice as much as the 4TB one. AFAIK
there is not a linear relationship between a disk's IOPS and its size.


But the point about xfs defragmentation is interesting: how does this
relate/compare to bluestore?





-Original Message-
From: Yao Zongyou [mailto:yaozong...@outlook.com]
Sent: Thursday, 12 April 2018 4:36
To: ceph-users@lists.ceph.com
Subject: *SPAM* [ceph-users] osds with different disk sizes may
killing performance
Importance: High

Hi,

For anybody who may be interested, here I share the process of locating
the reason for a ceph cluster performance slowdown in our environment.

Internally, we have a cluster with 1.1PB of capacity, 800TB used, and about
500TB of raw user data. Each day, 3TB of data is uploaded and the oldest 3TB
is lifecycled (we are using the s3 object store, and bucket lifecycle
is enabled). As time went by, the cluster became somewhat slower, and we
suspected xfs fragmentation was the culprit.

After some testing, we did find that xfs fragmentation slows down filestore's
performance: for example, at 15% fragmentation the performance is 85%
of the original, and at 25% it is 74.73% of the original.

But the main reason for our cluster's deterioration in performance was
not xfs fragmentation.

Initially, our ceph cluster contained only osds with 4TB disks. As time
went by, we scaled out the cluster by adding new osds with 8TB disks.
As the new disks' capacity is double that of the old disks, each new osd's
weight is double that of an old osd. Each new osd therefore holds double
the pgs and uses double the disk space of an old osd. Everything looked
good and fine.

But even though a new osd has double the capacity of an old osd, its
performance is not double that of an old osd. After digging into
our internal system stats, we found the newly added disks' io util is about
twice that of the old ones, and from time to time the new disks' io util
rises up to 100%. The newly added osds are the performance killer. They
slow down the whole cluster's performance.

As the reason was found, the solution is very simple. After lowering the
newly added osds' weight, the annoying slow request warnings have died away.

So the conclusion is: in a cluster with different osd disk sizes, an osd's
weight should not be determined by its capacity alone; we should also take
its performance into account.

Best wishes,
Yao Zongyou
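
For the record, lowering the weight of an individual OSD is a one-liner; the id 
and value below are placeholders rather than the ones used in this cluster:

ceph osd crush reweight osd.42 5.5   # lower the CRUSH weight of one of the new 8TB OSDs
ceph osd df tree                     # watch %USE and PG counts settle once rebalancing finishes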


Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-12 Thread Mark Schouten
On Wed, 2018-04-11 at 17:10 -0700, Patrick Donnelly wrote:
> No longer recommended. See:
> http://docs.ceph.com/docs/master/cephfs/upgrading/#upgrading-the-mds-
> cluster

Shouldn't docs.ceph.com/docs/luminous/cephfs/upgrading include that
too?
-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076  | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl



Re: [ceph-users] rbd-nbd not resizing even after kernel tweaks

2018-04-12 Thread Jason Dillaman
If you run "partprobe" after you resize in your second example, is the
change visible in "parted"?

On Wed, Apr 11, 2018 at 11:01 PM, Alex Gorbachev  
wrote:
> On Wed, Apr 11, 2018 at 2:13 PM, Jason Dillaman  wrote:
>> I've tested the patch on both 4.14.0 and 4.16.0 and it appears to
>> function correctly for me. parted can see the newly added free-space
>> after resizing the RBD image and our stress tests once again pass
>> successfully. Do you have any additional details on the issues you are
>> seeing?
>
> I recompiled again with 4.14-24 and tested, the resize shows up OK
> when the filesystem is not mounted.  dmesg shows also the "detected
> capacity change" message.  However, if I create a filesystem and mount
> it, the capacity change is no longer detected.  Steps as follows:
>
> root@lumd1:~# rbd create -s 1024 --image-format 2 matte/n4
> root@lumd1:~# rbd-nbd map matte/n4
> /dev/nbd2
> root@lumd1:~# mkfs.xfs /dev/nbd2
> meta-data=/dev/nbd2  isize=512agcount=4, agsize=65536 blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=1finobt=1, sparse=0
> data =   bsize=4096   blocks=262144, imaxpct=25
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> log  =internal log   bsize=4096   blocks=2560, version=2
>  =   sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> root@lumd1:~# parted /dev/nbd2 p
> Model: Unknown (unknown)
> Disk /dev/nbd2: 1074MB
> Sector size (logical/physical): 512B/512B
> Partition Table: loop
> Disk Flags:
>
> Number  Start  End SizeFile system  Flags
>  1  0.00B  1074MB  1074MB  xfs
>
> root@lumd1:~# rbd resize --pool matte --image n4 --size 2048
> Resizing image: 100% complete...done.
> root@lumd1:~# parted /dev/nbd2 p
> Model: Unknown (unknown)
> Disk /dev/nbd2: 2147MB
> Sector size (logical/physical): 512B/512B
> Partition Table: loop
> Disk Flags:
>
> Number  Start  End SizeFile system  Flags
>  1  0.00B  2147MB  2147MB  xfs
>
> -- All is well so far, now let's mount the fs
>
> root@lumd1:~# mount /dev/nbd2 /mnt
> root@lumd1:~# rbd resize --pool matte --image n4 --size 3072
> Resizing image: 100% complete...done.
> root@lumd1:~# parted /dev/nbd2 p
> Model: Unknown (unknown)
> Disk /dev/nbd2: 2147MB
> Sector size (logical/physical): 512B/512B
> Partition Table: loop
> Disk Flags:
>
> Number  Start  End SizeFile system  Flags
>  1  0.00B  2147MB  2147MB  xfs
>
> -- Now the change is not detected
>
>
>>
>> On Wed, Apr 11, 2018 at 12:06 PM, Jason Dillaman  wrote:
>>> I'll give it a try locally and see if I can figure it out. Note that
>>> this commit [1] also dropped the call to "bd_set_size" within
>>> "nbd_size_update", which seems suspicious to me at initial glance.
>>>
>>> [1] 
>>> https://github.com/torvalds/linux/commit/29eaadc0364943b6352e8994158febcb699c9f9b#diff-bc9273bcb259fef182ae607a1d06a142L180
>>>
>>> On Wed, Apr 11, 2018 at 11:09 AM, Alex Gorbachev  
>>> wrote:
> On Wed, Apr 11, 2018 at 10:27 AM, Alex Gorbachev 
>  wrote:
>> On Wed, Apr 11, 2018 at 2:43 AM, Mykola Golub  
>> wrote:
>>> On Tue, Apr 10, 2018 at 11:14:58PM -0400, Alex Gorbachev wrote:
>>>
 So Josef fixed the one issue that enables e.g. lsblk and sysfs size to
 reflect the correct size on change.  However, partprobe and parted
 still do not detect the change; a complete unmap and remap of the rbd-nbd
 device and a remount of the filesystem are required.
>>>
>>> Does your rbd-nbd include this fix [1], targeted for v12.2.3?
>>>
>>> [1] http://tracker.ceph.com/issues/22172
>>
>> It should, the rbd-nbd version is 12.2.4
>>
>> root@lumd1:~# rbd-nbd -v
>> ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous 
>> (stable)
>> ___

 On Wed, Apr 11, 2018 at 10:39 AM, Jason Dillaman  
 wrote:
> Do you have a preliminary patch that we can test against?
> Jason

 Hi Jason,

 This is the one Josef did, which fixes detection of the new size via
 sysfs (lsblk etc.), but parted still requires a complete unmapping and
 remapping of the NBD device to detect the change.

 I built a kernel with this patch based on 4.14.24, and also tested
 with the latest mainline 4.16.1 - same behavior.

 cc: sta...@vger.kernel.org
 Fixes: 639812a ("nbd: don't set the device size until we're connected")
 Signed-off-by: Josef Bacik 
 ---
  drivers/block/nbd.c | 2 ++
  1 file changed, 2 insertions(+)

 diff 

Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for $object?

2018-04-12 Thread Marc Roos
 
Oh, that is very good to hear. So how should I clean this up? I read a 
post from Sage saying that scrubbing does not take care of this. 
Should I be dumping the logs for objects like 
17:e80576a8:::rbd_data.2cc7df2ae8944a.09f8:27  and try to 
delete them manually?




-Original Message-
From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Sent: Thursday, 12 April 2018 11:04
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for 
$object?

Usually the problem is not that you are missing snapshot data, but that 
you have too many snapshots, so your snapshots are probably fine. You're 
just wasting space.



Paul


2018-04-10 16:07 GMT+02:00 Marc Roos :


 
Hi Paul,

This is a small test cluster, and the rbd pool is replicated. I am 
hardly using any clients on the cluster. Furthermore I have been 
the 
only one creating the snapshots and I know for sure that I was not 
trying to delete them. If so I have been doing this on one of the 
ceph 
nodes.

I have these issues on images with
create_timestamp: Tue Jul 18 20:51:40 2017
create_timestamp: Fri Sep  1 13:55:25 2017
create_timestamp: Fri Sep  1 13:59:10 2017
create_timestamp: Wed Jan  3 16:38:57 2018

Updates have been done in February, so theoretically I should not be 
seeing these any more?
Feb 21 15:13:35 Updated: 2:ceph-osd-12.2.3-0.el7.x86_64
Feb 28 13:33:27 Updated: 2:ceph-osd-12.2.4-0.el7.x86_64

How can I determine which snapshot of this image is bad? 
Should this snapshot be considered lost?
And is deleting this snapshot the only way to fix this? 
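
One hedged way to compare what the image thinks it has with what is actually 
on disk; the pool and image names below are placeholders, and the object name 
comes from the log message in question:

rbd snap ls rbd/myimage                        # snapshot ids known to the RBD image
rados -p rbd listsnaps <object-from-the-log>   # clone ids actually present for that object

Clone ids reported by rados but absent from the image's snapshot list would be 
the orphaned ones.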


-Original Message-
From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Sent: Tuesday, 10 April 2018 20:14
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for 
$object?


Hi,


You'll usually see this if there are "orphaned" snapshot objects. One 
common cause for this is pre-12.2.2 clients trying to delete RBD 
snapshots on images with a separate data pool (i.e., erasure-coded 
pools). They send the snapshot requests to the wrong pool and you end 
up with lots of problems.



Paul


2018-04-09 16:55 GMT+02:00 Marc Roos :



I have this on a rbd pool with images/snapshots that have 
been 
created
in Luminous

> Hi Stefan, Mehmet,
>
> Are these clusters that were upgraded from prior 
versions, or 
fresh
> luminous installs?
>
>
> This message indicates that there is a stray clone object 
with no
> associated head or snapdir object.  That normally should 
never
> happen--it's presumably the result of a (hopefully old) 
bug.  The
scrub
> process doesn't even clean them up, which maybe says 
something 
about
how
> common it is/was...
>
> sage

>









-- 

--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90








-- 

--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90





Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread ulembke

Hi,
you can also set the primary_affinity to 0.5 on the 8TB disks to reduce 
the read load (in this case you don't waste 50% of the space).
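
A hedged example of what that looks like; the OSD id is a placeholder, and the 
command would be repeated for each 8TB OSD:

ceph osd primary-affinity osd.24 0.5   # serve fewer reads from this OSD as primary
ceph osd tree                          # the PRI-AFF column on Luminous should reflect the change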



Udo

On 2018-04-12 04:36, Yao Zongyou wrote:

Hi, 

For anybody who may be interested, here I share the process of locating
the reason for a ceph cluster performance slowdown in our environment.

Internally, we have a cluster with 1.1PB of capacity, 800TB used, and about
500TB of raw user data. Each day, 3TB of data is uploaded and the oldest
3TB is lifecycled (we are using the s3 object store, and bucket
lifecycle is enabled). As time went by, the cluster became somewhat
slower, and we suspected xfs fragmentation was the culprit. 

After some testing, we did find that xfs fragmentation slows down filestore's
performance: for example, at 15% fragmentation the performance is 85%
of the original, and at 25% it is 74.73% of the original.

But the main reason for our cluster's deterioration in performance was
not xfs fragmentation.

Initially, our ceph cluster contained only osds with 4TB disks. As time
went by, we scaled out the cluster by adding new osds with 8TB disks.
As the new disks' capacity is double that of the old disks, each new
osd's weight is double that of an old osd. Each new osd therefore holds
double the pgs and uses double the disk space of an old osd. Everything
looked good and fine.

But even though a new osd has double the capacity of an old osd, its
performance is not double that of an old osd. After digging into our
internal system stats, we found the newly added disks' io util is about
twice that of the old ones, and from time to time the new disks' io util
rises up to 100%. The newly added osds are the performance killer.
They slow down the whole cluster's performance.

As the reason was found, the solution is very simple. After lowering the
newly added osds' weight, the annoying slow request warnings have died
away.

So the conclusion is: in a cluster with different osd disk sizes, an osd's
weight should not be determined by its capacity alone; we should also take
its performance into account.

Best wishes,
Yao Zongyou


Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for $object?

2018-04-12 Thread Paul Emmerich
Usually the problem is not that you are missing snapshot data, but that you
have too many snapshots, so your snapshots are probably fine. You're just wasting
space.


Paul

2018-04-10 16:07 GMT+02:00 Marc Roos :

>
> Hi Paul,
>
> This is a small test cluster, and the rbd pool is replicated. I am
> hardly using any clients on the cluster. Furthermore I have been the
> only one creating the snapshots and I know for sure that I was not
> trying to delete them. If so I have been doing this on one of the ceph
> nodes.
>
> I have these issues on images with
> create_timestamp: Tue Jul 18 20:51:40 2017
> create_timestamp: Fri Sep  1 13:55:25 2017
> create_timestamp: Fri Sep  1 13:59:10 2017
> create_timestamp: Wed Jan  3 16:38:57 2018
>
> Updates have been done in February, so theoretically I should not be
> seeing these any more?
> Feb 21 15:13:35 Updated: 2:ceph-osd-12.2.3-0.el7.x86_64
> Feb 28 13:33:27 Updated: 2:ceph-osd-12.2.4-0.el7.x86_64
>
> How can I determine which snapshot of this image is bad?
> Should this snapshot be considered lost?
> And is deleting this snapshot the only way to fix this?
>
>
> -Original Message-
> From: Paul Emmerich [mailto:paul.emmer...@croit.io]
> Sent: Tuesday, 10 April 2018 20:14
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for
> $object?
>
> Hi,
>
>
> You'll usually see this if there are "orphaned" snapshot objects. One
> common cause for this is pre-12.2.2 clients trying to delete RBD snapshots
> on images with a separate data pool (i.e., erasure-coded pools). They send
> the snapshot requests to the wrong pool and you end up with lots of problems.
>
>
>
> Paul
>
>
> 2018-04-09 16:55 GMT+02:00 Marc Roos :
>
>
>
> I have this on a rbd pool with images/snapshots that have been
> created
> in Luminous
>
> > Hi Stefan, Mehmet,
> >
> > Are these clusters that were upgraded from prior versions, or
> fresh
> > luminous installs?
> >
> >
> > This message indicates that there is a stray clone object with no
> > associated head or snapdir object.  That normally should never
> > happen--it's presumably the result of a (hopefully old) bug.  The
> scrub
> > process doesn't even clean them up, which maybe says something
> about
> how
> > common it is/was...
> >
> > sage
>
> >
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
>
>
>
>
>
> --
>
> --
> Paul Emmerich
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
>
>


-- 
-- 
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] Dying OSDs

2018-04-12 Thread Paul Emmerich
Hi,

Thanks, but unfortunately it's not the thing I suspected :(
Anyway, there's something wrong with your snapshots; the log also contains
a lot of entries like this:

2018-04-09 06:58:53.703353 7fb8931a0700 -1 osd.28 pg_epoch: 88438 pg[0.5d(
v 88438'223279 (86421'221681,88438'223279] local-lis/les=87450/87451 n=5634
ec=115/115 lis/c 87450/87450 les/c/f 87451/87451/0 87352/87450/87450)
[37,6,28] r=2 lpr=87450 luod=0'0 crt=88438'223279 lcod 88438'223278 active]
_scan_snaps no head for
0:ba087b0f:::rbd_data.221bf2eb141f2.1436:46aa (have MIN)

The cluster I debugged with the same crash also had a lot of snapshot
problems, including this one.
In the end, only manually marking all snap_ids as deleted in the pool
helped.


Paul

2018-04-10 21:48 GMT+02:00 Jan Marquardt :

> Am 10.04.18 um 20:22 schrieb Paul Emmerich:
> > Hi,
> >
> > I encountered the same crash a few months ago, see
> > https://tracker.ceph.com/issues/23030
> >
> > Can you post the output of
> >
> >ceph osd pool ls detail -f json-pretty
> >
> >
> > Paul
>
> Yes, of course.
>
> # ceph osd pool ls detail -f json-pretty
>
> [
> {
> "pool_name": "rbd",
> "flags": 1,
> "flags_names": "hashpspool",
> "type": 1,
> "size": 3,
> "min_size": 2,
> "crush_rule": 0,
> "object_hash": 2,
> "pg_num": 768,
> "pg_placement_num": 768,
> "crash_replay_interval": 0,
> "last_change": "91256",
> "last_force_op_resend": "0",
> "last_force_op_resend_preluminous": "0",
> "auid": 0,
> "snap_mode": "selfmanaged",
> "snap_seq": 35020,
> "snap_epoch": 91219,
> "pool_snaps": [],
> "removed_snaps":
> "[1~4562,47f1~58,484a~9,4854~70,48c5~36,48fc~48,4945~d,
> 4953~1,4957~1,495a~3,4960~1,496e~3,497a~1,4980~2,4983~3,
> 498b~1,4997~1,49a8~1,49ae~1,49b1~2,49b4~1,49b7~1,49b9~3,
> 49bd~5,49c3~6,49ca~5,49d1~4,49d6~1,49d8~2,49df~2,49e2~1,
> 49e4~2,49e7~5,49ef~2,49f2~2,49f5~6,49fc~1,49fe~3,4a05~9,
> 4a0f~4,4a14~4,4a1a~6,4a21~6,4a29~2,4a2c~3,4a30~1,4a33~5,
> 4a39~3,4a3e~b,4a4a~1,4a4c~2,4a50~1,4a52~7,4a5a~1,4a5c~2,
> 4a5f~4,4a64~1,4a66~2,4a69~2,4a6c~4,4a72~1,4a74~2,4a78~3,
> 4a7c~6,4a84~2,4a87~b,4a93~4,4a99~1,4a9c~4,4aa1~7,4aa9~1,
> 4aab~6,4ab2~2,4ab5~5,4abb~2,4abe~9,4ac8~a,4ad3~4,4ad8~13,
> 4aec~16,4b03~6,4b0a~c,4b17~2,4b1a~3,4b1f~4,4b24~c,4b31~d,
> 4b3f~13,4b53~1,4bfc~13ed,61e1~4a,622c~8,6235~a0,62d6~ac,
> 63a6~2,63b2~2,63d0~2,63f7~2,6427~2,6434~10f]",
> "quota_max_bytes": 0,
> "quota_max_objects": 0,
> "tiers": [],
> "tier_of": -1,
> "read_tier": -1,
> "write_tier": -1,
> "cache_mode": "none",
> "target_max_bytes": 0,
> "target_max_objects": 0,
> "cache_target_dirty_ratio_micro": 0,
> "cache_target_dirty_high_ratio_micro": 0,
> "cache_target_full_ratio_micro": 0,
> "cache_min_flush_age": 0,
> "cache_min_evict_age": 0,
> "erasure_code_profile": "",
> "hit_set_params": {
> "type": "none"
> },
> "hit_set_period": 0,
> "hit_set_count": 0,
> "use_gmt_hitset": true,
> "min_read_recency_for_promote": 0,
> "min_write_recency_for_promote": 0,
> "hit_set_grade_decay_rate": 0,
> "hit_set_search_last_n": 0,
> "grade_table": [],
> "stripe_width": 0,
> "expected_num_objects": 0,
> "fast_read": false,
> "options": {},
> "application_metadata": {
> "rbd": {}
> }
> }
> ]
>
> "Unfortunately" I started the crashed OSDs again in the meantime,
> because the first pgs have been down before. So currently all OSDs are
> running.
>
> Regards,
>
> Jan
>
>
>


-- 
-- 
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Marc Roos
 
Is that not obvious? The 8TB disk is handling twice as much as the 4TB one. AFAIK 
there is not a linear relationship between a disk's IOPS and its size. 


But the point about xfs defragmentation is interesting: how does this 
relate/compare to bluestore?





-Original Message-
From: Yao Zongyou [mailto:yaozong...@outlook.com] 
Sent: Thursday, 12 April 2018 4:36
To: ceph-users@lists.ceph.com
Subject: *SPAM* [ceph-users] osds with different disk sizes may 
killing performance
Importance: High

Hi, 

For anybody who may be interested, here I share the process of locating 
the reason for a ceph cluster performance slowdown in our environment.

Internally, we have a cluster with 1.1PB of capacity, 800TB used, and about 
500TB of raw user data. Each day, 3TB of data is uploaded and the oldest 3TB 
is lifecycled (we are using the s3 object store, and bucket lifecycle 
is enabled). As time went by, the cluster became somewhat slower, and we 
suspected xfs fragmentation was the culprit. 

After some testing, we did find that xfs fragmentation slows down filestore's 
performance: for example, at 15% fragmentation the performance is 85% 
of the original, and at 25% it is 74.73% of the original.

But the main reason for our cluster's deterioration in performance was 
not xfs fragmentation.

Initially, our ceph cluster contained only osds with 4TB disks. As time 
goes by, we scaled out the cluster by adding new osds with 8TB disks. 
As the new disks' capacity is double that of the old disks, each new 
osd's weight is double that of an old osd. Each new osd therefore holds 
double the pgs and uses double the disk space of an old osd. Everything 
looked good and fine.

But even though a new osd has double the capacity of an old osd, its 
performance is not double that of an old osd. After digging into our 
internal system stats, we found the newly added disks' io util is about 
twice that of the old ones, and from time to time the new disks' io util 
rises up to 100%. The newly added osds are the performance killer. They 
slow down the whole cluster's performance.

As the reason was found, the solution is very simple. After lowering the 
newly added osds' weight, the annoying slow request warnings have died away.

So the conclusion is: in a cluster with different osd disk sizes, an osd's 
weight should not be determined by its capacity alone; we should also take 
its performance into account.

Best wishes,
Yao Zongyou


Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Konstantin Shalygin

On 04/12/2018 11:21 AM, 宗友 姚 wrote:

Currently, this can only be done by hand. Maybe we need some scripts to handle 
this automatically.



Mixed hosts, i.e. half old disks + half new disks, are better than separate "old 
hosts" and "new hosts" in your case.




k


Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Wido den Hollander


On 04/12/2018 04:36 AM, Yao Zongyou wrote:
> Hi, 
> 
> For anybody who may be interested, here I share the process of locating the 
> reason for a ceph cluster performance slowdown in our environment.
> 
> Internally, we have a cluster with 1.1PB of capacity, 800TB used, and about 
> 500TB of raw user data. Each day, 3TB of data is uploaded and the oldest 3TB 
> is lifecycled (we are using the s3 object store, and bucket lifecycle is 
> enabled). As time went by, the cluster became somewhat slower, and we 
> suspected xfs fragmentation was the culprit. 
> 
> After some testing, we did find that xfs fragmentation slows down filestore's 
> performance: for example, at 15% fragmentation the performance is 85% of the 
> original, and at 25% it is 74.73% of the original.
> 
> But the main reason for our cluster's deterioration in performance was not 
> xfs fragmentation.
> 
> Initially, our ceph cluster contained only osds with 4TB disks. As time went 
> by, we scaled out the cluster by adding new osds with 8TB disks. As the new 
> disks' capacity is double that of the old disks, each new osd's weight is 
> double that of an old osd. Each new osd therefore holds double the pgs and 
> uses double the disk space of an old osd. Everything looked good and fine.
> 
> But even though a new osd has double the capacity of an old osd, its 
> performance is not double that of an old osd. After digging into our internal 
> system stats, we found the newly added disks' io util is about twice that of 
> the old ones, and from time to time the new disks' io util rises up to 100%. 
> The newly added osds are the performance killer. They slow down the whole 
> cluster's performance.
> 
> As the reason was found, the solution is very simple. After lowering the 
> newly added osds' weight, the annoying slow request warnings have died away.
> 

This is to be expected. However, lowering the weight of new disks means
that you can't fully use their storage capacity.

This is the nature of having a heterogeneous cluster with Ceph.
Disks of different sizes mean that performance will fluctuate.

Wido

> So the conclusion is: in a cluster with different osd disk sizes, an osd's weight 
> should not be determined by its capacity alone; we should also take its 
> performance into account.
> 
> Best wishes,
> Yao Zongyou
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 