[ceph-users] http://tracker.ceph.com/issues/38122

2019-03-06 Thread Milanov, Radoslav Nikiforov
Can someone elaborate on the error shown in the screenshot below?

[inline image: image001.png - screenshot of the error]

From http://tracker.ceph.com/issues/38122

Which package exactly is missing?
And why is this happening? In Mimic all dependencies were resolved by yum.
- Rado

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

2018-04-19 Thread Milanov, Radoslav Nikiforov
Try filestore instead of bluestore ?

- Rado

From: ceph-users  On Behalf Of Steven 
Vacaroaia
Sent: Thursday, April 19, 2018 8:11 AM
To: ceph-users 
Subject: [ceph-users] ceph luminous 12.2.4 - 2 servers better than 3 ?

Hi,

Any idea why 2 servers with one OSD each would provide better performance than 3?

The servers are identical.
Performance is impacted irrespective of whether I use SSD for WAL/DB or not.
Basically, I am getting lots of "cur MB/s" samples of zero.

The network is separate 10 Gb for public and private;
I tested it with iperf and I am getting 9.3 Gb/s.
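(A quick sanity check, not part of the original message: the iperf figure converts to well over a gigabyte per second, so the 10 Gb links are nowhere near saturated by the benchmark numbers below.)

```python
# Convert the measured iperf bandwidth (9.3 Gb/s) to MB/s for comparison
# with the "cur MB/s" column of rados bench output.
gbps = 9.3
mb_per_s = gbps * 1000 / 8   # decimal megabytes per second
print(f"~{mb_per_s:.0f} MB/s of usable bandwidth")
```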

I have tried replication of 2 and 3 with the same results (much better for 2
servers than 3).

Reinstalled Ceph multiple times.
ceph.conf is very simple - no major customization (see below).
I am out of ideas - any hint will be TRULY appreciated.

Steven



auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx


public_network = 10.10.30.0/24
cluster_network = 192.168.0.0/24


osd_pool_default_size = 2
osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
osd_crush_chooseleaf_type = 1


[mon]
mon_allow_pool_delete = true
mon_osd_min_down_reporters = 1

[osd]
osd_mkfs_type = xfs
osd_mount_options_xfs = "rw,noatime,nodiratime,attr2,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=4M"
osd_mkfs_options_xfs = "-f -i size=2048"
bluestore_block_db_size = 32212254720
bluestore_block_wal_size = 1073741824
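(As a sanity check of the two bluestore sizes above, which are specified in bytes:)

```python
# The bluestore sizes from the ceph.conf above, specified in bytes.
db_size = 32212254720
wal_size = 1073741824

GiB = 1024 ** 3
print(db_size / GiB)   # 30 GiB block.db
print(wal_size / GiB)  # 1 GiB block.wal
```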

rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd 120 seq
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_osd01_383626
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16        57        41   163.991       164     0.197929    0.065543
    2      16        57        41    81.992         0            -    0.065543
    3      16        67        51   67.9936        20    0.0164632    0.249939
    4      16        67        51   50.9951         0            -    0.249939
    5      16        71        55   43.9958         8    0.0171439    0.319973
    6      16       181       165   109.989       440    0.0159057    0.563746
    7      16       182       166   94.8476         4     0.221421    0.561684
    8      16       182       166   82.9917         0            -    0.561684
    9      16       240       224   99.5458       116    0.0232989    0.638292
   10      16       264       248   99.1901        96    0.0222669    0.583336
   11      16       264       248   90.1729         0            -    0.583336
   12      16       285       269   89.6579        42    0.0165706    0.600606
   13      16       285       269   82.7611         0            -    0.600606
   14      16       310       294   83.9918        50    0.0254241    0.756351
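The "cur MB/s" column above shows the stalls Steven describes; a quick tally of the sampled intervals (values transcribed from the run above):

```python
# "cur MB/s" samples from the rados bench run above (seconds 1-14).
cur_mbps = [164, 0, 20, 0, 8, 440, 4, 0, 116, 96, 0, 42, 0, 50]

stalled = sum(1 for v in cur_mbps if v == 0)
print(f"stalled intervals: {stalled}/{len(cur_mbps)}")           # 5/14
print(f"mean throughput:  {sum(cur_mbps)/len(cur_mbps):.1f} MB/s")  # 67.1
```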




Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-01 Thread Milanov, Radoslav Nikiforov
Probably priorities have changed since Red Hat acquired Ceph/Inktank (
https://www.redhat.com/en/about/press-releases/red-hat-acquire-inktank-provider-ceph
)? Why support a competing hypervisor? Long term, switching to KVM seems to be
the solution.

- Rado

From: ceph-users  On Behalf Of Max Cuttins
Sent: Thursday, March 1, 2018 7:27 AM
To: David Turner ; dilla...@redhat.com
Cc: ceph-users 
Subject: Re: [ceph-users] Ceph iSCSI is a prank?

On 28/02/2018 18:16, David Turner wrote:

My thought is that in 4 years you could have migrated to a hypervisor that will 
have better performance into ceph than an added iSCSI layer. I won't deploy VMs 
for ceph on anything that won't allow librbd to work. Anything else is added 
complexity and reduced performance.

You are definitely right: I have to change hypervisor. So why didn't I do this
before?
Because both Citrix/Xen and Inktank/Ceph claimed that they were ready to add
support for Xen in 2013!

It was 2013:
XEN claimed to support Ceph:
https://www.citrix.com/blogs/2013/07/08/xenserver-tech-preview-incorporating-ceph-object-stores-is-now-available/
Inktank said that support for Xen was almost ready:
https://ceph.com/geen-categorie/xenserver-support-for-rbd/

And iSCSI was also close (it was 2014):
https://ceph.com/geen-categorie/updates-to-ceph-tgt-iscsi-support/

So why change hypervisor if everybody tells you that compatibility is almost
ready to be deployed?
... but then "just" 4 years pass and XEN and Ceph never became compatible...

It's obvious that Citrix is no longer believable.
However, Ceph should at least have added iSCSI to its platform during all
these years.
Ceph is awesome, so why not just kill all the competitors and make it
compatible even with a washing machine?





Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-30 Thread Milanov, Radoslav Nikiforov
Performance as well - in my testing FileStore was much quicker than BlueStore.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
Weil
Sent: Friday, December 29, 2017 3:51 PM
To: Travis Nielsen 
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

On Fri, 29 Dec 2017, Travis Nielsen wrote:
> Since bluestore was declared stable in Luminous, is there any 
> remaining scenario to use filestore in new deployments? Or is it safe 
> to assume that bluestore is always better to use in Luminous? All 
> documentation I can find points to bluestore being superior in all cases.

The only real reason to run FileStore is stability: FileStore is older and
well-tested, so the most conservative users may stick with it for a bit
longer.

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rebuild rgw bucket index

2017-11-18 Thread Milanov, Radoslav Nikiforov
Is there a way to rebuild the contents of the .rgw.buckets.index pool, removed
by accident?

Thanks in advance.



Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-17 Thread Milanov, Radoslav Nikiforov
Here are some more results. I'm reading that 12.2.2 will have performance
improvements for bluestore and should be released soon?

Iodepth=not specified
Filestore
  write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
  write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
  write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec

  read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
  read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
  read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec

Bluestore
  write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
  write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
  write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec

  read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
  read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
  read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec

Iodepth=10
Filestore
  write: io=5045.1MB, bw=28706KB/s, iops=7176, runt=180001msec
  write: io=4764.7MB, bw=27099KB/s, iops=6774, runt=180021msec
  write: io=4626.2MB, bw=26318KB/s, iops=6579, runt=180031msec

  read : io=1745.3MB, bw=9928.6KB/s, iops=2482, runt=180001msec
  read : io=1933.7MB, bw=11000KB/s, iops=2749, runt=180001msec
  read : io=1952.7MB, bw=11108KB/s, iops=2777, runt=180001msec

Bluestore
  write: io=1578.8MB, bw=8980.9KB/s, iops=2245, runt=180006msec
  write: io=1583.9MB, bw=9010.2KB/s, iops=2252, runt=180002msec
  write: io=1591.5MB, bw=9050.9KB/s, iops=2262, runt=180009msec

  read : io=412104KB, bw=2289.5KB/s, iops=572, runt=180002msec
  read : io=718108KB, bw=3989.5KB/s, iops=997, runt=180003msec
  read : io=968388KB, bw=5379.7KB/s, iops=1344, runt=180009msec

Iodepth=20
Filestore
  write: io=4671.2MB, bw=26574KB/s, iops=6643, runt=180001msec
  write: io=4583.4MB, bw=26066KB/s, iops=6516, runt=180054msec
  write: io=4641.6MB, bw=26347KB/s, iops=6586, runt=180395msec

  read : io=2094.3MB, bw=11914KB/s, iops=2978, runt=180001msec
  read : io=1997.6MB, bw=11364KB/s, iops=2840, runt=180001msec
  read : io=2028.4MB, bw=11539KB/s, iops=2884, runt=180001msec

Bluestore
  write: io=1595.8MB, bw=9078.2KB/s, iops=2269, runt=180001msec
  write: io=1596.2MB, bw=9080.6KB/s, iops=2270, runt=180001msec
  write: io=1588.3MB, bw=9035.4KB/s, iops=2258, runt=180002msec

  read : io=1126.9MB, bw=6410.5KB/s, iops=1602, runt=180004msec
  read : io=1282.4MB, bw=7295.3KB/s, iops=1823, runt=180003msec
  read : io=1380.9MB, bw=7854.1KB/s, iops=1963, runt=180007msec
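Averaging the iodepth=20 write runs above suggests the gap for writes in these particular runs is actually closer to 3x, not 2x:

```python
# Mean write IOPS at iodepth=20, taken from the fio results above.
filestore = [6643, 6516, 6586]
bluestore = [2269, 2270, 2258]

fs = sum(filestore) / len(filestore)
bs = sum(bluestore) / len(bluestore)
print(f"filestore: {fs:.0f} IOPS, bluestore: {bs:.0f} IOPS, ratio: {fs/bs:.2f}x")
```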


- Rado

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Thursday, November 16, 2017 2:04 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
<drakonst...@gmail.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

It depends on what you expect your typical workload to be like.  Ceph (and 
distributed storage in general) likes high io depths so writes can hit all of 
the drives at the same time.  There are tricks (like journals, writeahead logs, 
centralized caches, etc) that can help mitigate this, but I suspect you'll see 
much better performance with more concurrent writes.

Regarding file size, the smaller the file, the more likely those tricks 
mentioned above are to help you.  Based on your results, it appears filestore 
may be doing a better job of it than bluestore.  The question you have to ask 
is whether or not this kind of test represents what you are likely to see for 
real on your cluster.

Doing writes over a much larger file, say 3-4x over the total amount of RAM in 
all of the nodes, helps you get a better idea of what the behavior is like when 
those tricks are less effective.  I think that's probably a more likely 
scenario in most production environments, but it's up to you which workload you 
think better represents what you are going to see in practice.  A while back 
Nick Fisk showed some results where bluestore was slower than filestore at 
small sync writes and it could be that we simply have more work to do in this 
area.  On the other hand, we pretty consistently see bluestore doing better 
than filestore with 4k random writes and higher IO depths, which is why I'd be 
curious to see how it goes if you try that.

Mark

On 11/16/2017 10:11 AM, Milanov, Radoslav Nikiforov wrote:
> No,
> What test parameters (iodepth/file size/numjobs) would make sense  for 3 
> node/27OSD@4TB ?
> - Rado
>
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: Thursday, November 16, 2017 10:56 AM
> To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
> <drakonst...@gmail.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Did you happen to have a chance to try with a higher io depth?
>
> Mark
>
> On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Milanov, Radoslav Nikiforov
No,
What test parameters (iodepth/file size/numjobs) would make sense for 3
nodes / 27 OSDs @ 4TB?
- Rado

-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com] 
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; David Turner 
<drakonst...@gmail.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:
> FYI
>
> Having a 50GB block.db made no difference in performance.
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Tuesday, November 14, 2017 6:13 PM
> *To:* Milanov, Radoslav Nikiforov <rad...@bu.edu>
> *Cc:* Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> I'd probably say 50GB to leave some extra space over-provisioned.  
> 50GB should definitely prevent any DB operations from spilling over to the 
> HDD.
>
>
>
> On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov 
> <rad...@bu.edu <mailto:rad...@bu.edu>> wrote:
>
> Thank you,
>
> It is 4TB OSDs and they might become full someday, I’ll try 60GB db
> partition – this is the max OSD capacity.
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com
> <mailto:drakonst...@gmail.com>]
> *Sent:* Tuesday, November 14, 2017 5:38 PM
>
>
> *To:* Milanov, Radoslav Nikiforov <rad...@bu.edu 
> <mailto:rad...@bu.edu>>
>
> *Cc:*Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>;
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
>
>
>
> You have to configure the size of the db partition in the config
> file for the cluster.  If you're db partition is 1GB, then I can all
> but guarantee that you're using your HDD for your blocks.db very
> quickly into your testing.  There have been multiple threads
> recently about what size the db partition should be and it seems to
> be based on how many objects your OSD is likely to have on it.  The
> recommendation has been to err on the side of bigger.  If you're
> running 10TB OSDs and anticipate filling them up, then you probably
> want closer to an 80GB+ db partition.  That's why I asked how full
> your cluster was and how large your HDDs are.
>
>
>
> Here's a link to one of the recent ML threads on this
> topic.  
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020
> 822.html
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov
> <rad...@bu.edu <mailto:rad...@bu.edu>> wrote:
>
> Block-db partition is the default 1GB (is there a way to modify
> this? journals are 5GB in filestore case) and usage is low:
>
>
>
> [root@kumo-ceph02 ~]# ceph df
>
> GLOBAL:
>     SIZE     AVAIL   RAW USED  %RAW USED
>     100602G  99146G  1455G     1.45
>
> POOLS:
>     NAME           ID  USED    %USED  MAX AVAIL  OBJECTS
>     kumo-vms       1   19757M  0.02   31147G     5067
>     kumo-volumes   2   214G    0.18   31147G     55248
>     kumo-images    3   203G    0.17   31147G     66486
>     kumo-vms3      11  45824M  0.04   31147G     11643
>     kumo-volumes3  13  10837M  0      31147G     2724
>     kumo-images3   15  82450M  0.09   31147G     10320
>
>
>
> - Rado
>
>
>
> *From:*David Turner [mailto:drakonst...@gmail.com
> <mailto:drakonst...@gmail.com>]
> *Sent:* Tuesday, November 14, 2017 4:40 PM
> *To:* Mark Nelson <mnel...@redhat.com <mailto:mnel...@redhat.com>>
> *Cc:* Milanov, Radoslav Nikiforov <rad...@bu.edu
> <mailto:rad...@bu.edu>>; ceph-users@lists.ceph.com
> <mailto:ceph-users@lists.ceph.com>
>
>
> *Subject:* Re: [ceph-users] Bluestore performance 50% of 
> filestore
>
>
>
> How big was your blocks.db partition for each OSD and what size
> are your HDDs?  Also how full is your cluster?  It's possible
> that your blocks.db partition wasn't large enough to hold the entire db
> and it had to spill over onto the HDD, which would definitely impact
> performance.

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-16 Thread Milanov, Radoslav Nikiforov
FYI
Having a 50GB block.db made no difference in performance.

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 6:13 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>
Cc: Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore


I'd probably say 50GB to leave some extra space over-provisioned.  50GB should 
definitely prevent any DB operations from spilling over to the HDD.

On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Thank you,
It is 4TB OSDs and they might become full someday, I’ll try 60GB db partition – 
this is the max OSD capacity.

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 5:38 PM

To: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>
Cc: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

You have to configure the size of the db partition in the config file for the
cluster.  If your db partition is 1GB, then I can all but guarantee that
you're using your HDD for your blocks.db very quickly into your testing.  There
have been multiple threads recently about what size the db partition should be,
and it seems to be based on how many objects your OSD is likely to have on it.
The recommendation has been to err on the side of bigger.  If you're running
10TB OSDs and anticipate filling them up, then you probably want closer to an
80GB+ db partition.  That's why I asked how full your cluster was and how large
your HDDs are.
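Scaling David's 10 TB / 80 GB example down to Rado's 4 TB OSDs (the ratio below comes from the example in the message, not from any official recommendation):

```python
# Rough scaling of the rule of thumb quoted above: 80 GB of block.db
# for a 10 TB OSD, applied to a 4 TB OSD.
db_per_tb = 80 / 10          # GB of block.db per TB of OSD capacity
osd_tb = 4                   # Rado's 4 TB OSDs
print(f"suggested block.db: ~{db_per_tb * osd_tb:.0f} GB")   # ~32 GB
```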

Here's a link to one of the recent ML threads on this topic.  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
    SIZE     AVAIL   RAW USED  %RAW USED
    100602G  99146G  1455G     1.45
POOLS:
    NAME           ID  USED    %USED  MAX AVAIL  OBJECTS
    kumo-vms       1   19757M  0.02   31147G     5067
    kumo-volumes   2   214G    0.18   31147G     55248
    kumo-images    3   203G    0.17   31147G     66486
    kumo-vms3      11  45824M  0.04   31147G     11643
    kumo-volumes3  13  10837M  0      31147G     2724
    kumo-images3   15  82450M  0.09   31147G     10320
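The ceph df output above is internally consistent; a quick check of the %RAW USED column:

```python
# Cross-check of the "ceph df" output above: %RAW USED = RAW USED / SIZE.
size_gb = 100602
raw_used_gb = 1455
print(f"{raw_used_gb / size_gb * 100:.2f}%")   # 1.45%
```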

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Thank you,
It is 4TB OSDs and they might become full someday, I’ll try 60GB db partition – 
this is the max OSD capacity.

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 5:38 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>
Cc: Mark Nelson <mnel...@redhat.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

You have to configure the size of the db partition in the config file for the
cluster.  If your db partition is 1GB, then I can all but guarantee that
you're using your HDD for your blocks.db very quickly into your testing.  There
have been multiple threads recently about what size the db partition should be,
and it seems to be based on how many objects your OSD is likely to have on it.
The recommendation has been to err on the side of bigger.  If you're running
10TB OSDs and anticipate filling them up, then you probably want closer to an
80GB+ db partition.  That's why I asked how full your cluster was and how large
your HDDs are.

Here's a link to one of the recent ML threads on this topic.  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov 
<rad...@bu.edu<mailto:rad...@bu.edu>> wrote:
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
    SIZE     AVAIL   RAW USED  %RAW USED
    100602G  99146G  1455G     1.45
POOLS:
    NAME           ID  USED    %USED  MAX AVAIL  OBJECTS
    kumo-vms       1   19757M  0.02   31147G     5067
    kumo-volumes   2   214G    0.18   31147G     55248
    kumo-images    3   203G    0.17   31147G     66486
    kumo-vms3      11  45824M  0.04   31147G     11643
    kumo-volumes3  13  10837M  0      31147G     2724
    kumo-images3   15  82450M  0.09   31147G     10320

- Rado

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com<mailto:mnel...@redhat.com>>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu<mailto:rad...@bu.edu>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side 
> readahead?
>
> Both are doing better for writes than you'd expect from the native 
> performance of the disks assuming they are typical 7200RPM drives and you are 
> using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small file 
> size, I'd expect that you might be getting better journal coalescing in 
> filestore.
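(Mark's back-of-envelope ceiling quoted above can be reproduced directly:)

```python
# Mark's estimate of the cluster's random-write ceiling, as quoted above:
# per-disk IOPS * number of OSDs / replication factor.
per_disk_iops = 150   # typical 7200 RPM HDD
osds = 27
replication = 3
print(per_disk_iops * osds // replication)   # 1350
```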
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be 
> curious how it would look if you used libaio with a high iodepth and a much 
> bigger partition to do random writes over.
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
>>

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Block-db partition is the default 1GB (is there a way to modify this? journals 
are 5GB in filestore case) and usage is low:

[root@kumo-ceph02 ~]# ceph df
GLOBAL:
    SIZE     AVAIL   RAW USED  %RAW USED
    100602G  99146G  1455G     1.45
POOLS:
    NAME           ID  USED    %USED  MAX AVAIL  OBJECTS
    kumo-vms       1   19757M  0.02   31147G     5067
    kumo-volumes   2   214G    0.18   31147G     55248
    kumo-images    3   203G    0.17   31147G     66486
    kumo-vms3      11  45824M  0.04   31147G     11643
    kumo-volumes3  13  10837M  0      31147G     2724
    kumo-images3   15  82450M  0.09   31147G     10320

- Rado

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 4:40 PM
To: Mark Nelson <mnel...@redhat.com>
Cc: Milanov, Radoslav Nikiforov <rad...@bu.edu>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

How big was your blocks.db partition for each OSD and what size are your HDDs?  
Also how full is your cluster?  It's possible that your blocks.db partition 
wasn't large enough to hold the entire db and it had to spill over onto the HDD 
which would definitely impact performance.

On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson 
<mnel...@redhat.com<mailto:mnel...@redhat.com>> wrote:
How big were the writes in the windows test and how much concurrency was
there?

Historically bluestore does pretty well for us with small random writes
so your write results surprise me a bit.  I suspect it's the low queue
depth.  Sometimes bluestore does worse with reads, especially if
readahead isn't enabled on the client.

Mark

On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs 
> to bluestore. It is 7200 rpm drives and triple replication. I also get same 
> results (bluestore 2 times slower) testing continuous writes on a 40GB 
> partition on a Windows VM, completely different tool.
>
> Right now I'm going back to filestore for the OSDs so additional tests are 
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users 
> [mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
>  On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side 
> readahead?
>
> Both are doing better for writes than you'd expect from the native 
> performance of the disks assuming they are typical 7200RPM drives and you are 
> using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).  Given the small file 
> size, I'd expect that you might be getting better journal coalescing in 
> filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be 
> curious how it would look if you used libaio with a high iodepth and a much 
> bigger partition to do random writes over.
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have 3 node, 27 OSDs cluster running Luminous 12.2.1
>>
>> In filestore configuration there are 3 SSDs used for journals of 9
>> OSDs on each hosts (1 SSD has 3 journal paritions for 3 OSDs).
>>
>> I've converted filestore to bluestore by wiping 1 host a time and
>> waiting for recovery. SSDs now contain block-db - again one SSD
>> serving
>> 3 OSDs.
>>
>>
>>
>> Cluster is used as storage for Openstack.
>>
>> Running fio on a VM in that Openstack reveals bluestore performance
>> almost twice slower than filestore.
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>>
>>
>>
>>
>> Filestore
>>
>>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>>
>>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>>
>>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>>
>>
>>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>>
>>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>>
>>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>>
>>
>> 

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
16 MB blocks, single thread, sequential writes - these are the results:

[inline image: image001.emz - benchmark chart]



- Rado



-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Tuesday, November 14, 2017 4:36 PM
To: Milanov, Radoslav Nikiforov <rad...@bu.edu>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore



How big were the writes in the windows test and how much concurrency was there?



Historically bluestore does pretty well for us with small random writes so your 
write results surprise me a bit.  I suspect it's the low queue depth.  
Sometimes bluestore does worse with reads, especially if readahead isn't 
enabled on the client.



Mark



On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
> Hi Mark,
> Yes RBD is in write back, and the only thing that changed was converting OSDs
> to bluestore. It is 7200 rpm drives and triple replication. I also get the
> same results (bluestore 2 times slower) testing continuous writes on a 40GB
> partition on a Windows VM, with a completely different tool.
>
> Right now I'm going back to filestore for the OSDs, so additional tests are
> possible if that helps.
>
> - Rado
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode?  Do you have client side
> readahead?
>
> Both are doing better for writes than you'd expect from the native
> performance of the disks, assuming they are typical 7200RPM drives and you
> are using 3X replication (~150 IOPS * 27 / 3 = ~1350 IOPS).  Given the small
> file size, I'd expect that you might be getting better journal coalescing in
> filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be
> curious how it would look if you used libaio with a high iodepth and a much
> bigger partition to do random writes over.
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>> Hi
>>
>> We have a 3 node, 27 OSD cluster running Luminous 12.2.1
>>
>> In the filestore configuration there are 3 SSDs used for journals of 9
>> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
>>
>> I've converted filestore to bluestore by wiping 1 host at a time and
>> waiting for recovery. SSDs now contain block-db - again one SSD serving
>> 3 OSDs.
>>
>> The cluster is used as storage for Openstack.
>>
>> Running fio on a VM in that Openstack reveals bluestore performance
>> almost twice slower than filestore.
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
>> --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> Filestore
>>
>>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>> Bluestore
>>
>>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
>>
>>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
>>
>> - Rado


image001.emz
Description: image001.emz
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Hi Mark,
Yes RBD is in write back, and the only thing that changed was converting OSDs 
to bluestore. It is 7200 rpm drives and triple replication. I also get the same 
results (bluestore 2 times slower) testing continuous writes on a 40GB 
partition on a Windows VM, with a completely different tool. 

Right now I'm going back to filestore for the OSDs, so additional tests are 
possible if that helps.

- Rado

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: Tuesday, November 14, 2017 4:04 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore

Hi Radoslav,

Is RBD cache enabled and in writeback mode?  Do you have client-side readahead?

Both are doing better for writes than you'd expect from the native performance 
of the disks, assuming they are typical 7200RPM drives and you are using 3X 
replication (~150 IOPS * 27 / 3 = ~1350 IOPS).  Given the small file size, I'd 
expect that you might be getting better journal coalescing in filestore.
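
Spelled out, that back-of-envelope ceiling looks like this (the ~150 IOPS per
7200RPM drive is an assumed figure, not a measurement from this cluster):

```shell
# Rough expected ceiling for 3x-replicated 4k random writes on this cluster,
# using an assumed ~150 IOPS per 7200RPM drive across 27 OSDs.
per_drive_iops=150
osds=27
replication=3
expected_iops=$(( per_drive_iops * osds / replication ))
echo "$expected_iops"   # 1350
```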

Sadly I imagine you can't do a comparison test at this point, but I'd be 
curious how it would look if you used libaio with a high iodepth and a much 
bigger partition to do random writes over.
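
A sketch of what such a run could look like (untested; the ioengine, iodepth,
and size values here are illustrative, and the test file must fit the VM's disk):

```shell
# Hypothetical follow-up along the suggestion above: async I/O (libaio) with
# a deep queue, over a much larger region than the original 1G test file.
fio --name fio_test_file --ioengine=libaio --iodepth=32 --direct=1 \
    --rw=randwrite --bs=4k --size=100G \
    --numjobs=2 --time_based --runtime=180 --group_reporting
```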

Mark

On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
> Hi
>
> We have a 3-node, 27-OSD cluster running Luminous 12.2.1
>
> In filestore configuration there are 3 SSDs used for journals of 9 
> OSDs on each host (1 SSD has 3 journal partitions for 3 OSDs).
>
> I've converted filestore to bluestore by wiping 1 host at a time and 
> waiting for recovery. SSDs now contain block-db - again one SSD 
> serving 3 OSDs.
>
>
>
> Cluster is used as storage for Openstack.
>
> Running fio on a VM in that Openstack setup shows bluestore performance 
> almost twice as slow as filestore.
>
> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G
> --numjobs=2 --time_based --runtime=180 --group_reporting
>
> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G
> --numjobs=2 --time_based --runtime=180 --group_reporting
>
>
>
>
>
> Filestore
>
>   write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>
>   write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>
>   write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>
>
>
>   read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>
>   read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>
>   read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>
>
>
> Bluestore
>
>   write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>
>   write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>
>   write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
>
>
>
>   read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>
>   read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>
>   read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
>
>
>
>
>
> - Rado
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore performance 50% of filestore

2017-11-14 Thread Milanov, Radoslav Nikiforov
Hi
We have a 3-node, 27-OSD cluster running Luminous 12.2.1
In filestore configuration there are 3 SSDs used for journals of 9 OSDs on each 
host (1 SSD has 3 journal partitions for 3 OSDs).
I've converted filestore to bluestore by wiping 1 host at a time and waiting for 
recovery. SSDs now contain block-db - again one SSD serving 3 OSDs.

Cluster is used as storage for Openstack.
Running fio on a VM in that Openstack setup shows bluestore performance almost 
twice as slow as filestore.
fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G 
--numjobs=2 --time_based --runtime=180 --group_reporting
fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G --numjobs=2 
--time_based --runtime=180 --group_reporting



Filestore

  write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec

  write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec

  write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec



  read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec

  read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec

  read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec



Bluestore

  write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec

  write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec

  write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec



  read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec

  read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec

  read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
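
As a sanity check on the numbers above: at bs=4k, bandwidth and IOPS should tie
out as bw ≈ iops × 4 KB/s. For the first filestore write line, for example:

```shell
# bw ≈ iops * block size: 4994 IOPS at 4 KB blocks.
bs_kb=4
iops=4994
bw_kb=$(( iops * bs_kb ))
echo "${bw_kb} KB/s"   # reported: 19978 KB/s (the gap is rounding of iops)
```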


- Rado

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com