Re: [ceph-users] RBD journaling benchmarks

2017-07-13 Thread Jason Dillaman
On Thu, Jul 13, 2017 at 10:58 AM, Maged Mokhtar  wrote:
> The case also applies to active/passive iSCSI: you still have many
> initiators/hypervisors writing concurrently to the same RBD image through a
> clustered file system (CSV/VMFS).

Except that, from that point of view, there is only a single RBD client:
the active iSCSI target.

-- 
Jason


Re: [ceph-users] RBD journaling benchmarks

2017-07-13 Thread Maged Mokhtar


--
From: "Jason Dillaman" <jdill...@redhat.com>
Sent: Thursday, July 13, 2017 4:45 AM
To: "Maged Mokhtar" <mmokh...@petasan.org>
Cc: "Mohamad Gebai" <mge...@suse.com>; "ceph-users" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] RBD journaling benchmarks

> On Mon, Jul 10, 2017 at 3:41 PM, Maged Mokhtar <mmokh...@petasan.org> wrote:
>> On 2017-07-10 20:06, Mohamad Gebai wrote:
>>
>>
>> On 07/10/2017 01:51 PM, Jason Dillaman wrote:
>>
>> On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar <mmokh...@petasan.org> wrote:
>>
>> These are significant differences, to the point where it may not make sense
>> to use rbd journaling / mirroring unless there is only 1 active client.
>>
>> I interpreted the results as the same RBD image was being concurrently
>> used by two fio jobs -- which we strongly recommend against since it
>> will result in the exclusive-lock ping-ponging back and forth between
>> the two clients / jobs. Each fio RBD job should utilize its own
>> backing image to avoid such a scenario.
>>
>>
>> That is correct. The single job runs are more representative of the
>> overhead of journaling only, and it is worth noting the (expected)
>> inefficiency of multiple clients for the same RBD image, as explained by
>> Jason.
>>
>> Mohamad
>>
>> Yes, I expected a penalty, but not one this large. There are some use cases
>> that would benefit from concurrent access to the same block device: in VMware
>> and Hyper-V, several hypervisors can share the same device, formatted with a
>> clustered file system such as MS CSV (Cluster Shared Volumes) or VMFS, which
>> creates a volume/datastore that houses many VMs.
> 
> Both of these use-cases would first need support for active/active
> iSCSI. While A/A iSCSI via MPIO is trivial to enable, getting it to
> properly handle failure conditions without the possibility of data
> corruption is not, since it relies heavily on arbitrary initiator- and
> target-based timers. The only realistic and safe solution is to rely
> on an MCS-based active/active implementation.

The case also applies to active/passive iSCSI: you still have many
initiators/hypervisors writing concurrently to the same RBD image through a
clustered file system (CSV/VMFS).

>> I was wondering if such a setup could be supported in the future, and whether
>> there could be a way to minimize the overhead of the exclusive lock; for
>> example, by handing out a distributed sequence number to the different active
>> writers and having each writer maintain its own journal. I doubt the overhead
>> would reach the values you showed.
> 
> The journal used by the librbd mirroring feature was designed to
> support multiple concurrent writers. Of course, that original design
> was more in line with the goal of supporting multiple images within a
> consistency group.

Yes, but they will still suffer a performance penalty. My understanding is that
they would need the lock while writing data to the journal entries, and thus
would be waiting their turn. Or do they need the lock only for journal metadata,
such as generating a sequence number?

>> Maged
>>
>>
> 
> -- 
> Jason


Re: [ceph-users] RBD journaling benchmarks

2017-07-12 Thread Jason Dillaman
On Mon, Jul 10, 2017 at 3:41 PM, Maged Mokhtar  wrote:
> On 2017-07-10 20:06, Mohamad Gebai wrote:
>
>
> On 07/10/2017 01:51 PM, Jason Dillaman wrote:
>
> On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar  wrote:
>
> These are significant differences, to the point where it may not make sense
> to use rbd journaling / mirroring unless there is only 1 active client.
>
> I interpreted the results as the same RBD image was being concurrently
> used by two fio jobs -- which we strongly recommend against since it
> will result in the exclusive-lock ping-ponging back and forth between
> the two clients / jobs. Each fio RBD job should utilize its own
> backing image to avoid such a scenario.
>
>
> That is correct. The single job runs are more representative of the
> overhead of journaling only, and it is worth noting the (expected)
> inefficiency of multiple clients for the same RBD image, as explained by
> Jason.
>
> Mohamad
>
> Yes, I expected a penalty, but not one this large. There are some use cases
> that would benefit from concurrent access to the same block device: in VMware
> and Hyper-V, several hypervisors can share the same device, formatted with a
> clustered file system such as MS CSV (Cluster Shared Volumes) or VMFS, which
> creates a volume/datastore that houses many VMs.

Both of these use-cases would first need support for active/active
iSCSI. While A/A iSCSI via MPIO is trivial to enable, getting it to
properly handle failure conditions without the possibility of data
corruption is not, since it relies heavily on arbitrary initiator- and
target-based timers. The only realistic and safe solution is to rely
on an MCS-based active/active implementation.

> I was wondering if such a setup could be supported in the future, and whether
> there could be a way to minimize the overhead of the exclusive lock; for
> example, by handing out a distributed sequence number to the different active
> writers and having each writer maintain its own journal. I doubt the overhead
> would reach the values you showed.

The journal used by the librbd mirroring feature was designed to
support multiple concurrent writers. Of course, that original design
was more in line with the goal of supporting multiple images within a
consistency group.

> Maged
>
>

-- 
Jason


Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Maged Mokhtar
On 2017-07-10 20:06, Mohamad Gebai wrote:

> On 07/10/2017 01:51 PM, Jason Dillaman wrote:
> On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar wrote:
> These are significant differences, to the point where it may not make sense
> to use rbd journaling / mirroring unless there is only 1 active client.
> I interpreted the results as the same RBD image was being concurrently
> used by two fio jobs -- which we strongly recommend against since it
> will result in the exclusive-lock ping-ponging back and forth between
> the two clients / jobs. Each fio RBD job should utilize its own
> backing image to avoid such a scenario.
>
> That is correct. The single job runs are more representative of the
> overhead of journaling only, and it is worth noting the (expected)
> inefficiency of multiple clients for the same RBD image, as explained by
> Jason.
>
> Mohamad

Yes, I expected a penalty, but not one this large. There are some use cases
that would benefit from concurrent access to the same block device: in VMware
and Hyper-V, several hypervisors can share the same device, formatted with a
clustered file system such as MS CSV (Cluster Shared Volumes) or VMFS, which
creates a volume/datastore that houses many VMs.

I was wondering if such a setup could be supported in the future, and whether
there could be a way to minimize the overhead of the exclusive lock; for
example, by handing out a distributed sequence number to the different active
writers and having each writer maintain its own journal. I doubt the overhead
would reach the values you showed.

Maged


Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Mohamad Gebai

On 07/10/2017 01:51 PM, Jason Dillaman wrote:
> On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar  wrote:
>> These are significant differences, to the point where it may not make sense
>> to use rbd journaling / mirroring unless there is only 1 active client.
> I interpreted the results as the same RBD image was being concurrently
> used by two fio jobs -- which we strongly recommend against since it
> will result in the exclusive-lock ping-ponging back and forth between
> the two clients / jobs. Each fio RBD job should utilize its own
> backing image to avoid such a scenario.
>

That is correct. The single job runs are more representative of the
overhead of journaling only, and it is worth noting the (expected)
inefficiency of multiple clients for the same RBD image, as explained by
Jason.

Mohamad



Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Jason Dillaman
On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar  wrote:
> These are significant differences, to the point where it may not make sense
> to use rbd journaling / mirroring unless there is only 1 active client.

I interpreted the results as the same RBD image was being concurrently
used by two fio jobs -- which we strongly recommend against since it
will result in the exclusive-lock ping-ponging back and forth between
the two clients / jobs. Each fio RBD job should utilize its own
backing image to avoid such a scenario.
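
To make the one-image-per-job advice concrete, here is a rough sketch using
fio's rbd ioengine. The pool name, image names, client name and the 4k
randwrite parameters are illustrative assumptions (not the jobs from Mohamad's
runs), and both images are assumed to already exist with journaling enabled:

# Sketch only: two concurrent fio jobs, each bound to its own RBD image,
# so the exclusive lock stays with a single client per image.
cat > two-images.fio <<'EOF'
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=randwrite
bs=4k
iodepth=128

[writer-img1]
rbdname=bench-img1

[writer-img2]
rbdname=bench-img2
EOF

fio two-images.fio

Pointing both job sections at the same rbdname would instead reproduce the
exclusive-lock ping-ponging described above.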

-- 
Jason


Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Maged Mokhtar
On 2017-07-10 18:14, Mohamad Gebai wrote:

> Resending as my first try seems to have disappeared.
> 
> Hi,
> 
> We ran some benchmarks to assess the overhead caused by enabling
> client-side RBD journaling in Luminous. The tests consist of:
> - Create an image with journaling enabled (--image-feature journaling)
> - Run randread, randwrite and randrw workloads sequentially from a
> single client using fio
> - Collect IOPS
> 
> More info:
> - Feature exclusive-lock is enabled with journaling (required)
> - Queue depth of 128 for fio
> - With 1 and 2 concurrent jobs (an illustrative setup is sketched below)
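
For anyone wanting to reproduce a setup along these lines, a rough sketch
follows; the pool/image names, size and block size are assumptions, not taken
from the original runs:

# Illustrative sketch only; names and sizes are placeholders.
# Image with client-side journaling (journaling requires exclusive-lock):
rbd create rbd/bench-img1 --size 10240 --image-feature exclusive-lock,journaling

# Variant for the "Jour width 32" column:
rbd create rbd/bench-img2 --size 10240 --image-feature exclusive-lock,journaling \
    --journal-splay-width 32

# One of the fio workloads (repeat with rw=randread / rw=randrw; numjobs=2
# against a single image corresponds to the 2-job rows below):
fio --name=randwrite --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=bench-img1 --rw=randwrite --bs=4k --iodepth=128 --numjobs=1
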
> 
> Cluster 1
> 
> 
> - 5 OSD nodes
> - 6 OSDs per node
> - 3 monitors
> - All SSD
> - Bluestore + WAL
> - 10GbE NIC
> - Ceph version 12.0.3-1380-g6984d41b5d
> (6984d41b5d142ce157216b6e757bcb547da2c7d2) luminous (dev)
> 
> Results:
> 
>           Default   Journaling            Jour width 32
>   Jobs    IOPS      IOPS      Slowdown    IOPS      Slowdown
>   RW
>   1       19521     9104      2.1x        16067     1.2x
>   2       30575     726       42.1x       488       62.6x
>   Read
>   1       22775     22946     0.9x        23601     0.9x
>   2       35955     1078      33.3x       448       80.2x
>   Write
>   1       18515     6054      3.0x        9765      1.9x
>   2       29586     1188      24.9x       534       55.4x
> 
> - "Default" is the baseline (with journaling disabled)
> - "Journaling" is with journaling enabled
> - "Jour width 32" is with a journal data width of 32 objects
> (--journal-splay-width 32)
> - The major slowdown for two jobs is due to locking (see the worked slowdown example below)
> - With a journal width of 32, the 0.9x slowdown (which is actually a
> speedup) is due to the read-only workload, which doesn't exercise the
> journaling code.
> - The randwrite workload exercises the journaling code the most, and is
> expected to have the highest slowdown, which is 1.9x in this case.
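
As far as one can tell from the numbers, the slowdown columns are simply the
baseline ("Default") IOPS divided by the IOPS of the corresponding run; for
example, taking the Cluster 1 RW rows:

    19521 / 9104  ≈ 2.1x   (1 job,  journaling)
    19521 / 16067 ≈ 1.2x   (1 job,  journal splay width 32)
    30575 / 726   ≈ 42.1x  (2 jobs, journaling)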
> 
> Cluster 2
> 
> 
> - 3 OSD nodes
> - 10 OSDs per node
> - 1 monitor
> - All HDD
> - Filestore
> - 10GbE NIC
> - Ceph version 12.1.0-289-g117b171715
> (117b1717154e1236b2d37c405a86a9444cf7871d) luminous (dev)
> 
> Results:
> 
>           Default   Journaling            Jour width 32
>   Jobs    IOPS      IOPS      Slowdown    IOPS      Slowdown
>   RW
>   1       11869     3674      3.2x        4914      2.4x
>   2       13127     736       17.8x       432       30.4x
>   Read
>   1       14500     14700     1.0x        14703     1.0x
>   2       16673     3893      4.3x        307       54.3x
>   Write
>   1       8267      1925      4.3x        2591      3.2x
>   2       8283      1012      8.2x        417       19.9x
> 
> - The number of IOPS for the write workload is quite low, which is due
> to HDDs and filestore
> 
> Mohamad
> 

These are significant differences, to the point where it may not make
sense to use rbd journaling / mirroring unless there is only 1 active
client. Could there be a future enhancement that makes active/active
possible? Would it help if each active writer maintained its own queue
and only took the lock for a sequence number / counter, to minimize the
lock overhead of writing into the same journal queue?

Maged