Re: [ceph-users] speed decrease with size

2017-03-14 Thread Ben Erridge
I did find the journal configuration entries and they indeed did help for
this test, thanks.
Configuration was:
journal_max_write_entries=100
journal_queue_max_ops=300
journal_queue_max_bytes=33554432
journal_max_write_bytes=10485760

Configuration after update:
journal_max_write_entries=1
journal_queue_max_ops=5
journal_queue_max_bytes=1048576
journal_max_write_bytes=1073714824
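
For reference, a sketch of how settings like these could be made persistent across restarts; the section and values below are illustrative assumptions, not the exact numbers above:

[osd]
# hypothetical example values - tune for your own hardware
journal_max_write_entries = 10000
journal_queue_max_ops = 50000
journal_queue_max_bytes = 1048576000
journal_max_write_bytes = 1073741824

After editing ceph.conf on each OSD node the OSDs need a restart; ceph daemon osd.N config show can then confirm what each daemon actually picked up.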


Before changes:
dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
/mnt/ext4/output;
5120+0 records in
5120+0 records out
5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s

After changes:
dd if=/dev/zero of=/mnt/ceph-block-device/output bs=1000k count=5k; rm -f
/mnt/ceph-block-device/output;
5120+0 records in
5120+0 records out
5242880000 bytes (5.2 GB) copied, 3.20913 s, 1.6 GB/s
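
One caveat on these dd numbers: writing /dev/zero without a sync flag largely measures the page cache and rbd_cache rather than the cluster itself, so the 1.6 GB/s figure is probably optimistic. A cross-check I still want to run (same mount point assumed) would be:

dd if=/dev/zero of=/mnt/ceph-block-device/output bs=1000k count=5k conv=fdatasync; rm -f
/mnt/ceph-block-device/output;

conv=fdatasync makes dd flush dirty data to the device before reporting the rate, so the result reflects what actually reached Ceph.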

I still need to validate that this is better for our workload.
Thanks for your help.


On Mon, Mar 13, 2017 at 7:24 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Mon, 13 Mar 2017 11:25:15 -0400 Ben Erridge wrote:
>
> > On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer  wrote:
> >
> > >
> > > Hello,
> > >
> > > On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:
> > >
> > > > I am testing attached volume storage on our openstack cluster which uses
> > > > ceph for block storage.
> > > > our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
> > > > thinking some parameter is a little off because with relatively small
> > > > writes I am seeing drastically reduced write speeds.
> > > >
> > > Large journals are a waste for most people, especially when your backing
> > > storage is HDDs.
> > >
> > > >
> > > > we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.
> > > >
> > > I hope that's not your plan for production, with a replica of 2 you're
> > > looking at pretty much guaranteed data loss over time, unless your OSDs
> > > are actually RAIDs.
> > >
> > > I am aware that replica of 3 is suggested thanks.
> >
> >
> > > 5GB journals tend to be overkill already.
> > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
> > >
> > > If you were to actually look at your OSD nodes during those tests with
> > > something like atop or "iostat -x", you'd likely see that with prolonged
> > > writes you wind up with the speed of what your HDDs can do, i.e. see them
> > > (all or individually) being quite busy.
> > >
> >
> > That is what I was thinking as well, which is not what I want. I want to
> > better utilize these large SSD journals. If I have a 50GB journal
> > and I only want to write 5GB of data, I should be able to get near SSD speed
> > for this operation. Why am I not?
> See the thread above and
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010754.html
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/038669.html
>
> > Maybe I should increase
> > filestore_max_sync_interval.
> >
> That is your least worry, even though it seems to be the first parameter
> to change.
> Use your google foo to find some really old threads about this.
>
> The journal* parameters are what you want to look at, see the threads
> above. And AFAIK Ceph will flush the journal at 50% full, no matter what.
>
> And at the end you will likely find that using your 50GB journals in full
> will be difficult and doing so w/o getting a very uneven performance
> nearly impossible.
>
> Christian
> >
> > >
> > > Lastly, for nearly everybody in real life situations the
> > > bandwidth/throughput becomes a distant second to latency considerations.
> > >
> >
> > Thanks for the advice however.
> >
> >
> > > Christian
> > >
> > > >
> > > >  here is our Ceph config
> > > >
> > > > [global]
> > > > fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> > > > mon_initial_members = node-5 node-4 node-3
> > > > mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> > > > auth_cluster_required = cephx
> > > > auth_service_required = cephx
> > > > auth_client_required = cephx
> > > > filestore_xattr_use_omap = true
> > > > log_to_syslog_level = info
> > > > log_to_syslog = True
> > > > osd_pool_default_size = 1
> > > > osd_pool_default_min_size = 1
> > > > osd_pool_default_pg_num = 64
> > > > public_network = 192.168.0.0/24
> > > > log_to_syslog_facility = LOG_LOCAL0
> > > > osd_journal_size = 5
> > > > auth_supported = cephx
> > > > osd_pool_default_pgp_num = 64
> > > > osd_mkfs_type = xfs
> > > > cluster_network = 192.168.1.0/24
> > > > osd_recovery_max_active = 1
> > > > osd_max_backfills = 1
> > > >
> > > > [client]
> > > > rbd_cache = True
> > > > rbd_cache_writethrough_until_flush = True
> > > >
> > > > [client.radosgw.gateway]
> > > > rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> > > > keyring = /etc/ceph/keyring.radosgw.gateway
> > > > rgw_socket_path = /tmp/radosgw.sock
> > > > rgw_keystone_revocation_interval = 100
> > > > rgw_keystone_url = 192.168.0.2:35357
> > > > rgw_keystone_admin_token = ZBz37Vlv
> > > > host = node-3
> > > > rgw_dns_name = *.ciminc.com

Re: [ceph-users] speed decrease with size

2017-03-13 Thread Christian Balzer

Hello,

On Mon, 13 Mar 2017 11:25:15 -0400 Ben Erridge wrote:

> On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer  wrote:
> 
> >
> > Hello,
> >
> > On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:
> >  
> > > I am testing attached volume storage on our openstack cluster which uses
> > > ceph for block storage.
> > > our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
> > > thinking some parameter is a little off because with relatively small
> > > writes I am seeing drastically reduced write speeds.
> > >  
> > Large journals are a waste for most people, especially when your backing
> > storage is HDDs.
> >  
> > >
> > > we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.
> > >  
> > I hope that's not your plan for production, with a replica of 2 you're
> > looking at pretty much guaranteed data loss over time, unless your OSDs
> > are actually RAIDs.
> >
> > I am aware that replica of 3 is suggested thanks.  
> 
> 
> > 5GB journals tend to be overkill already.
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
> >
> > If you were to actually look at your OSD nodes during those tests with
> > something like atop or "iostat -x", you'd likely see that with prolonged
> > writes you wind up with the speed of what your HDDs can do, i.e. see them
> > (all or individually) being quite busy.
> >  
> 
> That is what I was thinking as well, which is not what I want. I want to
> better utilize these large SSD journals. If I have a 50GB journal
> and I only want to write 5GB of data, I should be able to get near SSD speed
> for this operation. Why am I not?
See the thread above and
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010754.html

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/038669.html

> Maybe I should increase
> filestore_max_sync_interval.
> 
That is your least worry, even though it seems to be the first parameter
to change.
Use your google foo to find some really old threads about this.

The journal* parameters are what you want to look at, see the threads
above. And AFAIK Ceph will flush the journal at 50% full, no matter what.

And at the end you will likely find that using your 50GB journals in full
will be difficult and doing so w/o getting a very uneven performance
nearly impossible.
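
If it helps, a quick way to see what an OSD is currently running with is the admin socket (illustrative, assuming you run this on the OSD node and osd.0 lives there):

ceph daemon osd.0 config show | egrep 'journal_max|journal_queue|filestore_max_sync'

Some of these can be poked at runtime with something like

ceph tell osd.* injectargs '--filestore_max_sync_interval 10'

but don't count on all of them applying live; the journal_* ones in particular may need an OSD restart.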

Christian
> 
> >
> > Lastly, for nearly everybody in real life situations the
> > bandwidth/throughput becomes a distant second to latency considerations.
> >  
> 
> Thanks for the advice however.
> 
> 
> > Christian
> >  
> > >
> > >  here is our Ceph config
> > >
> > > [global]
> > > fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> > > mon_initial_members = node-5 node-4 node-3
> > > mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> > > auth_cluster_required = cephx
> > > auth_service_required = cephx
> > > auth_client_required = cephx
> > > filestore_xattr_use_omap = true
> > > log_to_syslog_level = info
> > > log_to_syslog = True
> > > osd_pool_default_size = 1
> > > osd_pool_default_min_size = 1
> > > osd_pool_default_pg_num = 64
> > > public_network = 192.168.0.0/24
> > > log_to_syslog_facility = LOG_LOCAL0
> > > osd_journal_size = 5
> > > auth_supported = cephx
> > > osd_pool_default_pgp_num = 64
> > > osd_mkfs_type = xfs
> > > cluster_network = 192.168.1.0/24
> > > osd_recovery_max_active = 1
> > > osd_max_backfills = 1
> > >
> > > [client]
> > > rbd_cache = True
> > > rbd_cache_writethrough_until_flush = True
> > >
> > > [client.radosgw.gateway]
> > > rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> > > keyring = /etc/ceph/keyring.radosgw.gateway
> > > rgw_socket_path = /tmp/radosgw.sock
> > > rgw_keystone_revocation_interval = 100
> > > rgw_keystone_url = 192.168.0.2:35357
> > > rgw_keystone_admin_token = ZBz37Vlv
> > > host = node-3
> > > rgw_dns_name = *.ciminc.com
> > > rgw_print_continue = True
> > > rgw_keystone_token_cache_size = 10
> > > rgw_data = /var/lib/ceph/radosgw
> > > user = www-data
> > >
> > > This is the degradation I am speaking of:
> > >
> > >
> > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
> > > /mnt/ext4/output;
> > > 1024+0 records in
> > > 1024+0 records out
> > > 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
> > >
> > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
> > > /mnt/ext4/output;
> > > 2048+0 records in
> > > 2048+0 records out
> > > 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
> > >
> > >  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
> > > /mnt/ext4/output;
> > > 3072+0 records in
> > > 3072+0 records out
> > > 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
> > >
> > > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
> > > /mnt/ext4/output;
> > > 5120+0 records in
> > > 5120+0 records out
> > > 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
> > >
> > > Any suggestions for improving the large write degradation?  
> >
> >
> > --
> > Christian Balzer

Re: [ceph-users] speed decrease with size

2017-03-13 Thread Ben Erridge
On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:
>
> > I am testing attached volume storage on our openstack cluster which uses
> > ceph for block storage.
> > our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
> > thinking some parameter is a little off because with relatively small
> > writes I am seeing drastically reduced write speeds.
> >
> Large journals are a waste for most people, especially when your backing
> storage is HDDs.
>
> >
> > we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.
> >
> I hope that's not your plan for production, with a replica of 2 you're
> looking at pretty much guaranteed data loss over time, unless your OSDs
> are actually RAIDs.
>
> I am aware that replica of 3 is suggested thanks.


> 5GB journals tend to be overkill already.
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
>
> If you were to actually look at your OSD nodes during those tests with
> something like atop or "iostat -x", you'd likely see that with prolonged
> writes you wind up with the speed of what your HDDs can do, i.e. see them
> (all or individually) being quite busy.
>

That is what I was thinking as well, which is not what I want. I want to
better utilize these large SSD journals. If I have a 50GB journal
and I only want to write 5GB of data, I should be able to get near SSD speed
for this operation. Why am I not? Maybe I should increase
filestore_max_sync_interval.


>
> Lastly, for nearly everybody in real life situations the
> bandwidth/throughput becomes a distant second to latency considerations.
>

Thanks for the advice however.


> Christian
>
> >
> >  here is our Ceph config
> >
> > [global]
> > fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> > mon_initial_members = node-5 node-4 node-3
> > mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > log_to_syslog_level = info
> > log_to_syslog = True
> > osd_pool_default_size = 1
> > osd_pool_default_min_size = 1
> > osd_pool_default_pg_num = 64
> > public_network = 192.168.0.0/24
> > log_to_syslog_facility = LOG_LOCAL0
> > osd_journal_size = 5
> > auth_supported = cephx
> > osd_pool_default_pgp_num = 64
> > osd_mkfs_type = xfs
> > cluster_network = 192.168.1.0/24
> > osd_recovery_max_active = 1
> > osd_max_backfills = 1
> >
> > [client]
> > rbd_cache = True
> > rbd_cache_writethrough_until_flush = True
> >
> > [client.radosgw.gateway]
> > rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> > keyring = /etc/ceph/keyring.radosgw.gateway
> > rgw_socket_path = /tmp/radosgw.sock
> > rgw_keystone_revocation_interval = 100
> > rgw_keystone_url = 192.168.0.2:35357
> > rgw_keystone_admin_token = ZBz37Vlv
> > host = node-3
> > rgw_dns_name = *.ciminc.com
> > rgw_print_continue = True
> > rgw_keystone_token_cache_size = 10
> > rgw_data = /var/lib/ceph/radosgw
> > user = www-data
> >
> > This is the degradation I am speaking of:
> >
> >
> > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
> > /mnt/ext4/output;
> > 1024+0 records in
> > 1024+0 records out
> > 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
> >
> > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
> > /mnt/ext4/output;
> > 2048+0 records in
> > 2048+0 records out
> > 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
> >
> >  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
> > /mnt/ext4/output;
> > 3072+0 records in
> > 3072+0 records out
> > 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
> >
> > dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
> > /mnt/ext4/output;
> > 5120+0 records in
> > 5120+0 records out
> > 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
> >
> > Any suggestions for improving the large write degradation?
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>



-- 
-.
Ben Erridge
Center For Information Management, Inc.
(734) 930-0855
3550 West Liberty Road Ste 1
Ann Arbor, MI 48103
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] speed decrease with size

2017-03-12 Thread Christian Balzer

Hello,

On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:

> I am testing attached volume storage on our openstack cluster which uses
> ceph for block storage.
> our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
> thinking some parameter is a little off because with relatively small
> writes I am seeing drastically reduced write speeds.
> 
Large journals are a waste for most people, especially when your backing
storage is HDDs.

> 
> we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.
> 
I hope that's not your plan for production, with a replica of 2 you're
looking at pretty much guaranteed data loss over time, unless your OSDs
are actually RAIDs.
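
Checking and raising the replica count on an existing pool is straightforward (the pool name 'rbd' here is only an example):

ceph osd pool get rbd size
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

Note that the osd_pool_default_size = 1 in your config only applies to pools created after the change, not to existing ones.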

5GB journals tend to be overkill already.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html

If you were to actually look at your OSD nodes during those tests with
something like atop or "iostat -x", you'd likely see that with prolonged
writes you wind up with the speed of what your HDDs can do, i.e. see them
(all or individually) being quite busy.
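
For example, on each OSD node while the dd is running (assuming sysstat is installed):

iostat -x 5

Watch the HDD lines: %util pinned near 100 and rising await once the journals start flushing is the usual signature of the spinners becoming the bottleneck.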

Lastly, for nearly everybody in real life situations the
bandwidth/throughput becomes a distant second to latency considerations. 
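
A rough illustration of a latency-oriented test, if you want one (fio assumed installed, the path is just an example):

fio --name=lat-test --filename=/mnt/ext4/fio-test --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
    --runtime=30 --time_based --group_reporting

The per-IO completion latencies fio reports are what VM workloads actually feel, far more than the sequential MB/s a large dd produces.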

Christian

> 
>  here is our Ceph config
> 
> [global]
> fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> mon_initial_members = node-5 node-4 node-3
> mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 192.168.0.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 5
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 192.168.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> 
> [client]
> rbd_cache = True
> rbd_cache_writethrough_until_flush = True
> 
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 100
> rgw_keystone_url = 192.168.0.2:35357
> rgw_keystone_admin_token = ZBz37Vlv
> host = node-3
> rgw_dns_name = *.ciminc.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
> 
> This is the degradation I am speaking of:
> 
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
> /mnt/ext4/output;
> 1024+0 records in
> 1024+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
> /mnt/ext4/output;
> 2048+0 records in
> 2048+0 records out
> 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
> 
>  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
> /mnt/ext4/output;
> 3072+0 records in
> 3072+0 records out
> 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
> /mnt/ext4/output;
> 5120+0 records in
> 5120+0 records out
> 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
> 
> Any suggestions for improving the large write degradation?


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] speed decrease with size

2017-03-12 Thread Ben Erridge
I am testing attached volume storage on our openstack cluster which uses
ceph for block storage.
our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
thinking some parameter is a little off because with relatively small
writes I am seeing drastically reduced write speeds.


we have 2 nodes with 12 total OSDs, each with a 50GB SSD journal.


 here is our Ceph config

[global]
fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
mon_initial_members = node-5 node-4 node-3
mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 1
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 192.168.0.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 5
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 192.168.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1

[client]
rbd_cache = True
rbd_cache_writethrough_until_flush = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 100
rgw_keystone_url = 192.168.0.2:35357
rgw_keystone_admin_token = ZBz37Vlv
host = node-3
rgw_dns_name = *.ciminc.com
rgw_print_continue = True
rgw_keystone_token_cache_size = 10
rgw_data = /var/lib/ceph/radosgw
user = www-data
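
For completeness: osd_journal_size above is in MB, and once a journal already lives on a dedicated SSD partition the partition size is what counts rather than that setting. To confirm what each OSD is actually using (paths assume the default FileStore layout; osd.0 shown):

ls -l /var/lib/ceph/osd/ceph-0/journal
sudo blockdev --getsize64 $(readlink -f /var/lib/ceph/osd/ceph-0/journal)

The first command shows whether the journal is a file or a symlink to a partition; the second prints that partition's size in bytes.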

This is the degradation I am speaking of:


dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
/mnt/ext4/output;
1024+0 records in
1024+0 records out
1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s

dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
/mnt/ext4/output;
2048+0 records in
2048+0 records out
2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s

 dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
/mnt/ext4/output;
3072+0 records in
3072+0 records out
3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s

dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
/mnt/ext4/output;
5120+0 records in
5120+0 records out
5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s

Any suggestions for improving the large write degradation?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com