Re: [ceph-users] cephfs (rbd) read performance low - where is the bottleneck?

2016-11-22 Thread Mike Miller

Hi,

did some testing with multithreaded access and dd; performance scales as it should.


Any ideas to improve single-threaded read performance further would be 
highly appreciated. Some of our use cases require reading large files 
with a single thread.


I have also tried changing the readahead on the kernel client cephfs mount 
via the rsize and rasize options.


mount.ceph ... -o name=cephfs,secretfile=secret.key,rsize=67108864

Doing this on kernel 4.5.2 gives the error message:
"ceph: Unknown mount option rsize"
and likewise an unknown-option error for rasize.

Can someone explain to me how I can experiment with readahead on cephfs?
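A minimal sketch of the two knobs I am aware of - assuming a kernel client that accepts the rasize option documented in mount.ceph(8), and the ceph-fuse readahead settings (monitor name is made up; option names and defaults may differ between releases):

   # kernel client: raise the max readahead window to 128 MB
   mount -t ceph mon1:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/secret.key,rasize=134217728

   # ceph-fuse client: readahead is controlled via ceph.conf, e.g.
   [client]
   client readahead max bytes = 67108864
   client readahead max periods = 16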

Mike

On 11/21/16 12:33 PM, Eric Eastman wrote:

Have you looked at your file layout?

On a test cluster running 10.2.3 I created a 5GB file and then looked
at the layout:

# ls -l test.dat
  -rw-r--r-- 1 root root 524288 Nov 20 23:09 test.dat
# getfattr -n ceph.file.layout test.dat
  # file: test.dat
  ceph.file.layout="stripe_unit=4194304 stripe_count=1
object_size=4194304 pool=cephfs_data"

From what I understand, with this layout you are reading 4MB of data
from 1 OSD at a time, so I think you are seeing the overall speed of a
single SATA drive.  I do not think increasing your MON/MDS links to
10Gb will help, nor, for a single-file read, will moving the
metadata to SSD.

To test this, you may want to try creating 10 x 50GB files, and then
read them in parallel and see if your overall throughput increases.
If so, take a look at the layout parameters and see if you can change
the file layout to get more parallelization.

https://github.com/ceph/ceph/blob/master/doc/dev/file-striping.rst
https://github.com/ceph/ceph/blob/master/doc/cephfs/file-layouts.rst
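As a hedged illustration of the second link (directory name and values below are only examples), a wider layout can be set on a new directory so that files created inside it are striped across more objects:

   setfattr -n ceph.dir.layout.stripe_unit  -v 1048576 /mnt/cephfs/wide
   setfattr -n ceph.dir.layout.stripe_count -v 8       /mnt/cephfs/wide
   setfattr -n ceph.dir.layout.object_size  -v 4194304 /mnt/cephfs/wide
   getfattr -n ceph.dir.layout /mnt/cephfs/wide

Note that a layout only affects files created after the change; existing files keep the layout they were written with.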

Regards,
Eric

On Sun, Nov 20, 2016 at 3:24 AM, Mike Miller  wrote:

Hi,

reading a big file 50 GB (tried more too)

dd if=bigfile of=/dev/zero bs=4M

in a cluster with 112 SATA disks across 10 OSD nodes (6272 pgs, replication 3) gives
me only about *122 MB/s* read speed in a single thread. Scrubbing was turned off
during the measurement.

I have been searching for possible bottlenecks. The network is not the
problem: the machine running dd is connected to the cluster public network
with a 20 GBASE-T bond. The osds have a dual network setup: cluster public 10 GBASE-T, private
10 GBASE-T.

The osd SATA disks are utilized only at about 10% or 20%, not more
than that. CPUs on the osds idle too. CPUs on the mon idle, mds usage about 1.0 (1
core is used on this 6-core machine). mon and mds are connected with only 1 GbE
(I would expect some latency from that, but no bandwidth issues; in fact
network bandwidth is about 20 Mbit max).

If I read a 50 GB file, then clear the cache on the reading machine
(but not the osd caches) and read it again, I get much better read performance of about
*620 MB/s*. That seems logical to me as much (most) of the data is still in
the osd cache buffers. But the read performance is still not great,
considering that the reading machine is connected to the cluster with a 20
Gbit/s bond.

How can I improve? I am not really sure, but from my understanding 2
possible bottlenecks come to mind:

1) 1 GbE connection to mon / mds

Is this the reason why reads are slow and the osd disks are not hammered by read
requests, and therefore not fully utilized?

2) Move metadata to SSD

Currently, the cephfs_metadata pool is on the same spinning SATA disks as the
data pool. Is this the bottleneck? Would moving the metadata to SSD be a
solution?

Or is it both?

Your experience and insight are highly appreciated.

Thanks,

Mike
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] KVM / Ceph performance problems

2016-11-22 Thread Eneko Lacunza

Hi Michiel,

How are you configuring VM disks on Proxmox? What type (virtio, scsi, 
ide) and what cache setting?
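For reference, a hedged example of the kind of line we are asking about, taken from /etc/pve/qemu-server/<vmid>.conf (storage name, vmid and size here are made up):

   virtio0: ceph-rbd:vm-100-disk-1,cache=writeback,size=32G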



On 23/11/16 at 07:53, M. Piscaer wrote:

Hi,

I have a little performance problem with KVM and Ceph.

I'm using Proxmox 4.3-10/7230e60f, with KVM version
pve-qemu-kvm_2.7.0-8. Ceph is on version jewel 10.2.3 on both the
cluster as the client (ceph-common).

The systems are connected to the network via a 4x bond with a total
of 4 Gb/s.

Within a guest,
- when I do a write, I get about 10 MB/s.
- Also, when I do a write within the guest but directly to
ceph, I get the same speed.
- But when I mount a ceph object on the Proxmox host, I get about 110 MB/s.

The guest is connected to interface vmbr160 → bond0.160 → bond0.

This bridge vmbr160 has an IP address in the same subnet as the ceph
cluster, with an MTU of 9000.

The KVM block device is a virtio device.

What can I do to solve this problem?

Kind regards,

Michiel Piscaer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] KVM / Ceph performance problems

2016-11-22 Thread Andrey Shevel
I am afraid the most probable cause is context switching time related
to your guest (or guests).

On Wed, Nov 23, 2016 at 9:53 AM, M. Piscaer  wrote:
> Hi,
>
> I have a little performance problem with KVM and Ceph.
>
> I'm using Proxmox 4.3-10/7230e60f, with KVM version
> pve-qemu-kvm_2.7.0-8. Ceph is on version jewel 10.2.3 on both the
> cluster as the client (ceph-common).
>
> The systems are connected to the network via a 4x bond with a total
> of 4 Gb/s.
>
> Within a guest,
> - when I do a write, I get about 10 MB/s.
> - Also, when I do a write within the guest but directly to
> ceph, I get the same speed.
> - But when I mount a ceph object on the Proxmox host, I get about 110 MB/s.
>
> The guest is connected to interface vmbr160 → bond0.160 → bond0.
>
> This bridge vmbr160 has an IP address in the same subnet as the ceph
> cluster, with an MTU of 9000.
>
> The KVM block device is a virtio device.
>
> What can I do to solve this problem?
>
> Kind regards,
>
> Michiel Piscaer
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Andrey Y Shevel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-22 Thread Daznis
Hello,


The story goes like this.
I have added another 3 drives to the caching layer. The OSDs were added to
the crush map one by one after each successful rebalance. When I added the
last OSD and came back about an hour later, I noticed that it still had not
finished rebalancing. Further investigation showed that one of
the older cache SSD OSDs was restarting like crazy before it could fully boot. So I
shut it down and waited for a rebalance without that OSD. Less than an
hour later I had another 2 OSDs restarting like crazy. I tried running
scrubs on the PGs the logs asked me to, but that did not help. I'm
currently stuck with
"8 scrub errors" and a completely dead cluster.

log_channel(cluster) log [WRN] : pg 15.8d has invalid (post-split)
stats; must scrub before tier agent can activate
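For reference, the per-PG scrub commands for this (pg 15.8d taken from the warning above) are of the form:

   ceph pg scrub 15.8d
   ceph pg deep-scrub 15.8d

though, as said above, running them did not help here.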


I need help to stop the OSDs from crashing. Crash log:
 0> 2016-11-23 06:41:43.365602 7f935b4eb700 -1
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)'
thread 7f935b4eb700 time 2016-11-23 06:41:43.363067
osd/ReplicatedPG.cc: 10521: FAILED assert(obc)

 ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0xbde2c5]
 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned
int)+0x75f) [0x87e89f]
 3: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f8bb]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a11aa]
 5: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
ThreadPool::TPHandle&)+0x68a) [0x83c37a]
 6: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405)
[0x69af05]
 7: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x333) [0x69b473]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xbcd9cf]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcfb00]
 10: (()+0x7dc5) [0x7f93b9df4dc5]
 11: (clone()+0x6d) [0x7f93b88d5ced]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.


I have tried looking at the logs with full debug enabled, but they didn't
help me much. I have tried to evict the cache layer, but some objects
are stuck and can't be removed. Any suggestions would be greatly
appreciated.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] export-diff behavior if an initial snapshot is NOT specified

2016-11-22 Thread Zhongyan Gu
Thanks Jason, very clear explanation.
However, I found some strange behavior when running export-diff on a cloned image;
I am not sure whether it is a bug in calc_snap_set_diff().
The test is:
Image A is cloned from a parent image, then snap1 is created for image A.
The content of export-diff A@snap1 changes when image A is updated.
Only after image A no longer overlaps with its parent does the content of export-diff
A@snap1 become stable, and it is then almost zero.
I don't think this is the designed behavior. export-diff A@snap1 should always
produce stable output, no matter whether image A is cloned or not.
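A minimal sketch of the reproduction, assuming a parent image named "parent" with a snapshot "base" in pool rbd (all names are made up):

   rbd snap protect rbd/parent@base
   rbd clone rbd/parent@base rbd/A
   rbd snap create rbd/A@snap1
   rbd export-diff rbd/A@snap1 diff1    # first dump of A@snap1
   # ... write some data into image A ...
   rbd export-diff rbd/A@snap1 diff2    # same snapshot, dumped again
   cmp diff1 diff2                      # the two dumps differ while A still overlaps its parent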

Please correct me if anything wrong.

Thanks,
Zhongyan




On Tue, Nov 22, 2016 at 10:31 PM, Jason Dillaman 
wrote:

> On Tue, Nov 22, 2016 at 5:31 AM, Zhongyan Gu 
> wrote:
> > So if initial snapshot is NOT specified, then:
> > rbd export-diff image@snap1 will diff all data to snap1. this cmd
> equals to
> > :
> > rbd export image@snap1. Is my understand right or not??
>
>
> While they will both export all data associated w/ image@snap1, the
> "export" command will generate a raw, non-sparse dump of the full
> image whereas "export-diff" will export only sections of the image
> that contain data. The file generated from "export" can be used with
> the "import" command to create a new image, whereas the file generated
> from "export-diff" can only be used with "import-diff" against an
> existing image.
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] export-diff behavior if an initial snapshot is NOT specified

2016-11-22 Thread Zhongyan Gu
Hi there,
According to the official man page:
http://docs.ceph.com/docs/jewel/man/8/rbd/
export-diff [--from-snap snap-name] [--whole-object] (image-spec | snap-spec)
dest-path
Exports an incremental diff for an image to dest path (use - for stdout).
If an initial snapshot is specified, only changes since that snapshot are
included; otherwise, any regions of the image that contain data are
included.
So if an initial snapshot is NOT specified, then:
rbd export-diff image@snap1 will diff all data up to snap1. This cmd is equal to:
rbd export image@snap1. Is my understanding right or not?

Thanks
Zhongyan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Eugen Block

Thank you!


Zitat von Nick Fisk :


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On  
Behalf Of Eugen Block

Sent: 22 November 2016 10:11
To: Nick Fisk 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] deep-scrubbing has large impact on performance

Thanks for the very quick answer!

> If you are using Jewel

We are still using Hammer (0.94.7); we wanted to upgrade to Jewel  
in a couple of weeks. Would you recommend doing it now?


It's been fairly solid for me, but you might want to wait for the  
scrubbing hang bug to be fixed before upgrading. I think this

might be fixed in the upcoming 10.2.4 release.




Zitat von Nick Fisk :

>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> Of Eugen Block
>> Sent: 22 November 2016 09:55
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] deep-scrubbing has large impact on performance
>>
>> Hi list,
>>
>> I've been searching the mail archive and the web for some help. I
>> tried the things I found, but I can't see the effects. We use
> Ceph for
>> our Openstack environment.
>>
>> When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
>> MONs) starts deep-scrubbing, it's impossible to work with the VMs.
>> Currently, the deep-scrubs happen to start on Monday, which is
>> unfortunate. I already plan to start the next deep-scrub on
> Saturday,
>> so it has no impact on our work days. But if I imagine we had a large
>> multi-datacenter, such performance breaks are not
> reasonable. So
>> I'm wondering how do you guys manage that?
>>
>> What I've tried so far:
>>
>> ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
>> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
>> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
>> ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
>> ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
>>
>> And I also added these options to the ceph.conf.
>> To be able to work again, I had to set the nodeep-scrub option and
>> unset it when I left the office. Today, I see the cluster deep-
>> scrubbing again, but only one PG at a time, it seems that now the
>> default for osd_max_scrubs is working now and I don't see major
>> impacts yet.
>>
>> But is there something else I can do to reduce the performance impact?
>
> If you are using Jewel, the scrubbing is now done in the client IO
> thread, so those disk thread options won't do anything. Instead there
> is a new priority setting, which seems to work for me, along with a
> few other settings.
>
> osd_scrub_priority = 1
> osd_scrub_sleep = .1
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max = 5
> osd_scrub_load_threshold = 5
>
> Also enabling the weighted priority queue can assist the new priority
> options
>
> osd_op_queue = wpq
>
>
>> I just found [1] and will have a look into it.
>>
>> [1] http://prob6.com/en/ceph-pg-deep-scrub-cron/
>>
>> Thanks!
>> Eugen
>>
>> --
>> Eugen Block voice   : +49-40-559 51 75
>> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
>> Postfach 61 03 15
>> D-22423 Hamburg e-mail  : ebl...@nde.ag
>>
>>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>>Sitz und Registergericht: Hamburg, HRB 90934
>>Vorstand: Jens-U. Mozdzen
>> USt-IdNr. DE 814 013 983
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

 Vorsitzende des Aufsichtsrates: Angelika Mozdzen
   Sitz und Registergericht: Hamburg, HRB 90934
   Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Eugen Block
> Sent: 22 November 2016 10:11
> To: Nick Fisk 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] deep-scrubbing has large impact on performance
> 
> Thanks for the very quick answer!
> 
> > If you are using Jewel
> 
> We are still using Hammer (0.94.7), we wanted to upgrade to Jewel in a couple 
> of weeks, would you recommend to do it now?

It's been fairly solid for me, but you might want to wait for the scrubbing 
hang bug to be fixed before upgrading. I think this
might be fixed in the upcoming 10.2.4 release.

> 
> 
> Zitat von Nick Fisk :
> 
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Eugen Block
> >> Sent: 22 November 2016 09:55
> >> To: ceph-users@lists.ceph.com
> >> Subject: [ceph-users] deep-scrubbing has large impact on performance
> >>
> >> Hi list,
> >>
> >> I've been searching the mail archive and the web for some help. I
> >> tried the things I found, but I can't see the effects. We use
> > Ceph for
> >> our Openstack environment.
> >>
> >> When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
> >> MONs) starts deep-scrubbing, it's impossible to work with the VMs.
> >> Currently, the deep-scrubs happen to start on Monday, which is
> >> unfortunate. I already plan to start the next deep-scrub on
> > Saturday,
> >> so it has no impact on our work days. But if I imagine we had a large
> >> multi-datacenter, such performance breaks are not
> > reasonable. So
> >> I'm wondering how do you guys manage that?
> >>
> >> What I've tried so far:
> >>
> >> ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
> >> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
> >> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
> >> ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
> >> ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
> >>
> >> And I also added these options to the ceph.conf.
> >> To be able to work again, I had to set the nodeep-scrub option and
> >> unset it when I left the office. Today, I see the cluster deep-
> >> scrubbing again, but only one PG at a time, it seems that now the
> >> default for osd_max_scrubs is working now and I don't see major
> >> impacts yet.
> >>
> >> But is there something else I can do to reduce the performance impact?
> >
> > If you are using Jewel, the scrubbing is now done in the client IO
> > thread, so those disk thread options won't do anything. Instead there
> > is a new priority setting, which seems to work for me, along with a
> > few other settings.
> >
> > osd_scrub_priority = 1
> > osd_scrub_sleep = .1
> > osd_scrub_chunk_min = 1
> > osd_scrub_chunk_max = 5
> > osd_scrub_load_threshold = 5
> >
> > Also enabling the weighted priority queue can assist the new priority
> > options
> >
> > osd_op_queue = wpq
> >
> >
> >> I just found [1] and will have a look into it.
> >>
> >> [1] http://prob6.com/en/ceph-pg-deep-scrub-cron/
> >>
> >> Thanks!
> >> Eugen
> >>
> >> --
> >> Eugen Block voice   : +49-40-559 51 75
> >> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> >> Postfach 61 03 15
> >> D-22423 Hamburg e-mail  : ebl...@nde.ag
> >>
> >>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
> >>Sitz und Registergericht: Hamburg, HRB 90934
> >>Vorstand: Jens-U. Mozdzen
> >> USt-IdNr. DE 814 013 983
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> --
> Eugen Block voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg e-mail  : ebl...@nde.ag
> 
>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>Sitz und Registergericht: Hamburg, HRB 90934
>Vorstand: Jens-U. Mozdzen
> USt-IdNr. DE 814 013 983
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Eugen Block

Hi list,

I've been searching the mail archive and the web for some help. I  
tried the things I found, but I can't see any effect. We use Ceph for  
our OpenStack environment.


When our cluster (2 pools, each 4092 PGs, on 20 OSDs across 4 nodes, 3  
MONs) starts deep-scrubbing, it's impossible to work with the VMs.  
Currently, the deep-scrubs happen to start on Monday, which is  
unfortunate. I already plan to start the next deep-scrub on Saturday,  
so it has no impact on our work days. But if I imagine we had a large  
multi-datacenter setup, such performance breaks would not be reasonable. So I'm  
wondering how you guys manage that?


What I've tried so far:

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
ceph tell osd.* injectargs '--osd_scrub_end_hour 7'

And I also added these options to the ceph.conf.
To be able to work again, I had to set the nodeep-scrub option and  
unset it when I left the office. Today, I see the cluster  
deep-scrubbing again, but only one PG at a time; it seems that the  
default for osd_max_scrubs is working now, and I don't see major  
impacts yet.
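For reference, the flag mentioned above is toggled cluster-wide with:

   ceph osd set nodeep-scrub
   ceph osd unset nodeep-scrub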


But is there something else I can do to reduce the performance impact?  
I just found [1] and will have a look into it.


[1] http://prob6.com/en/ceph-pg-deep-scrub-cron/

Thanks!
Eugen

--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Eugen Block
> Sent: 22 November 2016 09:55
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] deep-scrubbing has large impact on performance
> 
> Hi list,
> 
> I've been searching the mail archive and the web for some help. I tried the 
> things I found, but I can't see the effects. We use
Ceph for
> our Openstack environment.
> 
> When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
> MONs) starts deep-scrubbing, it's impossible to work with the VMs.
> Currently, the deep-scrubs happen to start on Monday, which is unfortunate. I 
> already plan to start the next deep-scrub on
Saturday,
> so it has no impact on our work days. But if I imagine we had a large 
> multi-datacenter, such performance breaks are not
reasonable. So
> I'm wondering how do you guys manage that?
> 
> What I've tried so far:
> 
> ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
> ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
> ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
> 
> And I also added these options to the ceph.conf.
> To be able to work again, I had to set the nodeep-scrub option and unset it 
> when I left the office. Today, I see the cluster deep-
> scrubbing again, but only one PG at a time, it seems that now the default for 
> osd_max_scrubs is working now and I don't see major
> impacts yet.
> 
> But is there something else I can do to reduce the performance impact?

If you are using Jewel, the scrubbing is now done in the client IO thread, so 
those disk thread options won't do anything. Instead
there is a new priority setting, which seems to work for me, along with a few 
other settings.

osd_scrub_priority = 1
osd_scrub_sleep = .1
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_scrub_load_threshold = 5

Also enabling the weighted priority queue can assist the new priority options

osd_op_queue = wpq
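A hedged side note: most of the scrub settings above can also be applied at runtime, e.g.

   ceph tell osd.* injectargs '--osd_scrub_priority 1 --osd_scrub_sleep 0.1 --osd_scrub_chunk_max 5'

whereas osd_op_queue is, as far as I know, only read at OSD start-up, so it needs to go into ceph.conf followed by an OSD restart.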


> I just found [1] and will have a look into it.
> 
> [1] http://prob6.com/en/ceph-pg-deep-scrub-cron/
> 
> Thanks!
> Eugen
> 
> --
> Eugen Block voice   : +49-40-559 51 75
> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
> Postfach 61 03 15
> D-22423 Hamburg e-mail  : ebl...@nde.ag
> 
>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>Sitz und Registergericht: Hamburg, HRB 90934
>Vorstand: Jens-U. Mozdzen
> USt-IdNr. DE 814 013 983
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-disk dmcrypt : encryption key placement problem

2016-11-22 Thread Pierre BLONDEAU
Hello,

we have a JEWEL cluster upgraded from FIREFLY. The cluster is encrypted
with dmcrypt.

Yesterday, I added some new OSDs for the first time since the upgrade. I
searched for the new keys to back them up and saw that the creation of new
OSDs with the dmcrypt option has changed.

To be able to retrieve the key if the server filesystem crashes (
http://tracker.ceph.com/issues/14669 ) or if the OSD moves, a ceph user
is created and its keyring file is used as the LUKS encryption key. Good
idea.

The problem is:
There is a small partition named "ceph lockbox" at the beginning of the
disk. We can find the keyring among the files of this partition. Why is
the encryption key stored on the same disk, and in the clear?

Someone who could get the disk would be able to read it. There's no
point encrypting it in this case.

It is urgent to move the keyring file elsewhere ( in
/etc/ceph/dmcrypt-keys ? )

Regards
Pierre

-- 
--
Pierre BLONDEAU
System & network administrator
Université de Caen Normandie
GREYC Laboratory, Computer Science Department

Tel : 02 31 56 75 42.
Office: Campus 2, Science 3, 406
--



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] export-diff behavior if an initial snapshot is NOT specified

2016-11-22 Thread Jason Dillaman
On Tue, Nov 22, 2016 at 5:31 AM, Zhongyan Gu  wrote:
> So if initial snapshot is NOT specified, then:
> rbd export-diff image@snap1 will diff all data to snap1. this cmd equals to
> :
> rbd export image@snap1. Is my understand right or not??


While they will both export all data associated w/ image@snap1, the
"export" command will generate a raw, non-sparse dump of the full
image whereas "export-diff" will export only sections of the image
that contain data. The file generated from "export" can be used with
the "import" command to create a new image, whereas the file generated
from "export-diff" can only be used with "import-diff" against an
existing image.
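A short sketch of the two paths described above (file and image names made up):

   rbd export image@snap1 image.raw          # raw, non-sparse dump of the whole image
   rbd import image.raw newimage             # creates a new image from it

   rbd export-diff image@snap1 image.diff    # only the extents that contain data
   rbd import-diff image.diff existingimage  # applied on top of an existing image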

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel P3700 SSD for journals

2016-11-22 Thread William Josefsson
thx Alan and Anthony for sharing on these P3700 drives.

Anthony, just to follow up on your email: my OS is CentOS 7.2. Can
you please elaborate on NVMe on CentOS 7.2? I'm in no way an expert on
NVMe, but I can see here that
https://www.pcper.com/files/imagecache/article_max_width/news/2015-06-08/Demartek_SFF-8639.png
the connectors are different for NVMe. Does this mean I cannot connect
them to a PERC 730 raid controller?

Is there anything particular required when installing CentOS on
these drives, or will they be automatically detected and work out of
the box by default? Thx will

On Mon, Nov 21, 2016 at 12:16 PM, Anthony D'Atri  wrote:
> The SATA S3700 series has been the de-facto for journals for some time.  And 
> journals don’t need all that much space.
>
> We’re using 400GB P3700’s.  I’ll say a couple of things:
>
> o Update to the latest firmware available when you get your drives, qual it 
> and stick with it for a while so you have a uniform experience
> o Run a recent kernel with a recent nvme.ko, eg. the RHEL 7.1 3.10.0-229.4.2 
> kernel’s bundled nvme.ko has a rare timing issue that causes us resets at 
> times.  YMMV.
>
> Which OS do you run?
>
>
>
> Read through this document or a newer version thereof
>
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3700-spec.pdf
>
> or for SATA drives
>
> http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3710-spec.html
>
>
> It’s possible that your vendor is uninformed or lying, trying to upsell you.  
> At times larger units can perform better due to internal parallelism, ie. a 
> 1.6TB unit may electrically be 4x 400GB parts in parallel.  For 7200RPM LFF 
> drives, as Nick noted 12x journals per P3700 is probably as high as you want 
> to go, otherwise you can bottleneck.
>
> What *is* true is the distinction among series.  Check the graph halfway down 
> this page:
>
> http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme
>
> Prima facie the P3500’s can seem like a relative bargain, but attend to the 
> durability — that is where the P3600 and P3700 differ dramatically.  For some 
> the P3600 may be durable enough, given certain workloads and expected years 
> of service.  I tend to be paranoid and lobbied for us to err on the side of 
> caution with the P3700.  YMMV.
>
> — Anthony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel P3700 SSD for journals

2016-11-22 Thread Anthony D'Atri
You wrote P3700 so that’s what I discussed ;)

If you want to connect to your HBA you’ll want a SATA device like the S3710 
series:

http://ark.intel.com/products/family/83425/Data-Center-SSDs#@Server

The P3700 is a PCI device, goes into an empty slot, and is not speed-limited by 
the SATA interface.  At perhaps higher cost.

With 7.2 I would think you’d be fine, driver-wise.  Either should be detected 
and work out of the box.
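If you want to double-check detection after the install, two generic commands (the second assumes the nvme-cli package is installed):

   lsblk        # an NVMe P3700 shows up as /dev/nvme0n1, a SATA S3710 as /dev/sdX
   nvme list    # lists NVMe controllers and namespaces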

— Anthony


> 
> thx Alan and Anthony for sharing on these P3700 drives.
> 
> Anthony, just to follow up on your email: my OS is CentOS7.2.   Can
> you please elaborate on nvme on the CentOS7.2, I'm in no way expert on
> nvme, but I can here see that
> https://www.pcper.com/files/imagecache/article_max_width/news/2015-06-08/Demartek_SFF-8639.png
> the connectors are different for nvme. Does this mean I cannot connect
> to PERC 730 raid controller?
> 
> Is there anything particular required when installing the CentOS on
> these drives, or they will be automatically detected and work out of
> the box by default? Thx will
> 
> On Mon, Nov 21, 2016 at 12:16 PM, Anthony D'Atri  wrote:
>> The SATA S3700 series has been the de-facto for journals for some time.  And 
>> journals don’t need all that much space.
>> 
>> We’re using 400GB P3700’s.  I’ll say a couple of things:
>> 
>> o Update to the latest firmware available when you get your drives, qual it 
>> and stick with it for a while so you have a uniform experience
>> o Run a recent kernel with a recent nvme.ko, eg. the RHEL 7.1 3.10.0-229.4.2 
>> kernel’s bundled nvme.ko has a rare timing issue that causes us resets at 
>> times.  YMMV.
>> 
>> Which OS do you run?
>> 
>> 
>> 
>> Read through this document or a newer version thereof
>> 
>> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3700-spec.pdf
>> 
>> or for SATA drives
>> 
>> http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3710-spec.html
>> 
>> 
>> It’s possible that your vendor is uninformed or lying, trying to upsell you. 
>>  At times larger units can perform better due to internal parallelism, ie. a 
>> 1.6TB unit may electrically be 4x 400GB parts in parallel.  For 7200RPM LFF 
>> drives, as Nick noted 12x journals per P3700 is probably as high as you want 
>> to go, otherwise you can bottleneck.
>> 
>> What *is* true is the distinction among series.  Check the graph halfway 
>> down this page:
>> 
>> http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme
>> 
>> Prima facie the P3500’s can seem like a relative bargain, but attend to the 
>> durability — that is where the P3600 and P3700 differ dramatically.  For 
>> some the P3600 may be durable enough, given certain workloads and expected 
>> years of service.  I tend to be paranoid and lobbied for us to err on the 
>> side of caution with the P3700.  YMMV.
>> 
>> — Anthony

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Contribution to CEPH

2016-11-22 Thread Patrick McGarry
Hey Jagan,

I'm happy to hear you are interested in contributing to Ceph. I would
suggest taking a look at the tracker (http://tracker.ceph.com/) for
bugs and projects you might be interested in tackling. All code and
associated repositories are available on github
(https://github.com/ceph/).

If you would like to hear some of the latest work I would recommend
joining the Ceph Developer Monthly call on the first Wed of each month
(http://wiki.ceph.com/Planning/).

Hope that gets you headed in the right direction. Thanks.


On Sun, Nov 20, 2016 at 9:15 AM, Jagan Kaartik  wrote:
> I am Jagan Kaartik, a freshman in computer science and engineering from
> Amrita school of engineering, Kerala, India.
>
> I have a basic knowledge in Python and C++.
>
> My interest in databases and network storage inspired me to join the CEPH
> organization.
>
> I want to learn and contribute and be a part of this organization. Please
> guide me.
>
> With regards,
> Jagan Kaartik
> Amrita University
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs (rbd) read performance low - where is the bottleneck?

2016-11-22 Thread Mike Miller

thank you very much for this info.

On 11/21/16 12:33 PM, Eric Eastman wrote:

Have you looked at your file layout?

On a test cluster running 10.2.3 I created a 5GB file and then looked
at the layout:

# ls -l test.dat
  -rw-r--r-- 1 root root 524288 Nov 20 23:09 test.dat
# getfattr -n ceph.file.layout test.dat
  # file: test.dat
  ceph.file.layout="stripe_unit=4194304 stripe_count=1
object_size=4194304 pool=cephfs_data"


The file layout looks the same in my case.


From what I understand, with this layout you are reading 4MB of data
from 1 OSD at a time, so I think you are seeing the overall speed of a
single SATA drive.  I do not think increasing your MON/MDS links to
10Gb will help, nor, for a single-file read, will moving the
metadata to SSD.


Really? Does ceph really wait until each of the stripe_unit reads has 
finished before starting the next one?



To test this, you may want to try creating 10 x 50GB files, and then
read them in parallel and see if your overall throughput increases.


Scaling through parallelism works as expected, no problem there.


If so, take a look at the layout parameters and see if you can change
the file layout to get more parallelization.

https://github.com/ceph/ceph/blob/master/doc/dev/file-striping.rst
https://github.com/ceph/ceph/blob/master/doc/cephfs/file-layouts.rst


Interesting. But how would I change this to improve single threaded read 
speed?


And how would I do the changes to already existing files?

Regards,

Mike


Regards,
Eric

On Sun, Nov 20, 2016 at 3:24 AM, Mike Miller  wrote:

Hi,

reading a big file 50 GB (tried more too)

dd if=bigfile of=/dev/zero bs=4M

in a cluster with 112 SATA disks across 10 OSD nodes (6272 pgs, replication 3) gives
me only about *122 MB/s* read speed in a single thread. Scrubbing was turned off
during the measurement.

I have been searching for possible bottlenecks. The network is not the
problem: the machine running dd is connected to the cluster public network
with a 20 GBASE-T bond. The osds have a dual network setup: cluster public 10 GBASE-T, private
10 GBASE-T.

The osd SATA disks are utilized only at about 10% or 20%, not more
than that. CPUs on the osds idle too. CPUs on the mon idle, mds usage about 1.0 (1
core is used on this 6-core machine). mon and mds are connected with only 1 GbE
(I would expect some latency from that, but no bandwidth issues; in fact
network bandwidth is about 20 Mbit max).

If I read a 50 GB file, then clear the cache on the reading machine
(but not the osd caches) and read it again, I get much better read performance of about
*620 MB/s*. That seems logical to me as much (most) of the data is still in
the osd cache buffers. But the read performance is still not great,
considering that the reading machine is connected to the cluster with a 20
Gbit/s bond.

How can I improve? I am not really sure, but from my understanding 2
possible bottlenecks come to mind:

1) 1 GbE connection to mon / mds

Is this the reason why reads are slow and the osd disks are not hammered by read
requests, and therefore not fully utilized?

2) Move metadata to SSD

Currently, the cephfs_metadata pool is on the same spinning SATA disks as the
data pool. Is this the bottleneck? Would moving the metadata to SSD be a
solution?

Or is it both?

Your experience and insight are highly appreciated.

Thanks,

Mike
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Robert LeBlanc
If you use wpq, I recommend also setting "osd_op_queue_cut_off = high";
otherwise replication OPs are not weighted, which really reduces
the benefit of wpq.
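A sketch of how the two settings from this thread would sit together in ceph.conf, assuming they go under [osd] (both are picked up at OSD start):

   [osd]
   osd_op_queue = wpq
   osd_op_queue_cut_off = high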

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Nov 22, 2016 at 5:34 AM, Eugen Block  wrote:
> Thank you!
>
>
> Zitat von Nick Fisk :
>
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Eugen Block
>>> Sent: 22 November 2016 10:11
>>> To: Nick Fisk 
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] deep-scrubbing has large impact on performance
>>>
>>> Thanks for the very quick answer!
>>>
>>> > If you are using Jewel
>>>
>>> We are still using Hammer (0.94.7), we wanted to upgrade to Jewel in a
>>> couple of weeks, would you recommend to do it now?
>>
>>
>> It's been fairly solid for me, but you might want to wait for the
>> scrubbing hang bug to be fixed before upgrading. I think this
>> might be fixed in the upcoming 10.2.4 release.
>>
>>>
>>>
>>> Zitat von Nick Fisk :
>>>
>>> >> -Original Message-
>>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>> >> Of Eugen Block
>>> >> Sent: 22 November 2016 09:55
>>> >> To: ceph-users@lists.ceph.com
>>> >> Subject: [ceph-users] deep-scrubbing has large impact on performance
>>> >>
>>> >> Hi list,
>>> >>
>>> >> I've been searching the mail archive and the web for some help. I
>>> >> tried the things I found, but I can't see the effects. We use
>>> > Ceph for
>>> >> our Openstack environment.
>>> >>
>>> >> When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
>>> >> MONs) starts deep-scrubbing, it's impossible to work with the VMs.
>>> >> Currently, the deep-scrubs happen to start on Monday, which is
>>> >> unfortunate. I already plan to start the next deep-scrub on
>>> > Saturday,
>>> >> so it has no impact on our work days. But if I imagine we had a large
>>> >> multi-datacenter, such performance breaks are not
>>> > reasonable. So
>>> >> I'm wondering how do you guys manage that?
>>> >>
>>> >> What I've tried so far:
>>> >>
>>> >> ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
>>> >> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
>>> >> ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
>>> >> ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
>>> >> ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
>>> >>
>>> >> And I also added these options to the ceph.conf.
>>> >> To be able to work again, I had to set the nodeep-scrub option and
>>> >> unset it when I left the office. Today, I see the cluster deep-
>>> >> scrubbing again, but only one PG at a time, it seems that now the
>>> >> default for osd_max_scrubs is working now and I don't see major
>>> >> impacts yet.
>>> >>
>>> >> But is there something else I can do to reduce the performance impact?
>>> >
>>> > If you are using Jewel, the scrubbing is now done in the client IO
>>> > thread, so those disk thread options won't do anything. Instead there
>>> > is a new priority setting, which seems to work for me, along with a
>>> > few other settings.
>>> >
>>> > osd_scrub_priority = 1
>>> > osd_scrub_sleep = .1
>>> > osd_scrub_chunk_min = 1
>>> > osd_scrub_chunk_max = 5
>>> > osd_scrub_load_threshold = 5
>>> >
>>> > Also enabling the weighted priority queue can assist the new priority
>>> > options
>>> >
>>> > osd_op_queue = wpq
>>> >
>>> >
>>> >> I just found [1] and will have a look into it.
>>> >>
>>> >> [1] http://prob6.com/en/ceph-pg-deep-scrub-cron/
>>> >>
>>> >> Thanks!
>>> >> Eugen
>>> >>
>>> >> --
>>> >> Eugen Block voice   : +49-40-559 51 75
>>> >> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
>>> >> Postfach 61 03 15
>>> >> D-22423 Hamburg e-mail  : ebl...@nde.ag
>>> >>
>>> >>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>>> >>Sitz und Registergericht: Hamburg, HRB 90934
>>> >>Vorstand: Jens-U. Mozdzen
>>> >> USt-IdNr. DE 814 013 983
>>> >>
>>> >> ___
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> --
>>> Eugen Block voice   : +49-40-559 51 75
>>> NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
>>> Postfach 61 03 15
>>> D-22423 Hamburg e-mail  : ebl...@nde.ag
>>>
>>>  Vorsitzende des Aufsichtsrates: Angelika Mozdzen
>>>Sitz und Registergericht: Hamburg, HRB 90934
>>>Vorstand: Jens-U. Mozdzen
>>> USt-IdNr. DE 814 013 983
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> 

[ceph-users] osd set noin ignored for old OSD ids

2016-11-22 Thread Adrian Saul

Hi ,
As part of a migration between hardware I have been building new OSDs and 
cleaning up old ones (osd rm osd.x, osd crush rm osd.x, auth del osd.x). To 
try and prevent rebalancing from kicking in until all the new OSDs are created on a 
host, I use "ceph osd set noin". However, what I have seen is that if the new OSD 
that is created uses a new unique ID, then the flag is honoured and the OSD 
remains out until I bring it in, whereas if the OSD re-uses a previous OSD id 
it goes straight to in and starts backfilling. I have to manually out 
the OSD to stop it (or set nobackfill,norebalance, as sketched below).
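For reference, the flag workaround mentioned above looks like this:

   ceph osd set nobackfill
   ceph osd set norebalance
   # ... create / bring in the new OSDs ...
   ceph osd unset norebalance
   ceph osd unset nobackfill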

Am I doing something wrong in this process or is there something about "noin" 
that is ignored for previously existing OSDs that have been removed from both 
the OSD map and crush map?

Cheers,
 Adrian




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Contribution to CEPH

2016-11-22 Thread Brad Hubbard
Also, feel free to ask development-related questions in the #ceph-devel
channel on OFTC.

On Wed, Nov 23, 2016 at 2:30 AM, Patrick McGarry  wrote:
> Hey Jagan,
>
> I'm happy to hear you are interested in contributing to Ceph. I would
> suggest taking a look at the tracker (http://tracker.ceph.com/) for
> bugs and projects you might be interested in tackling. All code and
> associated repositories are available on github
> (https://github.com/ceph/).
>
> If you would like to hear some of the latest work I would recommend
> joining the Ceph Developer Monthly call on the first Wed of each month
> (http://wiki.ceph.com/Planning/).
>
> Hope that gets you headed in the right direction. Thanks.
>
>
> On Sun, Nov 20, 2016 at 9:15 AM, Jagan Kaartik  wrote:
>> I am Jagan Kaartik, a freshman in computer science and engineering from
>> Amrita school of engineering, Kerala, India.
>>
>> I have a basic knowledge in Python and C++.
>>
>> My interest in databases and network storage inspired me to join the CEPH
>> organization.
>>
>> I want to learn and contribute and be a part of this organization. Please
>> guide me.
>>
>> With regards,
>> Jagan Kaartik
>> Amrita University
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack Keystone with RadosGW

2016-11-22 Thread 한승진
I've figured out the main reason.

When the swift client authenticates through a keystone user like 'admin', keystone
returns an X-Auth-Token header.

After that, the swift client sends its request with that X-Auth-Token to radosgw, but
radosgw returns 'AccessDenied'.

Some people say radosgw doesn't support keystone identity API version 3 yet.
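A hedged sketch of how the token flow can be checked by hand; the port and the /swift/v1 prefix are radosgw defaults and may differ in your setup:

   openstack token issue -f value -c id                                  # get a v3 token from keystone
   curl -i -H "X-Auth-Token: <token>" http://radosgw-host:7480/swift/v1  # returns AccessDenied here, as described above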


2016-11-22 15:41 GMT+09:00 한승진 :

> Hi All,
>
> I am trying to implement radosgw with Openstack as an object storage
> service.
>
> I think there are 2 cases for using radosgw as an object storage
>
> First, Keystone <-> Ceph connect directly.
>
> like below guide..
>
> http://docs.ceph.com/docs/master/radosgw/keystone/
>
> Second, use ceph as a back-end of swift.
>
> like below guide..
>
> https://github.com/openstack/swift-ceph-backend#installation
>
> In the first case, it always issues a 405 error, so I cannot go forward
> any more.
>
> In the second case, I don't know how to build the ring in a ceph backend
> environment.
>
> Is anybody using radosgw with OpenStack? Please give me a guide.
>
> Thanks.
>
> John.
>
> =
> Here is my ceph.conf configurations
>
> [client.radosgw.cephmon01]
> rgw keystone api version = 3
> rgw keystone url =  http://controller:35357
> rgw keystone admin user = swift
> rgw keystone admin password = *
> rgw keystone admin project = service
> rgw keystone admin domain = default
> rgw keystone accepted roles =  admin,user
>
> rgw s3 auth use keystone = true
> rgw keystone verify ssl = false
>
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] deep-scrubbing has large impact on performance

2016-11-22 Thread Eugen Block

Thanks for the very quick answer!


If you are using Jewel


We are still using Hammer (0.94.7); we wanted to upgrade to Jewel in a  
couple of weeks. Would you recommend doing it now?



Zitat von Nick Fisk :


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On  
Behalf Of Eugen Block

Sent: 22 November 2016 09:55
To: ceph-users@lists.ceph.com
Subject: [ceph-users] deep-scrubbing has large impact on performance

Hi list,

I've been searching the mail archive and the web for some help. I  
tried the things I found, but I can't see the effects. We use

Ceph for

our Openstack environment.

When our cluster (2 pools, each 4092 PGs, in 20 OSDs on 4 nodes, 3
MONs) starts deep-scrubbing, it's impossible to work with the VMs.
Currently, the deep-scrubs happen to start on Monday, which is  
unfortunate. I already plan to start the next deep-scrub on

Saturday,
so it has no impact on our work days. But if I imagine we had a  
large multi-datacenter, such performance breaks are not

reasonable. So

I'm wondering how do you guys manage that?

What I've tried so far:

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
ceph tell osd.* injectargs '--osd_scrub_begin_hour 0'
ceph tell osd.* injectargs '--osd_scrub_end_hour 7'

And I also added these options to the ceph.conf.
To be able to work again, I had to set the nodeep-scrub option and  
unset it when I left the office. Today, I see the cluster deep-
scrubbing again, but only one PG at a time, it seems that now the  
default for osd_max_scrubs is working now and I don't see major

impacts yet.

But is there something else I can do to reduce the performance impact?


If you are using Jewel, the scrubbing is now done in the client IO  
thread, so those disk thread options won't do anything. Instead
there is a new priority setting, which seems to work for me, along  
with a few other settings.


osd_scrub_priority = 1
osd_scrub_sleep = .1
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_scrub_load_threshold = 5

Also enabling the weighted priority queue can assist the new priority options

osd_op_queue = wpq



I just found [1] and will have a look into it.

[1] http://prob6.com/en/ceph-pg-deep-scrub-cron/

Thanks!
Eugen

--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

 Vorsitzende des Aufsichtsrates: Angelika Mozdzen
   Sitz und Registergericht: Hamburg, HRB 90934
   Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com