Re: [ceph-users] New Ceph cluster - cannot add additional monitor

2015-06-17 Thread Mike Carlson
Just to follow up: I started from scratch, and I think the key was to run
ceph-deploy purge (nodes), ceph-deploy purgedata (nodes), and finally
ceph-deploy forgetkeys.
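
(For reference, a minimal sketch of that reset sequence — hostnames are placeholders:)

ceph-deploy purge node1 node2 node3
ceph-deploy purgedata node1 node2 node3
ceph-deploy forgetkeys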

Thanks for the replies Alex and Alex!
Mike C


[ceph-users] Interesting postmortem on SSDs from Algolia

2015-06-17 Thread Steve Anthony
There's often a great deal of discussion about which SSDs to use for
journals, and why some of the cheaper SSDs end up being more expensive
in the long run. The recent blog post at Algolia, though not Ceph
specific, provides a good illustration of exactly how insidious
kernel/SSD interactions can be. Thought the list might find it
interesting.   

https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/

-Steve

-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma...@lehigh.edu






Re: [ceph-users] Very chatty MON logs: Is this normal?

2015-06-17 Thread Somnath Roy
 However, I'd rather not set the level to 0/0, as that would disable all 
logging from the MONs

I don't think so. All the error scenarios and stack traces (in case of a crash) 
are supposed to be logged at log level 0. But, generally, we need the highest 
log level (say 20) to get all the information when something needs to be debugged. 
So I doubt how beneficial it would be to enable logging at some intermediate 
level.
There is probably no strict guideline for these log levels that developers 
follow, either.
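
(For reference, the mon debug level can be set persistently in ceph.conf or injected at runtime; the 1/5 value here is just an illustrative intermediate level:)

[mon]
debug mon = 1/5

ceph tell mon.* injectargs '--debug-mon 1/5'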

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daniel 
Schneller
Sent: Wednesday, June 17, 2015 12:11 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Very chatty MON logs: Is this normal?

On 2015-06-17 18:52:51 +0000, Somnath Roy said:

 This is presently written from log level 1 onwards :-) So, only log
 level 0 will not log this..
 Try, 'debug_mon = 0/0' in the conf file..

Yeah, once I had sent the mail I realized that the "1" in the log line was the 
level. I had overlooked that before.
However, I'd rather not set the level to 0/0, as that would disable all logging 
from the MONs.

 Now, I don't have enough knowledge on that part to say whether it is
 important enough to log at log level 1 , sorry :-(

That would indeed be interesting to know.
Judging from the sheer amount, at least I have my doubts, because the cluster 
seems to be running without any issues. So I figure at least it isn't 
indicative of an immediate issue.

Anyone with a little more definitive knowledge around? Should I create a bug 
ticket for this?

Cheers,
Daniel







Re: [ceph-users] Ceph OSD with OCFS2

2015-06-17 Thread Somnath Roy
Sorry Prabu, I forgot to mention the bold settings in the conf file you need to 
tweak based on your HW configuration (cpu, disk etc.) and number of OSDs 
otherwise it may hit you back badly.

Thanks & Regards
Somnath

From: Somnath Roy
Sent: Wednesday, June 17, 2015 11:25 AM
To: 'gjprabu'
Cc: Kamala Subramani; ceph-users@lists.ceph.com; Siva Sokkumuthu
Subject: RE: RE: Re: [ceph-users] Ceph OSD with OCFS2

Okay. You didn't mention anything about your rbd client host config or the cpu 
cores of the OSD/rbd systems. Some thoughts on what you can do:

1. Considering the pretty lean cpu config you have, I would say check the cpu 
usage of both the OSD and rbd nodes first. If it is already saturated, you are 
out of luck ☺

2. Quite a bit of write-path improvement went in with Hammer and the latest 
ceph; I hope you are using that code base.

3. I would say put the ceph journal on SSD at least; this should give you a 
boost.

4. Check the pool pg number; I hope this is at least 64 or so.

5. If you are using kernel rbd to map, take the latest krbd code base and build 
it for your kernel. The reason is that some very important krbd performance 
fixes went in that unfortunately are probably not yet part of any released 
kernel. That should give you a boost.

6. Make the following changes in your conf file if you are not doing so already 
and see if it improves anything. Make sure you are at least using ‘hammer’ 
for this..

auth_supported = none
auth_service_required = none
auth_client_required = none
auth_cluster_required = none
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_keyvaluestore = 0/0
debug_newstore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_threads = 2
ms_crc_data = false
ms_crc_header = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 12
osd_enable_op_tracker = false

7. How many copies are you keeping, 1 or 2?
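
(For reference, the replica count and PG count of a pool can be checked like this, assuming the pool in question is 'rbd':)

ceph osd pool get rbd size
ceph osd pool get rbd pg_num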

Thanks & Regards
Somnath

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Wednesday, June 17, 2015 12:05 AM
To: Somnath Roy
Cc: Kamala Subramani; 
ceph-users@lists.ceph.com; Siva Sokkumuthu
Subject: Re: RE: Re: [ceph-users] Ceph OSD with OCFS2

Hi Somnath,

   Yes, we will analyze whether there is any bottleneck. Is there any 
useful command we can use to analyze this bottleneck?
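
(A few commonly used starting points for spotting such a bottleneck, as a sketch:)

ceph -s          # overall cluster health and client I/O rates
ceph osd perf    # per-OSD commit/apply latency
iostat -x 1      # per-disk utilization and await on each OSD node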

 1. What is your backend cluster configuration like how many OSDs, PGs/pool, 
 HW details etc

We are using 2 OSDs, and no pools/PGs were created explicitly; everything is at 
the defaults. The hardware is a physical machine with a bit over 2 GB of RAM.

2. Is it a single big rbd image you mounted from different hosts and running 
OCFS2 on top ? Please give some details on that front.

 Yes, it is a single rbd image that we are using on different hosts, 
running OCFS2 on top.

rbd ls
newinteg

rbd showmapped
   id pool image    snap device
   1  rbd  newinteg -    /dev/rbd1

rbd info newinteg
rbd image 'newinteg':
        size 70000 MB in 17500 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1149.74b0dc51
        format: 1

 3. Also, is this HDD or SSD setup ? If HDD, hope you have journals on SSD.

We believe this is HDD; below is the output for the disks.

*-ide
   description: IDE interface
   product: 82371SB PIIX3 IDE [Natoma/Triton II]
   vendor: Intel Corporation
   physical id: 1.1
   bus info: pci@0000:00:01.1
   version: 00
   width: 32 bits
   clock: 33MHz
   capabilities: ide bus_master
   configuration: driver=ata_piix latency=0
   resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) 
ioport:376 ioport:c000(size=16)
  *-scsi
   description: SCSI storage controller
   product: Virtio block device
   vendor: Red Hat, Inc
   physical id: 4
   bus info: pci@0000:00:04.0
   version: 00
   width: 32 bits
   clock: 33MHz
   capabilities: scsi msix bus_master cap_list
   configuration: driver=virtio-pci latency=0
   resources: irq:11 ioport:c080(size=64) memory:f2040000-f2040fff
Regards
Prabu GJ



 On Tue, 16 Jun 2015 21:50:29 +0530 Somnath Roy somnath@sandisk.com wrote 

Okay…I think the extra 

[ceph-users] best Linux distro for Ceph

2015-06-17 Thread Shane Gibson

Ok - I know this post has the potential to spread to unsavory corners of 
discussion about the best linux distro ... blah blah blah ... please, don't 
let it go there ... !

I'm seeking some input from people that have been running larger Ceph clusters 
... on the order of 100s of physical servers with thousands of OSDs in them.  
Our primary use case is Object via Swift API integration and adding Block store 
capability for both OpenStack/KVM backing VMs, as well as general use for 
various block store scenarios.

We'd *like* to look at CephFS, and I'm heartened to see a kernel module (over 
the FUSE-based client), a growing user base around it, and I'm hoping 
"production ready" will soon be stamped on CephFS ...

We currently deploy Ubuntu (primarily Trusty - 14.04) and CentOS 7.1. We've 
been testing our Ceph clusters on both, but our preference as an organization 
is CentOS 7.1.1503 (currently).

However - I see a lot of noise in the list about needing to track the more 
modern kernel versions as opposed to the already dated 3.10.x that CentOS 7.1 
deploys.  Yes, I know RH and community backport a lot of the newer kernel 
features to their kernel version ... but ... not everything gets backported.

Can someone out there with real world, larger scale Ceph cluster operational 
experience provide a guideline on the Linux distro they deploy/use that works 
well with Ceph and is more in line with keeping up with modern kernel 
versions ... without crossing the line into the bleeding and painful edge 
versions ... ?

Thank you ...

~~shane




Re: [ceph-users] Ceph OSD with OCFS2

2015-06-17 Thread Somnath Roy
Okay. You didn't mention anything about your rbd client host config or the cpu 
cores of the OSD/rbd systems. Some thoughts on what you can do:

1. Considering the pretty lean cpu config you have, I would say check the cpu 
usage of both the OSD and rbd nodes first. If it is already saturated, you are 
out of luck ☺

2. Quite a bit of write-path improvement went in with Hammer and the latest 
ceph; I hope you are using that code base.

3. I would say put the ceph journal on SSD at least; this should give you a 
boost.

4. Check the pool pg number; I hope this is at least 64 or so.

5. If you are using kernel rbd to map, take the latest krbd code base and build 
it for your kernel. The reason is that some very important krbd performance 
fixes went in that unfortunately are probably not yet part of any released 
kernel. That should give you a boost.

6. Make the following changes in your conf file if you are not doing so already 
and see if it improves anything. Make sure you are at least using ‘hammer’ 
for this..

auth_supported = none
auth_service_required = none
auth_client_required = none
auth_cluster_required = none
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_keyvaluestore = 0/0
debug_newstore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_threads = 2
ms_crc_data = false
ms_crc_header = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 12
osd_enable_op_tracker = false

7. How many copies are you keeping, 1 or 2?

Thanks & Regards
Somnath

From: gjprabu [mailto:gjpr...@zohocorp.com]
Sent: Wednesday, June 17, 2015 12:05 AM
To: Somnath Roy
Cc: Kamala Subramani; ceph-users@lists.ceph.com; Siva Sokkumuthu
Subject: Re: RE: Re: [ceph-users] Ceph OSD with OCFS2

Hi Somnath,

   Yes, we will analyze whether there is any bottleneck. Is there any 
useful command we can use to analyze this bottleneck?

 1. What is your backend cluster configuration like how many OSDs, PGs/pool, 
 HW details etc

We are using 2 OSDs, and no pools/PGs were created explicitly; everything is at 
the defaults. The hardware is a physical machine with a bit over 2 GB of RAM.

2. Is it a single big rbd image you mounted from different hosts and running 
OCFS2 on top ? Please give some details on that front.

 Yes, it is a single rbd image that we are using on different hosts, 
running OCFS2 on top.

rbd ls
newinteg

rbd showmapped
   id pool image    snap device
   1  rbd  newinteg -    /dev/rbd1

rbd info newinteg
rbd image 'newinteg':
        size 70000 MB in 17500 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1149.74b0dc51
        format: 1

 3. Also, is this HDD or SSD setup ? If HDD, hope you have journals on SSD.

We believe this is HDD; below is the output for the disks.

*-ide
   description: IDE interface
   product: 82371SB PIIX3 IDE [Natoma/Triton II]
   vendor: Intel Corporation
   physical id: 1.1
   bus info: pci@0000:00:01.1
   version: 00
   width: 32 bits
   clock: 33MHz
   capabilities: ide bus_master
   configuration: driver=ata_piix latency=0
   resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) 
ioport:376 ioport:c000(size=16)
  *-scsi
   description: SCSI storage controller
   product: Virtio block device
   vendor: Red Hat, Inc
   physical id: 4
   bus info: pci@0000:00:04.0
   version: 00
   width: 32 bits
   clock: 33MHz
   capabilities: scsi msix bus_master cap_list
   configuration: driver=virtio-pci latency=0
   resources: irq:11 ioport:c080(size=64) memory:f2040000-f2040fff
Regards
Prabu GJ



 On Tue, 16 Jun 2015 21:50:29 +0530 Somnath Roy somnath@sandisk.com wrote 

Okay…I think the extra layers you have will add some delay, but 1m is probably 
high (I never tested Ceph on HDD though).

We can minimize it probably by optimizing the cluster setup.

Please monitor your backend cluster or even the rbd nodes to see if anything is 
bottleneck there.

Also, check if there is any delay between when you issue a request on OCFS2/rbd 
and when rbd/the cluster receives it.



Could you please share the following details ?



1. What is your backend 

[ceph-users] Explanation for ceph osd set nodown and ceph osd cluster_snap

2015-06-17 Thread Jan Schermer
1) Flags available in ceph osd set are

pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent

I know or can guess most of them (the docs are a “bit” lacking)

But with “ceph osd set nodown” I have no idea what it should be used for - to 
keep hammering a faulty OSD?
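
(For reference, my understanding is that nodown makes the monitors ignore OSD failure reports so OSDs are not marked down, e.g. during planned network maintenance — a sketch:)

ceph osd set nodown      # OSDs won't be marked down while this is set
# ... perform the network/switch maintenance ...
ceph osd unset nodown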

2) Looking through the docs I found a reference to “ceph osd cluster_snap”:
http://ceph.com/docs/v0.67.9/rados/operations/control/

what does it do? how does that work? does it really work? ;-) I got a few hits 
on google which suggest it might not be something that really works, but looks 
like something we could certainly use

Thanks

Jan


Re: [ceph-users] Very chatty MON logs: Is this normal?

2015-06-17 Thread Daniel Schneller

On 2015-06-17 18:52:51 +0000, Somnath Roy said:


This is presently written from log level 1 onwards :-)
So, only log level 0 will not log this..
Try, 'debug_mon = 0/0' in the conf file..


Yeah, once I had sent the mail I realized that the "1" in the log line was 
the level. I had overlooked that before.
However, I'd rather not set the level to 0/0, as that would disable all 
logging from the MONs.


Now, I don't have enough knowledge on that part to say whether it is 
important enough to log at log level 1 , sorry :-(


That would indeed be interesting to know.
Judging from the sheer amount, at least I have my doubts, because the 
cluster seems to be running without any issues. So I figure at least it 
isn't indicative of an immediate issue.


Anyone with a little more definitive knowledge around? Should I create a 
bug ticket for this?


Cheers,
Daniel




[ceph-users] Expanding a ceph cluster with ansible

2015-06-17 Thread Stillwell, Bryan
I've been working on automating a lot of our ceph admin tasks lately and am
pretty pleased with how the puppet-ceph module has worked for installing
packages, managing ceph.conf, and creating the mon nodes.  However, I don't
like the idea of puppet managing the OSDs.  Since we also use ansible in my
group, I took a look at ceph-ansible to see how it might be used to
complete
this task.  I see examples for doing a rolling update and for doing an os
migration, but nothing for adding a node or multiple nodes at once.  I
don't
have a problem doing this work, but wanted to check with the community if
anyone has experience using ceph-ansible for this?

After a lot of trial and error I found the following process works well
when
using ceph-deploy, but it's a lot of steps and can be error prone
(especially if you have old cephx keys that haven't been removed yet):

# Disable backfilling and scrubbing to prevent too many performance
# impacting tasks from happening at the same time.  Maybe adding norecover
# to this list might be a good idea so only peering happens at first.
ceph osd set nobackfill
ceph osd set noscrub
ceph osd set nodeep-scrub

# Zap the disks to start from a clean slate
ceph-deploy disk zap dnvrco01-cephosd-025:sd{b..y}

# Prepare the disks.  I found sleeping between adding each disk can help
# prevent performance problems.
ceph-deploy osd prepare dnvrco01-cephosd-025:sdh:/dev/sdb; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdi:/dev/sdb; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdj:/dev/sdb; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdk:/dev/sdc; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdl:/dev/sdc; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdm:/dev/sdc; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdn:/dev/sdd; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdo:/dev/sdd; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdp:/dev/sdd; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdq:/dev/sde; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdr:/dev/sde; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sds:/dev/sde; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdt:/dev/sdf; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdu:/dev/sdf; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdv:/dev/sdf; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdw:/dev/sdg; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdx:/dev/sdg; sleep 15
ceph-deploy osd prepare dnvrco01-cephosd-025:sdy:/dev/sdg; sleep 15

# Weight in the new OSDs.  We set 'osd_crush_initial_weight = 0' to prevent
# them from being added in during the prepare step.  Maybe a longer wait
# in the last step would make this step unnecessary.
ceph osd crush reweight osd.450 1.09; sleep 60
ceph osd crush reweight osd.451 1.09; sleep 60
ceph osd crush reweight osd.452 1.09; sleep 60
ceph osd crush reweight osd.453 1.09; sleep 60
ceph osd crush reweight osd.454 1.09; sleep 60
ceph osd crush reweight osd.455 1.09; sleep 60
ceph osd crush reweight osd.456 1.09; sleep 60
ceph osd crush reweight osd.457 1.09; sleep 60
ceph osd crush reweight osd.458 1.09; sleep 60
ceph osd crush reweight osd.459 1.09; sleep 60
ceph osd crush reweight osd.460 1.09; sleep 60
ceph osd crush reweight osd.461 1.09; sleep 60
ceph osd crush reweight osd.462 1.09; sleep 60
ceph osd crush reweight osd.463 1.09; sleep 60
ceph osd crush reweight osd.464 1.09; sleep 60
ceph osd crush reweight osd.465 1.09; sleep 60
ceph osd crush reweight osd.466 1.09; sleep 60
ceph osd crush reweight osd.467 1.09; sleep 60

# Once all the OSDs are added to the cluster, allow the backfill process to
# begin.
ceph osd unset nobackfill

# Then once cluster is healthy again, re-enable scrubbing
ceph osd unset noscrub
ceph osd unset nodeep-scrub
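
(The repetitive reweight step above could be condensed into a loop along these lines, assuming the OSD id range matches the listing:)

for osd in $(seq 450 467); do
    ceph osd crush reweight osd.${osd} 1.09
    sleep 60
done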




[ceph-users] radosgw did not create auth url for swift

2015-06-17 Thread Vickie ch
Hi all,
I want to use swift-client to connect to a ceph cluster. I have done an s3 test
on this cluster before.
So I followed the guide to create a subuser and used the swift client to test
it, but I always get an error: 404 Not Found.
How can I create the auth page? Any help will be appreciated.
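
(For reference, the subuser and swift key creation from the guides listed below typically looks like this; the uid 'melon' is taken from the swift command further down:)

radosgw-admin subuser create --uid=melon --subuser=melon:swift --access=full
radosgw-admin key create --subuser=melon:swift --key-type=swift --gen-secret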

   - 1 Mon (rgw), 3 OSD servers (each server has 3 disks).
   - CEPH:0.94.1-13
   - Swift-client:2.4.0

start--
test@uclient:~$ swift --debug -V 1.0 -A http://192.168.1.110/auth -U melon:swift -K 'ujZx+foSYDniRzwypqnqNR7hr763zdt+Qe7TpwvR' list
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.1.110
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at
0x7fa22f0b3090>
DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 404 279
INFO:swiftclient:REQ: curl -i http://192.168.1.110/auth -X GET
INFO:swiftclient:RESP STATUS: 404 Not Found
INFO:swiftclient:RESP HEADERS: [('date', 'Thu, 18 Jun 2015 01:51:58 GMT'),
('content-length', '279'), ('content-type', 'text/html;
charset=iso-8859-1'), ('server', 'Apache/2.4.7 (Ubuntu)')]
INFO:swiftclient:RESP BODY: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML
2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /auth was not found on this server.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at 192.168.1.110 Port 80</address>
</body></html>

ERROR:swiftclient:Auth GET failed: http://192.168.1.110/auth 404 Not Found
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/swiftclient/client.py", line
1253, in _retry
    self.url, self.token = self.get_auth()
  File "/usr/local/lib/python2.7/dist-packages/swiftclient/client.py", line
1227, in get_auth
    insecure=self.insecure)
  File "/usr/local/lib/python2.7/dist-packages/swiftclient/client.py", line
397, in get_auth
    insecure=insecure)
  File "/usr/local/lib/python2.7/dist-packages/swiftclient/client.py", line
278, in get_auth_1_0
    http_status=resp.status, http_reason=resp.reason)
ClientException: Auth GET failed: http://192.168.1.110/auth 404 Not Found
Account not found
stop--

Guide:
1.) https://ceph.com/docs/v0.78/radosgw/config/
2.) http://docs.ceph.com/docs/v0.94/radosgw/config/
3.) http://docs.ceph.com/docs/v0.94/radosgw/admin/

Best wishes,
Mika


Re: [ceph-users] Hardware cache settings recomendation

2015-06-17 Thread Mateusz Skała
Thanks for answer,

I made some tests. First I left DWC enabled and caching on the journal drive 
disabled: latency grew from 20ms to 90ms on this drive. Next I enabled cache 
on the journal drive and disabled all cache on the data drives: latency on the 
data drives grew from 30–50ms to 1500–2000ms. 
The test was made on only one osd host with a P410i controller, with SATA 
drives (ST1000LM014-1EJ1) for data and an INTEL SSDSC2BW12 SSD for the journal.
Regards, 
Mateusz


From: Jan Schermer [mailto:j...@schermer.cz] 
Sent: Wednesday, June 17, 2015 9:41 AM
To: Mateusz Skała
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Hardware cache settings recomendation

Cache on top of the data drives (not journal) will not help in most cases, 
those writes are already buffered in the OS - so unless your OS is very light 
on memory and flushing constantly it will have no effect, it just adds overhead 
in case a flush comes. I haven’t tested this extensively with Ceph, though.

Cache enabled on journal drive _could_ help if your SSD is very slow (or if you 
don’t have SSD for journal at all), and if it is large enough (more than the 
active journal size) it could prolong the life of your SSD - depending on how 
and when the cache starts to flush. I know from experience that the write cache on 
an Areca controller didn't flush at all until it hit a watermark (50% capacity by 
default or something), and it will be faster than some SSDs on their own. Some SSDs 
have higher IOPS than the cache can achieve, but you likely won’t saturate that 
with Ceph.

Another thing is write cache on the drives themselves - I’d leave that on 
disabled (which is probably the default) unless the drive in question has 
capacitors to flush the cache in case of power failure. Controllers usually 
have a whitelist of devices that respect flushes on which the write cache is 
default=enabled, but in case of for example Dell Perc you would need to have 
Dell original drives or enable it manually.

YMMV - I’ve hit the controller cache IOPS limit in the past with a cheap Dell 
Perc (H310 was it?) that did ~20K IOPS tops on one SSD drive, while the drive 
itself did close to 40K. On my SSDs, disabling write cache helps latency (good 
for journal) but could be troubling for the SSD lifetime.

In any case I don’t think you would saturate either with Ceph, so I recommend 
you just test the latency with write cache enabled/disabled on the controller 
and pick the one that gives the best numbers.
This is basically how:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
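
(The core of that test is a single-threaded O_DSYNC 4k write; a sketch — destructive to the target device, with /dev/sdX as a placeholder:)

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test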

Ceph recommended way is to use everything as passthrough (initiator/target 
mode) or JBOD (RAID0 with single drives on some controllers), so I’d stick with 
that.

Jan


On 17 Jun 2015, at 08:01, Mateusz Skała mateusz.sk...@budikom.net wrote:

Yes, all disks are in single-drive RAID 0. Now cache is enabled for all drives; 
should I disable cache for the SSD drives?
Regards,
Mateusz
 
From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] 
Sent: Thursday, June 11, 2015 7:30 PM
To: Mateusz Skała
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Hardware cache settings recomendation
 
You want write cache to disk, no write cache for SSD.
 
I assume all of your data disk are single drive raid 0?
 
 
 
Tyler Bishop
Chief Executive Officer
513-299-7108 x10
tyler.bis...@beyondhosting.net

 
 

From: Mateusz Skała mateusz.sk...@budikom.net
To: ceph-users@lists.ceph.com
Sent: Saturday, June 6, 2015 4:09:59 AM
Subject: [ceph-users] Hardware cache settings recomendation
 
Hi,
Please help me with hardware cache settings on controllers for ceph rbd best 
performance. All Ceph hosts have one SSD drive for journal.
 
We are using 4 different controllers, all with BBU: 
• HP Smart Array P400
• HP Smart Array P410i
• Dell PERC 6/i
• Dell  PERC H700
 
I have to set cache policy, on Dell settings are:
• Read Policy 
o   Read-Ahead (current)
o   No-Read-Ahead
o   Adaptive Read-Ahead
• Write Policy 
o   Write-Back (current)
o   Write-Through 
• Cache Policy
o   Cache I/O
o   Direct I/O (current)
• Disk Cache Policy
o   Default (current)
o   Enabled
o   Disabled
On HP controllers:
• Cache Ratio (current: 25% Read / 75% Write)
• Drive Write Cache
o   Enabled (current)
o   Disabled
 
And there is one more setting in the LogicalDrive options:
• Caching: 
o   Enabled (current)
o   Disabled
 
Please verify my settings and give me some recommendations. 
Best regards,
Mateusz


[ceph-users] osd_scrub_chunk_min/max scrub_sleep?

2015-06-17 Thread Tu Holmes
Hey gang,

Some options are just not documented well… 

What’s up with: 
osd_scrub_chunk_min
osd_scrub_chunk_max
osd_scrub_sleep?
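
(For reference, my understanding of the hammer-era defaults, worth verifying with 'ceph daemon osd.N config show':)

osd_scrub_chunk_min = 5    # minimum number of objects scrubbed per chunk
osd_scrub_chunk_max = 25   # maximum number of objects per chunk
osd_scrub_sleep = 0        # seconds to sleep between chunks; > 0 throttles scrubbing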



===
Tu Holmes
tu.hol...@gmail.com



Re: [ceph-users] Rename pool by id

2015-06-17 Thread Georgios Dimitrakakis

Pavel,

unfortunately there isn't a way to rename a pool using its ID, as I 
learned the hard way myself when I faced the exact same issue a few 
months ago.


It would be a good idea for developers to also include a way to 
manipulate (rename, delete, etc.) pools using the ID which is definitely 
unique and in my opinion would be error-resistant or at least less 
susceptible to errors.


To achieve what you want, try the command:

rados rmpool <pool-name> <pool-name> --yes-i-really-really-mean-it

which will actually remove the problematic pool, as shown here : 
http://cephnotes.ksperis.com/blog/2014/10/29/remove-pool-without-name .


To be fair and give credits everywhere this solution was also suggested 
to me at the IRC channel by debian112 at that time.



Best regards,

George



On Wed, 17 Jun 2015 17:17:55 +0600, pa...@gradient54.ru wrote:

Hi all, is there any way to rename a pool by ID (pool number)?
I have one pool with an empty name; it is not used and I just want to delete
it, but I can't, because a pool name is required.

ceph osd lspools
0 data,1 metadata,2 rbd,12 ,16 libvirt,

I want rename this: pool #12

Thanks,
Pavel




Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Nathan Cutler
 We've since merged something 
 that stripes over several small xattrs so that we can keep things inline, 
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for hammer backport?

Nathan


Re: [ceph-users] Hardware cache settings recomendation

2015-06-17 Thread Mateusz Skała
Yes, all disks are in single-drive RAID 0. Now cache is enabled for all drives; 
should I disable cache for the SSD drives?

Regards,

Mateusz

 

From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] 
Sent: Thursday, June 11, 2015 7:30 PM
To: Mateusz Skała
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Hardware cache settings recomendation

 

You want write cache to disk, no write cache for SSD.

 

I assume all of your data disk are single drive raid 0?

 

 




Tyler Bishop
Chief Executive Officer
513-299-7108 x10


tyler.bis...@beyondhosting.net



 

 


From: Mateusz Skała mateusz.sk...@budikom.net
To: ceph-users@lists.ceph.com
Sent: Saturday, June 6, 2015 4:09:59 AM
Subject: [ceph-users] Hardware cache settings recomendation

 

Hi,

Please help me with hardware cache settings on controllers for ceph rbd best 
performance. All Ceph hosts have one SSD drive for journal.

 

We are using 4 different controllers, all with BBU: 

* HP Smart Array P400

* HP Smart Array P410i

* Dell PERC 6/i

* Dell  PERC H700

 

I have to set cache policy, on Dell settings are:

* Read Policy 

o   Read-Ahead (current)

o   No-Read-Ahead

o   Adaptive Read-Ahead

* Write Policy 

o   Write-Back (current)

o   Write-Through 

* Cache Policy

o   Cache I/O

o   Direct I/O (current)

* Disk Cache Policy

o   Default (current)

o   Enabled

o   Disabled

On HP controllers:

* Cache Ratio (current: 25% Read / 75% Write)

* Drive Write Cache

o   Enabled (current)

o   Disabled

 

And there is one more setting in the LogicalDrive options:

* Caching: 

o   Enabled (current)

o   Disabled

 

Please verify my settings and give me some recommendations. 

Best regards,

Mateusz




Re: [ceph-users] v0.94.2 Hammer released

2015-06-17 Thread Dan van der Ster
On Thu, Jun 11, 2015 at 7:34 PM, Sage Weil sw...@redhat.com wrote:
 * ceph-objectstore-tool should be in the ceph server package (#11376, Ken
   Dreyer)

We had a little trouble yum updating from 0.94.1 to 0.94.2:

file /usr/bin/ceph-objectstore-tool from install of
ceph-1:0.94.2-0.el6.x86_64 conflicts with file from package
ceph-test-1:0.94.1-0.el6.x86_64

Reported here: http://tracker.ceph.com/issues/12033

Cheers, Dan


Re: [ceph-users] Hardware cache settings recomendation

2015-06-17 Thread Jan Schermer
Cache on top of the data drives (not journal) will not help in most cases, 
those writes are already buffered in the OS - so unless your OS is very light 
on memory and flushing constantly it will have no effect, it just adds overhead 
in case a flush comes. I haven’t tested this extensively with Ceph, though.

Cache enabled on journal drive _could_ help if your SSD is very slow (or if you 
don’t have SSD for journal at all), and if it is large enough (more than the 
active journal size) it could prolong the life of your SSD - depending on how 
and when the cache starts to flush. I know from experience that the write cache on 
an Areca controller didn't flush at all until it hit a watermark (50% capacity by 
default or something), and it will be faster than some SSDs on their own. Some SSDs 
have higher IOPS than the cache can achieve, but you likely won’t saturate that 
with Ceph.

Another thing is write cache on the drives themselves - I’d leave that on 
disabled (which is probably the default) unless the drive in question has 
capacitors to flush the cache in case of power failure. Controllers usually 
have a whitelist of devices that respect flushes on which the write cache is 
default=enabled, but in case of for example Dell Perc you would need to have 
Dell original drives or enable it manually.

YMMV - I’ve hit the controller cache IOPS limit in the past with a cheap Dell 
Perc (H310 was it?) that did ~20K IOPS tops on one SSD drive, while the drive 
itself did close to 40K. On my SSDs, disabling write cache helps latency (good 
for journal) but could be troubling for the SSD lifetime.

In any case I don’t think you would saturate either with Ceph, so I recommend 
you just test the latency with write cache enabled/disabled on the controller 
and pick the one that gives the best numbers.
This is basically how:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

Ceph recommended way is to use everything as passthrough (initiator/target 
mode) or JBOD (RAID0 with single drives on some controllers), so I’d stick with 
that.

Jan


 On 17 Jun 2015, at 08:01, Mateusz Skała mateusz.sk...@budikom.net wrote:
 
 Yes, all disks are in single-drive RAID 0. Now cache is enabled for all 
 drives; should I disable cache for the SSD drives?
 Regards,
 Mateusz
  
 From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net] 
 Sent: Thursday, June 11, 2015 7:30 PM
 To: Mateusz Skała
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Hardware cache settings recomendation
  
 You want write cache to disk, no write cache for SSD.
  
 I assume all of your data disk are single drive raid 0?
  
  
  
 Tyler Bishop
 Chief Executive Officer
 513-299-7108 x10
 tyler.bis...@beyondhosting.net
  
  
 From: Mateusz Skała mateusz.sk...@budikom.net
 To: ceph-users@lists.ceph.com
 Sent: Saturday, June 6, 2015 4:09:59 AM
 Subject: [ceph-users] Hardware cache settings recomendation
  
 Hi,
 Please help me with hardware cache settings on controllers for ceph rbd best 
 performance. All Ceph hosts have one SSD drive for journal.
  
 We are using 4 different controllers, all with BBU: 
 · HP Smart Array P400
 · HP Smart Array P410i
 · Dell PERC 6/i
 · Dell  PERC H700
  
 I have to set cache policy, on Dell settings are:
 · Read Policy 
 o   Read-Ahead (current)
 o   No-Read-Ahead
 o   Adaptive Read-Ahead
 · Write Policy 
 o   Write-Back (current)
 o   Write-Through 
 · Cache Policy
 o   Cache I/O
 o   Direct I/O (current)
 · Disk Cache Policy
 o   Default (current)
 o   Enabled
 o   Disabled
 On HP controllers:
 · Cache Ratio (current: 25% Read / 75% Write)
 · Drive Write Cache
 o   Enabled (current)
 o   Disabled
  
 And there is one more setting in the LogicalDrive options:
 · Caching: 
 o   Enabled (current)
 o   Disabled
  
 Please verify my settings and give me some recommendations. 
 Best regards,
 Mateusz
 

Re: [ceph-users] CephFS: 'ls -alR' performance terrible unless Linux cache flushed

2015-06-17 Thread negillen negillen
I have done some quick tests with FUSE too: it seems to me that, both with
the old and with the new kernel, FUSE is approx. five times slower than
kernel driver for both reading files and getting stats.
I don't know whether it is just me or if it is expected.

On Wed, Jun 17, 2015 at 2:56 AM, Francois Lafont flafdiv...@free.fr wrote:

 Hi,

 On 16/06/2015 18:46, negillen negillen wrote:

  Fixed! At least looks like fixed.

 That's cool for you. ;)

  It seems that after migrating every node (both servers and clients) from
  kernel 3.10.80-1 to 4.0.4-1 the issue disappeared.
  Now I get decent speeds both for reading files and for getting stats from
  every node.

 It seems to me that an interesting test could be to let the old kernel in
 your client nodes (ie 3.10.80-1), use ceph-fuse instead of the ceph kernel
 module and test if you have decent speeds too.

 Bye.

 --
 François Lafont


Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-17 Thread Irek Fasikhov
If necessary, there are RPM files for centos 7:

gperftools.spec
https://drive.google.com/file/d/0BxoNLVWxzOJWaVVmWTA3Z18zbUE/edit?usp=drive_web
pprof-2.4-1.el7.centos.noarch.rpm
https://drive.google.com/file/d/0BxoNLVWxzOJWRmQ2ZEt6a1pnSVk/edit?usp=drive_web
gperftools-libs-2.4-1.el7.centos.x86_64.rpm
https://drive.google.com/file/d/0BxoNLVWxzOJWcVByNUZHWWJqRXc/edit?usp=drive_web
gperftools-devel-2.4-1.el7.centos.x86_64.rpm
https://drive.google.com/file/d/0BxoNLVWxzOJWYTUzQTNha3J3NEU/edit?usp=drive_web
gperftools-debuginfo-2.4-1.el7.centos.x86_64.rpm
https://drive.google.com/file/d/0BxoNLVWxzOJWVzBic043YUk2LWM/edit?usp=drive_web
gperftools-2.4-1.el7.centos.x86_64.rpm
https://drive.google.com/file/d/0BxoNLVWxzOJWNm81QWdQYU9ZaG8/edit?usp=drive_web

2015-06-17 8:01 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com:

 Hi,
 I finally fixed it with tcmalloc, with:

 TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 qemu

 I got almost the same result as jemalloc in this case, maybe a little bit
 faster


 Here are the iops results for 1 qemu vm with an iothread per disk (iodepth=32,
 4k randread, nocache)


 qemu randread 4k nocache libc6           iops

 1 disk    29052
 2 disks   55878
 4 disks   127899
 8 disks   240566
 15 disks  269976

 qemu randread 4k nocache jemalloc        iops

 1 disk    41278
 2 disks   75781
 4 disks   195351
 8 disks   294241
 15 disks  298199

 qemu randread 4k nocache tcmalloc 16M cache   iops

 1 disk    37911
 2 disks   67698
 4 disks   41076
 8 disks   43312
 15 disks  37569

 qemu randread 4k nocache tcmalloc patched 256M   iops

 1 disk no-iothread
 1 disk    42160
 2 disks   83135
 4 disks   194591
 8 disks   306038
 15 disks  302278


 - Original mail -
 From: aderumier aderum...@odiso.com
 To: Mark Nelson mnel...@redhat.com
 Cc: ceph-users ceph-users@lists.ceph.com
 Sent: Tuesday, 16 June 2015 20:27:54
 Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

 I forgot to ask, is this with the patched version of tcmalloc that
 theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?

 Yes, the patched version of tcmalloc, but also the last version from
 gperftools git.
 (I'm talking about qemu here, not osds).

 I have tried to increase TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it
 doesn't help.



 For osd, increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES is helping.
 (Benchs are still running, I try to overload them as much as possible)



 - Original mail -
 From: Mark Nelson mnel...@redhat.com
 To: ceph-users ceph-users@lists.ceph.com
 Sent: Tuesday, 16 June 2015 19:04:27
 Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

 I forgot to ask, is this with the patched version of tcmalloc that
 theoretically fixes the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES issue?

 Mark

 On 06/16/2015 11:46 AM, Mark Nelson wrote:
  Hi Alexandre,
 
  Excellent find! Have you also informed the QEMU developers of your
  discovery?
 
  Mark
 
  On 06/16/2015 11:38 AM, Alexandre DERUMIER wrote:
  Hi,
 
  some news about qemu with tcmalloc vs jemalloc.
 
  I'm testing with multiple disks (with iothreads) in 1 qemu guest.
 
  And if tcmalloc is a little faster than jemalloc,
 
  I have hit a lot of time the
  tcmalloc::ThreadCache::ReleaseToCentralCache bug.
 
  increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, don't help.
 
 
  with multiple disks, I'm around 200k iops with tcmalloc (before hitting
  the bug) and 350k iops with jemalloc.

  The problem is that when I hit the malloc bug, I'm around 4000-1 iops,
  and the only way to fix it is to restart qemu ...
 
 
 
   - Original mail -
   From: pushpesh sharma pushpesh@gmail.com
   To: aderumier aderum...@odiso.com
   Cc: Somnath Roy somnath@sandisk.com, Irek Fasikhov
   malm...@gmail.com, ceph-devel ceph-de...@vger.kernel.org,
   ceph-users ceph-users@lists.ceph.com
   Sent: Friday, 12 June 2015 08:58:21
   Subject: Re: rbd_cache, limiting read on high iops around 40k
 
  Thanks, posted the question in openstack list. Hopefully will get some
  expert opinion.
 
  On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER
  aderum...@odiso.com wrote:
  Hi,
 
   here is a libvirt xml sample from the libvirt src

   (you need to define the iothreads number, then assign them in disks).

   I don't use openstack, so I really don't know how it works with it.
 
 
   <domain type='qemu'>
     <name>QEMUGuest1</name>
     <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
     <memory unit='KiB'>219136</memory>
     <currentMemory unit='KiB'>219136</currentMemory>
     <vcpu placement='static'>2</vcpu>
     <iothreads>2</iothreads>
     <os>
       <type arch='i686' machine='pc'>hvm</type>
       <boot dev='hd'/>
     </os>
     <clock offset='utc'/>
     <on_poweroff>destroy</on_poweroff>
     <on_reboot>restart</on_reboot>
     <on_crash>destroy</on_crash>
     <devices>
       <emulator>/usr/bin/qemu</emulator>
       <disk type='file' device='disk'>
         <driver name='qemu' type='raw' iothread='1'/>
         <source 

[ceph-users] 10d

2015-06-17 Thread Dan van der Ster
Hi,

After upgrading to 0.94.2 yesterday on our test cluster, we've had 3
PGs go inconsistent.

First, immediately after we updated the OSDs PG 34.10d went inconsistent:

2015-06-16 13:42:19.086170 osd.52 137.138.39.211:6806/926964 2 :
cluster [ERR] 34.10d scrub stat mismatch, got 4/5 objects, 0/0 clones,
0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 136/136
bytes,0/0 hit_set_archive bytes.

Second, an hour later 55.10d went inconsistent:

2015-06-16 14:27:58.336550 osd.303 128.142.23.56:6812/879385 10 :
cluster [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0
clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0
bytes,0/0 hit_set_archive bytes.

Then last night 36.10d suffered the same fate:

2015-06-16 23:05:17.857433 osd.30 188.184.18.39:6800/2260103 16 :
cluster [ERR] 36.10d deep-scrub stat mismatch, got 5833/5834 objects,
0/0 clones, 5758/5759 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0
whiteouts, 24126649216/24130843520 bytes,0/0 hit_set_archive bytes.


In all cases, one object is missing. In all cases, the PG id is 10d.
Is this an epic coincidence or could something else going on here?

Best Regards,

Dan


Re: [ceph-users] SSD LifeTime for Monitors

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 10:18 AM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
 Hi,

 Does anybody know how much data gets written by the monitors? I was using 
 some cheaper ssds for monitors and was wondering why they had already written 
 80 TB after 8 months.

3.8MB/s? (80 TB over 8 months is roughly 8e7 MB / 2.1e7 s ≈ 3.9 MB/s.) That's
a little more than I would naively expect, but LevelDB is probably doubling the
total data written (at least), so that brings it down to ~1.9MB/s of real data.
How big are your PGMap and OSDMap? Do
you have any logging being written to those SSDs?
Etc. ;)


Re: [ceph-users] 10d

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 8:56 AM, Dan van der Ster d...@vanderster.com wrote:
 Hi,

 After upgrading to 0.94.2 yesterday on our test cluster, we've had 3
 PGs go inconsistent.

 First, immediately after we updated the OSDs PG 34.10d went inconsistent:

 2015-06-16 13:42:19.086170 osd.52 137.138.39.211:6806/926964 2 :
 cluster [ERR] 34.10d scrub stat mismatch, got 4/5 objects, 0/0 clones,
 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 136/136
 bytes,0/0 hit_set_archive bytes.

 Second, an hour later 55.10d went inconsistent:

 2015-06-16 14:27:58.336550 osd.303 128.142.23.56:6812/879385 10 :
 cluster [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0
 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0
 bytes,0/0 hit_set_archive bytes.

 Then last night 36.10d suffered the same fate:

 2015-06-16 23:05:17.857433 osd.30 188.184.18.39:6800/2260103 16 :
 cluster [ERR] 36.10d deep-scrub stat mismatch, got 5833/5834 objects,
 0/0 clones, 5758/5759 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0
 whiteouts, 24126649216/24130843520 bytes,0/0 hit_set_archive bytes.


 In all cases, one object is missing. In all cases, the PG id is 10d.
 Is this an epic coincidence or could something else going on here?

I'm betting on something else. What OSDs is each PG mapped to?
It looks like each of them is missing one object on some of the OSDs,
what are the objects?
-Greg


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Abhishek L
On Wed, Jun 17, 2015 at 1:02 PM, Nathan Cutler ncut...@suse.cz wrote:
 We've since merged something
 that stripes over several small xattrs so that we can keep things inline,
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

 Hi Sage:

 You wrote "yet" - should we earmark it for hammer backport?

I'm guessing https://github.com/ceph/ceph/pull/4973 is the backport for hammer
(issue http://tracker.ceph.com/issues/11981)

Regards
Abhishek


[ceph-users] SSD LifeTime for Monitors

2015-06-17 Thread Stefan Priebe - Profihost AG
Hi,

Does anybody know how much data gets written by the monitors? I was using 
some cheaper ssds for monitors and was wondering why they had already written 
80 TB after 8 months.

Stefan


[ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Jacek Jarosiewicz

Hi,

We've been doing some testing of ceph hammer (0.94.2), but the 
performance is very slow and we can't find what's causing the problem.


Initially we've started with four nodes with 10 osd's total.
The drives we've used were SATA enterprise drives and on top of that 
we've used SSD drives as flashcache devices for SATA drives and for 
storing OSD's journal.


The local tests on each of the four nodes are giving the results you'd 
expect:


~500MB/s seq writes and reads from SSD's,
~40k iops random reads from SSD's,
~200MB/s seq writes and reads from SATA drives
~600 iops random reads from SATA drives

..but when we've tested this setup from a client we got rather slow 
results.. so we've tried to find a bottleneck and tested the network by 
connecting client to our nodes via NFS - and performance via NFS is as 
expected (similar results to local tests, only slightly slower).


So we've reconfigured ceph to not use SATA drives and just setup OSD's 
on SSD drives (we wanted to test if maybe this is a flashcache problem?)


..but to no success, the results of rbd i/o tests from two osd nodes 
setup on SSD drives are like this:


~60MB/s seq writes
~100MB/s seq reads
~2-3k iops random reads

The client is an rbd mounted on a linux ubuntu box. All the servers (osd 
nodes and the client) are running Ubuntu Server 14.04. We tried to 
switch to CentOS 7 - but the results are the same.


Here are some technical details about our setup:

Four exact same osd nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008

Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD

Four monitors (one on each node). We do not use CephFS so we do not run 
ceph-mds.


During the tests we were monitoring all osd nodes and the client - we 
haven't seen any problems on any of the hosts - load was low, there 
were no cpu waits, no abnormal system interrupts, no i/o problems on the 
disks - all the systems seemed to not sweat at all, and yet the results 
are rather dissatisfying.. we're kinda lost; any help will be appreciated.


Cheers,
J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA -   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia


Re: [ceph-users] 10d

2015-06-17 Thread Dan van der Ster
On Wed, Jun 17, 2015 at 10:52 AM, Gregory Farnum g...@gregs42.com wrote:
 On Wed, Jun 17, 2015 at 8:56 AM, Dan van der Ster d...@vanderster.com wrote:
 Hi,

 After upgrading to 0.94.2 yesterday on our test cluster, we've had 3
 PGs go inconsistent.

 First, immediately after we updated the OSDs PG 34.10d went inconsistent:

 2015-06-16 13:42:19.086170 osd.52 137.138.39.211:6806/926964 2 :
 cluster [ERR] 34.10d scrub stat mismatch, got 4/5 objects, 0/0 clones,
 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 136/136
 bytes,0/0 hit_set_archive bytes.

 Second, an hour later 55.10d went inconsistent:

 2015-06-16 14:27:58.336550 osd.303 128.142.23.56:6812/879385 10 :
 cluster [ERR] 55.10d deep-scrub stat mismatch, got 0/1 objects, 0/0
 clones, 0/1 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 0/0
 bytes,0/0 hit_set_archive bytes.

 Then last night 36.10d suffered the same fate:

 2015-06-16 23:05:17.857433 osd.30 188.184.18.39:6800/2260103 16 :
 cluster [ERR] 36.10d deep-scrub stat mismatch, got 5833/5834 objects,
 0/0 clones, 5758/5759 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0
 whiteouts, 24126649216/24130843520 bytes,0/0 hit_set_archive bytes.


 In all cases, one object is missing. In all cases, the PG id is 10d.
 Is this an epic coincidence or could something else going on here?

 I'm betting on something else. What OSDs is each PG mapped to?
 It looks like each of them is missing one object on some of the OSDs,
 what are the objects?

34.10d: [52,202,218]
55.10d: [303,231,65]
36.10d: [30,171,69]

So no common OSDs. I've already repaired all of these PGs, and logs
have nothing interesting, so I can't say more about the objects.

Cheers, Dan


Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Mark Nelson

On 06/17/2015 04:10 AM, Jacek Jarosiewicz wrote:

Hi,

We've been doing some testing of ceph hammer (0.94.2), but the
performance is very slow and we can't find what's causing the problem.

Initially we've started with four nodes with 10 osd's total.
The drives we've used were SATA enterprise drives and on top of that
we've used SSD drives as flashcache devices for SATA drives and for
storing OSD's journal.

The local tests on each of the four nodes are giving the results you'd
expect:

~500MB/s seq writes and reads from SSD's,
~40k iops random reads from SSD's,
~200MB/s seq writes and reads from SATA drives
~600 iops random reads from SATA drives

..but when we've tested this setup from a client we got rather slow
results.. so we've tried to find a bottleneck and tested the network by
connecting client to our nodes via NFS - and performance via NFS is as
expected (similar results to local tests, only slightly slower).

So we've reconfigured ceph to not use SATA drives and just setup OSD's
on SSD drives (we wanted to test if maybe this is a flashcache problem?)

..but to no success, the results of rbd i/o tests from two osd nodes
setup on SSD drives are like this:

~60MB/s seq writes
~100MB/s seq reads
~2-3k iops random reads


Is this per SSD or aggregate?



The client is an rbd mounted on a linux ubuntu box. All the servers (osd
nodes and the client) are running Ubuntu Server 14.04. We tried to
switch to CentOS 7 - but the results are the same.


Is this kernel RBD or a VM using QEMU/KVM?  You might want to try fio 
with the librbd engine and see if you get the same results.  Also, 
radosbench isn't exactly analogous, but you might try some large 
sequential write / sequential read tests just as a sanity check.
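
(A sketch of such an fio invocation — requires fio built with rbd support; the pool and image names are placeholders:)

fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based --name=librbd-randread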




Here are some technical details about our setup:

Four exact same osd nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008


Specs look fine.



Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD


Is that where the SSDs live?  I'm not a fan of such heavy expander 
over-subscription, but if you are getting good results outside of Ceph 
I'm guessing it's something else.




Four monitors (one on each node). We do not use CephFS so we do not run
ceph-mds.


You'll want to go down to 3 or up to 5.  Even numbers of monitors don't 
really help you in any way (and can actually hurt).  I'd suggest 3.




During the tests we were monitoring all osd nodes and the client - we
haven't seen any problems on any of the hosts - load was low, there
were no CPU waits, no abnormal system interrupts, no I/O problems on the
disks - none of the systems seemed to break a sweat, and yet the results
are rather dissatisfying.. we're kinda lost, any help will be appreciated.


You didn't mention the brand/model of SSDs.  Especially for writes this 
is important as ceph journal writes are O_DSYNC.  Drives that have 
proper write loss protection often can ignore ATA_CMD_FLUSH and do these 
very quickly while other drives may need to flush to the flash cells. 
Also, keep in mind for writes that if you have journals on the SSDs and 
3X replication, you'll be doing 6 writes for every client write.
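
A quick way to see how a drive copes with journal-style writes is a small
O_DSYNC test - note this writes to the target, so point it at a scratch
device or file:

# roughly what the OSD journal does: small direct, dsync writes
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync

# fio equivalent, syncing after every write
fio --name=journal-test --filename=/dev/sdX --rw=write --bs=4k \
    --iodepth=1 --numjobs=1 --direct=1 --fsync=1 --runtime=60 --time_based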


For reads and read IOPs on SSDs, you might try disabling in-memory 
logging and ceph authentication.  You might be interested in some 
testing we did on a variety of SSDs here:


http://www.spinics.net/lists/ceph-users/msg15733.html



Cheers,
J


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Alexandre DERUMIER
Hi,

can you post your ceph.conf ?

Which tools do you use for benchmarking?
Which block size, iodepth, and number of clients/rbd volumes do you use?


Is it with the krbd kernel driver?
(I have seen some bad performance with kernel 3.16, but at a much higher
rate (100k iops).)
Is it with Ethernet switches, or IP over InfiniBand?

Your results seem quite low anyway.

I'm also using Mellanox Ethernet switches (10GbE) and an SAS3008 (Dell R630),
and I can reach around 250k iops randread 4K with 1 OSD (with 80% usage of
2x10 cores at 3.1GHz).



here is my ceph.conf
-
[global]
fsid = 
public_network =
mon_initial_members = ...
mon_host =.
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
osd_pool_default_min_size = 1
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_journaler = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 2
osd_op_num_shards = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_nocrc = true
ms_dispatch_throttle_bytes = 0
cephx_sign_messages = false
cephx_require_signatures = false
throttler_perf_counter = false
ms_crc_header = false
ms_crc_data = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false


(the main boosts are disabling cephx auth and debug logging, and increasing thread sharding)




- Mail original -
De: Jacek Jarosiewicz jjarosiew...@supermedia.pl
À: ceph-users ceph-users@lists.ceph.com
Envoyé: Mercredi 17 Juin 2015 11:10:26
Objet: [ceph-users] rbd performance issue - can't find bottleneck

Hi, 

We've been doing some testing of ceph hammer (0.94.2), but the 
performance is very slow and we can't find what's causing the problem. 

Initially we've started with four nodes with 10 OSDs total. 
The drives we've used were enterprise SATA drives, and on top of that 
we've used SSD drives as flashcache devices for the SATA drives and for 
storing the OSDs' journals. 

The local tests on each of the four nodes are giving the results you'd 
expect: 

~500MB/s seq writes and reads from SSD's, 
~40k iops random reads from SSD's, 
~200MB/s seq writes and reads from SATA drives 
~600 iops random reads from SATA drives 

..but when we've tested this setup from a client we got rather slow 
results.. so we've tried to find a bottleneck and tested the network by 
connecting the client to our nodes via NFS - and performance via NFS is as 
expected (similar results to local tests, only slightly slower). 

So we've reconfigured ceph to not use SATA drives and just set up OSDs 
on SSD drives (we wanted to test if maybe this is a flashcache problem?) 

..but with no success - the results of rbd i/o tests from two osd nodes 
set up on SSD drives are like this: 

~60MB/s seq writes 
~100MB/s seq reads 
~2-3k iops random reads 

The client is an rbd mounted on a linux ubuntu box. All the servers (osd 
nodes and the client) are running Ubuntu Server 14.04. We tried to 
switch to CentOS 7 - but the results are the same. 

Here are some technical details about our setup: 

Four exact same osd nodes: 
E5-1630 CPU 
32 GB RAM 
Mellanox MT27520 56Gbps network cards 
SATA controller LSI Logic SAS3008 

Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD 

Four monitors (one on each node). We do not use CephFS so we do not run 
ceph-mds. 

During the tests we were monitoring all osd nodes and the client - we 
haven't seen any problems on any of the hosts - load was low, there 
were no CPU waits, no abnormal system interrupts, no I/O problems on the 
disks - none of the systems seemed to break a sweat, and yet the results 
are rather dissatisfying.. we're kinda lost, any help will be appreciated. 

Cheers, 
J 

-- 
Jacek Jarosiewicz 
Administrator Systemów Informatycznych 


 
SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie 
ul. Senatorska 13/15, 00-075 Warszawa 
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego, 
nr KRS 029537; kapitał zakładowy 42.756.000 zł 
NIP: 957-05-49-503 
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa 


 
SUPERMEDIA - http://www.supermedia.pl 
dostep do internetu - hosting - kolokacja - lacza - telefonia 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Jacek Jarosiewicz

On 06/17/2015 03:38 PM, Alexandre DERUMIER wrote:

Hi,

can you post your ceph.conf ?



sure:

[global]
fsid = e96fdc70-4f9c-4c12-aae8-63dd7c64c876
mon initial members = cf01,cf02
mon host = 10.4.10.211,10.4.10.212
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
public network = 10.4.10.0/24
#cluster network = 192.168.10.0/24
osd journal size = 10240
#journal dio = false
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 512
osd pool default pgp num = 512
osd crush chooseleaf type = 1

[mon.cf01]
host = cf01
mon addr = 10.4.10.211:6789

[mon.cf02]
host = cf02
mon addr = 10.4.10.212:6789

[osd.0]
host = cf01

[osd.1]
host = cf02



Which tools do you use for benchmarking?
Which block size, iodepth, and number of clients/rbd volumes do you use?



I use fio for random reads and dd for seq reads and writes.
Block size is 4k (the fs on the OSDs is XFS). I used iodepths of 1, 4, 16
and 32 - the deeper the I/O queue, the worse the performance. The results I
posted in my message are from a fio command run like this:


fio --name=randread --numjobs=1  --rw=randread --bs=4k --size=10G 
--filename=test10g --direct=1
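
The deeper-queue runs were the same command with the queue depth set,
something like this (assuming the libaio engine):

fio --name=randread --numjobs=1 --rw=randread --bs=4k --size=10G \
    --filename=test10g --direct=1 --ioengine=libaio --iodepth=32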




Is it with the krbd kernel driver?
(I have seen some bad performance with kernel 3.16, but at a much higher
rate (100k iops).)
Is it with Ethernet switches, or IP over InfiniBand?



kernel driver, kernel version: 3.10.0-229.4.2.el7.x86_64 (last tests 
were on CentOS 7.1, when we used Ubuntu - kernel version was 
3.13.0-53-generic)


We use Ethernet switches (Mellanox MSX1012). The switches are configured
with MLAG, and we use Mellanox dual-port 56Gbps cards with bond
interfaces configured as round-robin.
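
The bond mode and per-slave state can be checked with something like
the following (assuming the bond interface is bond0):

cat /proc/net/bonding/bond0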



Your results seem quite low anyway.



yes.. :(


I'm also using Mellanox Ethernet switches (10GbE) and an SAS3008 (Dell R630),
and I can reach around 250k iops randread 4K with 1 OSD (with 80% usage of
2x10 cores at 3.1GHz).



here is my ceph.conf
-
[global]
fsid = 
public_network =
mon_initial_members = ...
mon_host =.
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
osd_pool_default_min_size = 1
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_journaler = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 2
osd_op_num_shards = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_nocrc = true
ms_dispatch_throttle_bytes = 0
cephx_sign_messages = false
cephx_require_signatures = false
throttler_perf_counter = false
ms_crc_header = false
ms_crc_data = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false


(the main boosts are disabling cephx auth and debug logging, and increasing thread sharding)


Will try your suggested config and let you know, thanks!

J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA -   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Christian Balzer
On Wed, 17 Jun 2015 16:03:17 +0200 Jacek Jarosiewicz wrote:

 On 06/17/2015 03:34 PM, Mark Nelson wrote:
  On 06/17/2015 04:10 AM, Jacek Jarosiewicz wrote:
  Hi,
 
 
 [ cut ]
 
 
  ~60MB/s seq writes
  ~100MB/s seq reads
  ~2-3k iops random reads
 
  Is this per SSD or aggregate?
 
 aggregate (if I understand you correctly). This is what I see when I run 
 tests on the client - a mapped and mounted rbd.
 
 
 
  The client is an rbd mounted on a linux ubuntu box. All the servers
  (osd nodes and the client) are running Ubuntu Server 14.04. We tried
  to switch to CentOS 7 - but the results are the same.
 
  Is this kernel RBD or a VM using QEMU/KVM?  You might want to try fio
  with the librbd engine and see if you get the same results.  Also,
  radosbench isn't exactly analogous, but you might try some large
  sequential write / sequential read tests just as a sanity check.
 
 
 This is kernel rbd - testing performance on vm's will be the next step.
 I've tried fio with librbd, but the results were similar.
 I'll run the radosbench tests and post my results.
 
Kernel tends to be less than stellar, but probably not your main problem.

 
  Here are some technical details about our setup:
 
  Four exact same osd nodes:
  E5-1630 CPU
  32 GB RAM
  Mellanox MT27520 56Gbps network cards
  SATA controller LSI Logic SAS3008
 
  Specs look fine.
 
 
  Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD
 
  Is that where the SSDs live?  I'm not a fan of such heavy expander
  over-subscription, but if you are getting good results outside of Ceph
  I'm guessing it's something else.
 
 
 No, the SSDs are connected to the integrated Intel SATA controller 
 (C610/X99)
 
 The only disks that reside in the SuperMicro chassis are the SATA drives. 
 And in the last tests I don't use them - the results I gave are on SSDs 
 only (one SSD serves as an OSD and the journal is on another SSD).
 
 
  Four monitors (one on each node). We do not use CephFS so we do not
  run ceph-mds.
 
  You'll want to go down to 3 or up to 5.  Even numbers of monitors don't
  really help you in any way (and can actually hurt).  I'd suggest 3.
 
 
 OK, will do that, thanks!
 
 
  You didn't mention the brand/model of SSDs.  Especially for writes this
  is important as ceph journal writes are O_DSYNC.  Drives that have
  proper write loss protection often can ignore ATA_CMD_FLUSH and do
  these very quickly while other drives may need to flush to the flash
  cells. Also, keep in mind for writes that if you have journals on the
  SSDs and 3X replication, you'll be doing 6 writes for every client
  write.
 
 
 SSD's are INTEL SSDSC2BW240A4

Intel, they make great SSDs. 
And horrid product numbers in SMART to go with their differently
marketed/named devices.

Anyway, those are likely your problem, see:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-November/035695.html

Or any Google result for "Ceph Intel 530", probably.

When you run those tests, did you use atop or iostat to watch the SSD
utilization?
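
Something like this, watching %util and await on the journal and data
SSDs (the device names below are placeholders):

iostat -x sdb sdc 1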

Christian

 The rbd pool is set to have min_size 1 and size 2.
 
  For reads and read IOPs on SSDs, you might try disabling in-memory
  logging and ceph authentication.  You might be interested in some
  testing we did on a variety of SSDs here:
 
  http://www.spinics.net/lists/ceph-users/msg15733.html
 
 
 Will read up on that too, thanks!
 
 J
 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Accessing Ceph from Spark

2015-06-17 Thread Milan Sladky
Is it possible to access Ceph from Spark as it is mentioned here for OpenStack
Swift?
https://spark.apache.org/docs/latest/storage-openstack-swift.html
Thanks for the help.
Milan Sladky
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Jacek Jarosiewicz

On 06/17/2015 03:34 PM, Mark Nelson wrote:

On 06/17/2015 04:10 AM, Jacek Jarosiewicz wrote:

Hi,



[ cut ]



~60MB/s seq writes
~100MB/s seq reads
~2-3k iops random reads


Is this per SSD or aggregate?


aggregate (if I understand you correctly). This is what I see when I run 
tests on the client - a mapped and mounted rbd.






The client is an rbd mounted on a linux ubuntu box. All the servers (osd
nodes and the client) are running Ubuntu Server 14.04. We tried to
switch to CentOS 7 - but the results are the same.


Is this kernel RBD or a VM using QEMU/KVM?  You might want to try fio
with the librbd engine and see if you get the same results.  Also,
radosbench isn't exactly analogous, but you might try some large
sequential write / sequential read tests just as a sanity check.



This is kernel rbd - testing performance on vm's will be the next step.
I've tried fio with librbd, but the results were similar.
I'll run the radosbench tests and post my results.



Here are some technical details about our setup:

Four exact same osd nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008


Specs look fine.



Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD


Is that where the SSDs live?  I'm not a fan of such heavy expander
over-subscription, but if you are getting good results outside of Ceph
I'm guessing it's something else.



No, the SSDs are connected to the integrated Intel SATA controller 
(C610/X99)


The only disks that reside in the SuperMicro chassis are the SATA drives. 
And in the last tests I don't use them - the results I gave are on SSDs 
only (one SSD serves as an OSD and the journal is on another SSD).




Four monitors (one on each node). We do not use CephFS so we do not run
ceph-mds.


You'll want to go down to 3 or up to 5.  Even numbers of monitors don't
really help you in any way (and can actually hurt).  I'd suggest 3.



OK, will do that, thanks!



You didn't mention the brand/model of SSDs.  Especially for writes this
is important as ceph journal writes are O_DSYNC.  Drives that have
proper write loss protection often can ignore ATA_CMD_FLUSH and do these
very quickly while other drives may need to flush to the flash cells.
Also, keep in mind for writes that if you have journals on the SSDs and
3X replication, you'll be doing 6 writes for every client write.



SSD's are INTEL SSDSC2BW240A4
The rbd pool is set to have min_size 1 and size 2.


For reads and read IOPs on SSDs, you might try disabling in-memory
logging and ceph authentication.  You might be interested in some
testing we did on a variety of SSDs here:

http://www.spinics.net/lists/ceph-users/msg15733.html



Will read up on that too, thanks!

J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych


SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego 
Rejestru Sądowego,

nr KRS 029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA -   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Sage Weil
On Wed, 17 Jun 2015, Nathan Cutler wrote:
  We've since merged something 
  that stripes over several small xattrs so that we can keep things inline, 
  but it hasn't been backported to hammer yet.  See
  c6cdb4081e366f471b372102905a1192910ab2da.
 
 Hi Sage:
 
 You wrote "yet" - should we earmark it for hammer backport?

Yes, please!

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Accessing Ceph from Spark

2015-06-17 Thread Gregory Farnum
On Wed, Jun 17, 2015 at 2:58 PM, Milan Sladky milan.sla...@outlook.com wrote:
 Is it possible to access Ceph from Spark as it is mentioned here for
 OpenStack Swift?

 https://spark.apache.org/docs/latest/storage-openstack-swift.html

Depends on what you're trying to do. It's possible that the Swift
bindings described there will just work with Ceph (somebody else will
have to answer that). If you're interested in CephFS, it has bindings
for Hadoop and I believe Spark works with that.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd performance issue - can't find bottleneck

2015-06-17 Thread Mark Nelson



On 06/17/2015 09:03 AM, Jacek Jarosiewicz wrote:

On 06/17/2015 03:34 PM, Mark Nelson wrote:

On 06/17/2015 04:10 AM, Jacek Jarosiewicz wrote:

Hi,



[ cut ]



~60MB/s seq writes
~100MB/s seq reads
~2-3k iops random reads


Is this per SSD or aggregate?


aggregate (if I understand you correctly). This is what I see when I run
tests on the client - a mapped and mounted rbd.





The client is an rbd mounted on a linux ubuntu box. All the servers (osd
nodes and the client) are running Ubuntu Server 14.04. We tried to
switch to CentOS 7 - but the results are the same.


Is this kernel RBD or a VM using QEMU/KVM?  You might want to try fio
with the librbd engine and see if you get the same results.  Also,
radosbench isn't exactly analogous, but you might try some large
sequential write / sequential read tests just as a sanity check.



This is kernel rbd - testing performance on vm's will be the next step.
I've tried fio with librbd, but the results were similar.
I'll run the radosbench tests and post my results.



Here are some technical details about our setup:

Four exact same osd nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008


Specs look fine.



Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD


Is that where the SSDs live?  I'm not a fan of such heavy expander
over-subscription, but if you are getting good results outside of Ceph
I'm guessing it's something else.



No, the SSDs are connected to the integrated Intel SATA controller
(C610/X99)

The only disks that reside in the SuperMicro chassis are the SATA drives.
And in the last tests I don't use them - the results I gave are on SSDs
only (one SSD serves as an OSD and the journal is on another SSD).



Four monitors (one on each node). We do not use CephFS so we do not run
ceph-mds.


You'll want to go down to 3 or up to 5.  Even numbers of monitors don't
really help you in any way (and can actually hurt).  I'd suggest 3.



OK, will do that, thanks!



You didn't mention the brand/model of SSDs.  Especially for writes this
is important as ceph journal writes are O_DSYNC.  Drives that have
proper write loss protection often can ignore ATA_CMD_FLUSH and do these
very quickly while other drives may need to flush to the flash cells.
Also, keep in mind for writes that if you have journals on the SSDs and
3X replication, you'll be doing 6 writes for every client write.



SSD's are INTEL SSDSC2BW240A4


Ah, if I'm not mistaken that's the Intel 530, right?  You'll want to see 
this thread by Stefan Priebe:


https://www.mail-archive.com/ceph-users@lists.ceph.com/msg05667.html

In fact it was the difference between Intel 520 and Intel 530 performance 
that triggered many of the investigations that have taken 
place by various folks into SSD flushing behavior on ATA_CMD_FLUSH.  The 
gist of it is that the 520 is very fast but probably not safe.  The 530 
is safe but not fast.  The DC S3700 (and similar drives with super 
capacitors) are thought to be both fast and safe (though some drives 
like the Crucial M500 and later misrepresented their power loss 
protection so you have to be very careful!)



The rbd pool is set to have min_size 1 and size 2.


For reads and read IOPs on SSDs, you might try disabling in-memory
logging and ceph authentication.  You might be interested in some
testing we did on a variety of SSDs here:

http://www.spinics.net/lists/ceph-users/msg15733.html



Will read up on that too, thanks!

J


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Coded Pools and PGs

2015-06-17 Thread Loic Dachary
Hi,

On 17/06/2015 18:04, Garg, Pankaj wrote:
 Hi,
 
  
 
 I have 5 OSD servers, with a total of 45 OSDs in my cluster. I am trying out 
 erasure coding with different k and m values.
 
 I seem to always get warnings about degraded and undersized PGs whenever I 
 create a profile and create a pool based on that profile.
 
 I have profiles with k and m value pairs: (2,1), (3,3) and (5,3).

By default the crush ruleset for an erasure coded pool needs as many hosts as 
k+m. I.e. you need 6 hosts for (3,3) and 8 for (5,3). You can change this by 
setting the failure domain when creating the erasure code profile as documented 
at

http://docs.ceph.com/docs/master/rados/operations/erasure-code-jerasure/
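
For example, to let a (3,3) pool spread chunks across OSDs instead of
hosts (profile and pool names below are placeholders):

ceph osd erasure-code-profile set ec33osd k=3 m=3 ruleset-failure-domain=osd
ceph osd pool create ecpool33 512 512 erasure ec33osd

Keep in mind that the chunks then land on distinct OSDs rather than
distinct hosts, so a single host failure can take out several chunks -
fine for testing, riskier in production.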

 
 What would be appropriate PG values? I have tried values from as low as 12 up 
 to 1024 and always get degraded and undersized PGs. This is quite confusing.
 
  

If the problem is different, it would be great if you could file a bug report 
with details. The ceph report command will output all the relevant information.
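
Something like:

ceph report > cluster-report.json

attached to the tracker issue.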

Cheers

 
 Thanks
 
 Pankaj
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Accessing Ceph from Spark

2015-06-17 Thread ZHOU Yuan
Hi Milan,

We've done some tests here and our Hadoop can talk to RGW successfully
with this SwiftFS plugin, but we haven't tried Spark yet. One caveat is
the data locality feature: it requires some special configuration of
the Swift proxy-server, so RGW is not able to achieve data locality
there.
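
Roughly, Spark can be pointed at a Swift endpoint through the
hadoop-openstack properties - the provider name, credentials and URL
below are placeholders, and the exact auth keys depend on the plugin
version and the RGW auth setup:

spark-submit \
  --conf spark.hadoop.fs.swift.service.rgw.auth.url=http://rgw.example.com/auth/1.0 \
  --conf spark.hadoop.fs.swift.service.rgw.username=johndoe \
  --conf spark.hadoop.fs.swift.service.rgw.password=secret \
  --conf spark.hadoop.fs.swift.service.rgw.public=true \
  your-app.jar swift://mycontainer.rgw/path/to/data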

Could you please kindly share some deployment considerations for running
Spark on Swift/Ceph? Tachyon seems more promising...


Sincerely, Yuan


On Wed, Jun 17, 2015 at 9:58 PM, Milan Sladky milan.sla...@outlook.com wrote:
 Is it possible to access Ceph from Spark as it is mentioned here for
 OpenStack Swift?

 https://spark.apache.org/docs/latest/storage-openstack-swift.html

 Thanks for the help.

 Milan Sladky

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure Coded Pools and PGs

2015-06-17 Thread Garg, Pankaj
Hi,

I have 5 OSD servers, with a total of 45 OSDs in my cluster. I am trying out 
erasure coding with different k and m values.
I seem to always get warnings about degraded and undersized PGs whenever I 
create a profile and create a pool based on that profile.
I have profiles with k and m value pairs: (2,1), (3,3) and (5,3).
What would be appropriate PG values? I have tried values from as low as 12 up 
to 1024 and always get degraded and undersized PGs. This is quite confusing.

Thanks
Pankaj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com