[ceph-users] 5 pgs of 712 stuck in active+remapped

2016-07-07 Thread Nathanial Byrnes

Hello,
I've got a Jewel cluster (3 nodes, 15 OSDs) running with bobtail 
tunables (my XenServer cluster uses 3.10 as the kernel and there's no 
upgrading that). I started the cluster out on Hammer, upgraded to 
Jewel, discovered that optimal tunables would not work, and then set the 
tunables back to bobtail. Once the re-balancing completed, I was stuck 
with 1 pg in active+remapped. Repair didn't fix the pg. I then upped 
the number of pgs from 328 to 712 (oddly, I asked for 512 but ended up 
with 712...); now I have 5 pgs stuck in active+remapped. I also tried 
re-weighting the OSDs a couple of times, but no change. Here is my osd tree:


ID WEIGHT   TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 15.0 root default
-2  5.0 host ceph1
 0  1.0 osd.0   up  0.95001  1.0
 1  1.0 osd.1   up  1.0  1.0
 2  1.0 osd.2   up  1.0  1.0
 3  1.0 osd.3   up  0.90002  1.0
 4  1.0 osd.4   up  1.0  1.0
-3  5.0 host ceph3
10  1.0 osd.10  up  1.0  1.0
11  1.0 osd.11  up  1.0  1.0
12  1.0 osd.12  up  1.0  1.0
13  1.0 osd.13  up  1.0  1.0
14  1.0 osd.14  up  1.0  1.0
-4  5.0 host ceph2
 5  1.0 osd.5   up  1.0  1.0
 6  1.0 osd.6   up  1.0  1.0
 7  1.0 osd.7   up  1.0  1.0
 8  1.0 osd.8   up  1.0  1.0
 9  1.0 osd.9   up  1.0  1.0


Any suggestions on how to troubleshoot or repair this?

Thanks and Regards,
Nate



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Watch Notify for snapshots

2016-07-07 Thread Jason Dillaman
librbd pseudo-automatically handles this by flushing the cache to the
snapshot when a new snapshot is created, but I don't think krbd does the
same. If it doesn't, it would probably be a nice addition to the block
driver to support the general case.

Barring that (or if you want to involve something like fsfreeze), I think
the answer depends on how willing you are to write some custom C/C++
code (I don't think the rados python library exposes watch/notify APIs). A
daemon could register a watch on a custom per-host/image/etc. object which
would sync the disk when a notification is received. Prior to creating a
snapshot, you would need to send a notification to this object to alert the
daemon to sync/fsfreeze/etc.
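
To make that concrete, here is a rough, untested sketch of such a daemon
using the librados C API. The pool name "rbd", the object name
"snap-trigger" and the plain system("sync") are placeholders for
illustration only, and error handling is mostly omitted:

#include <rados/librados.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static rados_ioctx_t ioctx;

/* Called on the host whenever someone notifies the trigger object. */
static void watch_cb(void *arg, uint64_t notify_id, uint64_t cookie,
                     uint64_t notifier_id, void *data, size_t data_len)
{
    printf("notify %llu received, syncing\n", (unsigned long long)notify_id);
    system("sync");   /* placeholder: fsfreeze/unfreeze the RBD mount here */
    /* Ack so the notifier's rados_notify2() call returns. */
    rados_notify_ack(ioctx, "snap-trigger", notify_id, cookie, NULL, 0);
}

static void watch_err_cb(void *arg, uint64_t cookie, int err)
{
    fprintf(stderr, "watch %llu error: %d\n", (unsigned long long)cookie, err);
}

int main(void)
{
    rados_t cluster;
    uint64_t handle;

    rados_create(&cluster, "admin");          /* connect as client.admin */
    rados_conf_read_file(cluster, NULL);      /* default ceph.conf search */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &ioctx);

    /* Make sure the trigger object exists, then register the watch. */
    rados_write_full(ioctx, "snap-trigger", "", 0);
    rados_watch2(ioctx, "snap-trigger", &handle, watch_cb, watch_err_cb, NULL);

    for (;;)
        sleep(60);    /* callbacks arrive on librados' own threads */
}

Compile with something like "gcc -o snapwatchd snapwatchd.c -lrados". The
backup side would then call rados_notify2() on the same object and wait for
it to return right before issuing the snapshot (and possibly notify again
afterwards to unfreeze).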

On Thu, Jul 7, 2016 at 12:33 PM, Nick Fisk  wrote:

> Hi All,
>
> I have a RBD mounted to a machine via the kernel client and I wish to be
> able to take a snapshot and mount it to another machine
> where it can be backed up.
>
> The big issue is that I need to make sure that the process writing on the
> source machine is finished and the FS is sync'd before
> taking the snapshot.
>
> My question. Is there something I can do with Watch/Notify to trigger this
> checking/sync process on the source machine before the
> snapshot is actually taken?
>
> Thanks,
> Nick
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Gaurav Goyal
Thanks for the verification!

Yeah i didnt find additional section for [ceph] in my cinder.conf file.
Should i create that manually?
As i didnt find [ceph] section so i modified same parameters in [DEFAULT]
section.
I will change that as per your suggestion.

Moreoevr checking some other links i got to know that, i must configure
following additional parameters
should i do that and install tgtadm package?

rootwrap_config = /etc/cinder/rootwrap.conf
api_paste_confg = /etc/cinder/api-paste.ini
iscsi_helper = tgtadm
volume_name_template = volume-%s
volume_group = cinder-volumes

Do i need to execute following commands?

"pvcreate /dev/rbd1" &
"vgcreate cinder-volumes /dev/rbd1"


Regards

Gaurav Goyal



On Thu, Jul 7, 2016 at 10:02 PM, Jason Dillaman  wrote:

> These lines from your log output indicate that you are configured to use LVM
> as a Cinder backend.
>
> > 2016-07-07 16:20:31.966 32549 INFO cinder.volume.manager
> [req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Starting volume
> driver LVMVolumeDriver (3.0.0)
> > 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Command: sudo
> cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o
> name cinder-volumes
>
> Looking at your provided configuration, I don't see a "[ceph]"
> configuration section. Here is a configuration example [1] for Cinder.
>
> [1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-cinder
>
> On Thu, Jul 7, 2016 at 9:35 PM, Gaurav Goyal 
> wrote:
>
>> Hi Kees/Fran,
>>
>>
>> Do you find any issue in my cinder.conf file?
>>
>> it says Volume group "cinder-volumes" not found. When to configure this
>> volume group?
>>
>> I have done ceph configuration for nova creation.
>> But i am still facing the same error .
>>
>>
>>
>> */var/log/cinder/volume.log*
>>
>> 2016-07-07 16:20:13.765 136259 ERROR cinder.service [-] Manager for
>> service cinder-volume OSKVM1@ceph is reporting problems, not sending
>> heartbeat. Service will appear "down".
>>
>> 2016-07-07 16:20:23.770 136259 ERROR cinder.service [-] Manager for
>> service cinder-volume OSKVM1@ceph is reporting problems, not sending
>> heartbeat. Service will appear "down".
>>
>> 2016-07-07 16:20:30.789 136259 WARNING oslo_messaging.server [-]
>> start/stop/wait must be called in the same thread
>>
>> 2016-07-07 16:20:30.791 136259 WARNING oslo_messaging.server
>> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] start/stop/wait must
>> be called in the same thread
>>
>> 2016-07-07 16:20:30.794 136247 INFO oslo_service.service
>> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Caught SIGTERM,
>> stopping children
>>
>> 2016-07-07 16:20:30.799 136247 INFO oslo_service.service
>> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Waiting on 1 children
>> to exit
>>
>> 2016-07-07 16:20:30.806 136247 INFO oslo_service.service
>> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Child 136259 killed by
>> signal 15
>>
>> 2016-07-07 16:20:31.950 32537 INFO cinder.volume.manager
>> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Determined volume DB
>> was not empty at startup.
>>
>> 2016-07-07 16:20:31.956 32537 INFO cinder.volume.manager
>> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Image-volume cache
>> disabled for host OSKVM1@ceph.
>>
>> 2016-07-07 16:20:31.957 32537 INFO oslo_service.service
>> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Starting 1 workers
>>
>> 2016-07-07 16:20:31.960 32537 INFO oslo_service.service
>> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Started child 32549
>>
>> 2016-07-07 16:20:31.963 32549 INFO cinder.service [-] Starting
>> cinder-volume node (version 7.0.1)
>>
>> 2016-07-07 16:20:31.966 32549 INFO cinder.volume.manager
>> [req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Starting volume driver
>> LVMVolumeDriver (3.0.0)
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
>> [req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Failed to initialize
>> driver.
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Traceback (most
>> recent call last):
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
>> "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 368, in
>> init_host
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
>> self.driver.check_for_setup_error()
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
>> "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in
>> wrapper
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager return
>> f(*args, **kwargs)
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
>> "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 269,
>> in check_for_setup_error
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
>> lvm_conf=lvm_conf_file)
>>
>> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
>> 

Re: [ceph-users] (no subject)

2016-07-07 Thread Jason Dillaman
These lines from your log output indicate that you are configured to use LVM as
a Cinder backend.

> 2016-07-07 16:20:31.966 32549 INFO cinder.volume.manager
[req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Starting volume driver
LVMVolumeDriver (3.0.0)
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Command: sudo
cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o
name cinder-volumes

Looking at your provided configuration, I don't see a "[ceph]"
configuration section. Here is a configuration example [1] for Cinder.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-cinder
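
In short, the relevant part of cinder.conf from that example ends up looking
roughly like the following (a sketch only; the linked doc is authoritative,
and the rbd_user / rbd_secret_uuid values are specific to your deployment):

[DEFAULT]
...
enabled_backends = ceph
...
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
rbd_user = cinder
rbd_secret_uuid = <your libvirt secret uuid>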

On Thu, Jul 7, 2016 at 9:35 PM, Gaurav Goyal 
wrote:

> Hi Kees/Fran,
>
>
> Do you find any issue in my cinder.conf file?
>
> it says Volume group "cinder-volumes" not found. When to configure this
> volume group?
>
> I have done ceph configuration for nova creation.
> But i am still facing the same error .
>
>
>
> */var/log/cinder/volume.log*
>
> 2016-07-07 16:20:13.765 136259 ERROR cinder.service [-] Manager for
> service cinder-volume OSKVM1@ceph is reporting problems, not sending
> heartbeat. Service will appear "down".
>
> 2016-07-07 16:20:23.770 136259 ERROR cinder.service [-] Manager for
> service cinder-volume OSKVM1@ceph is reporting problems, not sending
> heartbeat. Service will appear "down".
>
> 2016-07-07 16:20:30.789 136259 WARNING oslo_messaging.server [-]
> start/stop/wait must be called in the same thread
>
> 2016-07-07 16:20:30.791 136259 WARNING oslo_messaging.server
> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] start/stop/wait must
> be called in the same thread
>
> 2016-07-07 16:20:30.794 136247 INFO oslo_service.service
> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Caught SIGTERM,
> stopping children
>
> 2016-07-07 16:20:30.799 136247 INFO oslo_service.service
> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Waiting on 1 children
> to exit
>
> 2016-07-07 16:20:30.806 136247 INFO oslo_service.service
> [req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Child 136259 killed by
> signal 15
>
> 2016-07-07 16:20:31.950 32537 INFO cinder.volume.manager
> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Determined volume DB
> was not empty at startup.
>
> 2016-07-07 16:20:31.956 32537 INFO cinder.volume.manager
> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Image-volume cache
> disabled for host OSKVM1@ceph.
>
> 2016-07-07 16:20:31.957 32537 INFO oslo_service.service
> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Starting 1 workers
>
> 2016-07-07 16:20:31.960 32537 INFO oslo_service.service
> [req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Started child 32549
>
> 2016-07-07 16:20:31.963 32549 INFO cinder.service [-] Starting
> cinder-volume node (version 7.0.1)
>
> 2016-07-07 16:20:31.966 32549 INFO cinder.volume.manager
> [req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Starting volume driver
> LVMVolumeDriver (3.0.0)
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
> [req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Failed to initialize
> driver.
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Traceback (most
> recent call last):
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 368, in
> init_host
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
> self.driver.check_for_setup_error()
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in
> wrapper
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager return
> f(*args, **kwargs)
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 269,
> in check_for_setup_error
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
> lvm_conf=lvm_conf_file)
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 86,
> in __init__
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager if
> self._vg_exists() is False:
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 123,
> in _vg_exists
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
> run_as_root=True)
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/cinder/utils.py", line 155, in execute
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager return
> processutils.execute(*cmd, **kwargs)
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
> "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line
> 275, in execute
>
> 2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
> 

Re: [ceph-users] RBD - Deletion / Discard - IO Impact

2016-07-07 Thread Christian Balzer
On Thu, 7 Jul 2016 12:53:33 +0100 Nick Fisk wrote:

> Hi All,
> 
>  
> 
> Does anybody else see a massive (ie 10x) performance impact when either
> deleting a RBD or running something like mkfs.xfs against an existing
> RBD, which would zero/discard all blocks?
> 
>  
> 
> In the case of deleting a 4TB RBD, I'm seeing latency in some cases rise
> up to 10s.
> 
>  
> 
> It looks like it is the XFS deletions on the OSD which are potentially
> responsible for the massive drop in performance as I see random OSD's in
> turn peak to 100% utilisation.
> 
>  
> 
> I'm not aware of any throttling that can be done to reduce this impact,
> but would be interested to hear from anyone else that may experience
> this.
> 
I haven't tested this since firefly and found RBD deletions, discards and
snapshots all to be very expensive operations.

See also:
http://ceph.com/planet/use-discard-with-krbd-client-since-kernel-3-18/

I would think that the unified queue in Jewel would help with this.

But I can't tell how much of this is also XFS amplification, and thus not
helped by proper queueing above, as all my production OSDs are Ext4.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Gaurav Goyal
Hi Kees/Fran,


Do you find any issue in my cinder.conf file?

it says Volume group "cinder-volumes" not found. When to configure this
volume group?

I have done ceph configuration for nova creation.
But i am still facing the same error .



*/var/log/cinder/volume.log*

2016-07-07 16:20:13.765 136259 ERROR cinder.service [-] Manager for service
cinder-volume OSKVM1@ceph is reporting problems, not sending heartbeat.
Service will appear "down".

2016-07-07 16:20:23.770 136259 ERROR cinder.service [-] Manager for service
cinder-volume OSKVM1@ceph is reporting problems, not sending heartbeat.
Service will appear "down".

2016-07-07 16:20:30.789 136259 WARNING oslo_messaging.server [-]
start/stop/wait must be called in the same thread

2016-07-07 16:20:30.791 136259 WARNING oslo_messaging.server
[req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] start/stop/wait must
be called in the same thread

2016-07-07 16:20:30.794 136247 INFO oslo_service.service
[req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Caught SIGTERM,
stopping children

2016-07-07 16:20:30.799 136247 INFO oslo_service.service
[req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Waiting on 1 children
to exit

2016-07-07 16:20:30.806 136247 INFO oslo_service.service
[req-f62eb1bb-6883-457f-9f63-b5556342eca7 - - - - -] Child 136259 killed by
signal 15

2016-07-07 16:20:31.950 32537 INFO cinder.volume.manager
[req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Determined volume DB
was not empty at startup.

2016-07-07 16:20:31.956 32537 INFO cinder.volume.manager
[req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Image-volume cache
disabled for host OSKVM1@ceph.

2016-07-07 16:20:31.957 32537 INFO oslo_service.service
[req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Starting 1 workers

2016-07-07 16:20:31.960 32537 INFO oslo_service.service
[req-cef7baaa-b0ef-4365-89d9-4379eb1c104c - - - - -] Started child 32549

2016-07-07 16:20:31.963 32549 INFO cinder.service [-] Starting
cinder-volume node (version 7.0.1)

2016-07-07 16:20:31.966 32549 INFO cinder.volume.manager
[req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Starting volume driver
LVMVolumeDriver (3.0.0)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
[req-f9371a24-bb2b-42fb-ad4e-e2cfc271fe10 - - - - -] Failed to initialize
driver.

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Traceback (most
recent call last):

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 368, in
init_host

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
self.driver.check_for_setup_error()

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in
wrapper

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager return
f(*args, **kwargs)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/cinder/volume/drivers/lvm.py", line 269,
in check_for_setup_error

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
lvm_conf=lvm_conf_file)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 86,
in __init__

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager if
self._vg_exists() is False:

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/cinder/brick/local_dev/lvm.py", line 123,
in _vg_exists

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
run_as_root=True)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/cinder/utils.py", line 155, in execute

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager return
processutils.execute(*cmd, **kwargs)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager   File
"/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line
275, in execute

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
cmd=sanitized_cmd)

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager
ProcessExecutionError: Unexpected error while running command.

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Command: sudo
cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C vgs --noheadings -o
name cinder-volumes

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Exit code: 5

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Stdout: u''

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager Stderr: u'
Volume group "cinder-volumes" not found\n  Cannot process volume group
cinder-volumes\n'

2016-07-07 16:20:32.067 32549 ERROR cinder.volume.manager

2016-07-07 16:20:32.108 32549 INFO oslo.messaging._drivers.impl_rabbit
[req-7e229d1f-06af-4b60-8e15-1f8c0e6eb084 - - - - -] Connecting to AMQP
server on controller:5672

2016-07-07 16:20:32.125 32549 INFO oslo.messaging._drivers.impl_rabbit

Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread Christian Balzer

Hello,

On Thu, 7 Jul 2016 23:19:35 +0200 Zoltan Arnold Nagy wrote:

> Hi Nick,
> 
> How large are the NVMe drives you are running per 12 disks?
> 
> In my current setup I have 4xP3700 per 36 disks but I feel like I could
> get by with 2… Just looking for community experience :-)
>
This is funny, because you ask Nick about the size and don't mention it
yourself. ^o^

As I speculated in my reply, it's the 400GB model and Nick didn't dispute
that.
And I shall assume the same for you.

You could get by with 2 of the 400GB ones, but that depends on a number of
things.

1. What's your use case, typical usage pattern?
Are you doing a lot of large sequential writes or is it mostly smallish
I/Os? 
HDD OSDs will clock in at about 100MB/s with OSD bench, but realistically
not see more than 50-60MB/s, so with 18 of them per one 400GB P3700 you're
about on par.
  

2. What's your network setup? If you have more than 20Gb/s to that node,
your journals will likely become the (write) bottleneck. 
But that's only the case with backfills or again largish sequential writes
of course.

3. A repeat of sorts of the previous 2 points, this time with the focus on
endurance. How much data are you writing per day to an average OSD?
With 18 OSDs per 400GB P3700 NVMe you will want that to be less than
223GB/day/OSD.

4. As usual, failure domains. In the case of an NVMe failure you'll lose
twice the number of OSDs.
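
(On point 3, for reference: that 223GB/day figure matches the P3700's
endurance rating if you assume the 10 drive-writes-per-day Intel quotes for
it, i.e. 400GB x 10 DWPD = roughly 4TB of journal writes per day, divided by
18 OSDs is about 222GB/day per OSD, before any write amplification.)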


That all being said, at 36 OSDs I'd venture you'll run out of CPU steam
(with small write IOPS) before your journals become the bottleneck.
 
Christian

> Cheers,
> Zoltan
> 
[snip]


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph/daemon mon not working and status exit (1)

2016-07-07 Thread Rahul Talari
I am trying to use Ceph in Docker. I have built the ceph/base and
ceph/daemon Dockerfiles. I am trying to deploy a Ceph monitor according to
the instructions given in the tutorial, but when I execute the command
without a KV store and then type:

sudo docker ps

I am not able to keep the monitor up. What mistake am I making in
doing so? Is there something I should do to get it up and running
continuously without failing?

Thank you

-- 
Rahul Talari
*University of Illinois Urbana Champaign - **Computer Engineering **|
University of California Santa Cruz - Computer Science | Alpha Tau Omega |
SSRC - UCSC *
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Brad Hubbard
Hi Goncalo,

If possible it would be great if you could capture a core file for this with
full debugging symbols (preferably glibc debuginfo as well). How you do
that will depend on the ceph version and your OS, but we can offer help
if required, I'm sure.
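
For example, one common way on an RPM-based distro (adjust for your
environment; the package names here are a guess on my part) is roughly:

ulimit -c unlimited                  # in the shell that will start ceph-fuse
debuginfo-install ceph-fuse glibc    # pull in debugging symbols
cat /proc/sys/kernel/core_pattern    # check where core files will land

then re-run the failing workload until ceph-fuse crashes.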

Once you have the core do the following.

$ gdb /path/to/ceph-fuse core.
(gdb) set pag off
(gdb) set log on
(gdb) thread apply all bt
(gdb) thread apply all bt full

Then quit gdb and you should find a file called gdb.txt in your
working directory.
If you could attach that file to http://tracker.ceph.com/issues/16610, that
would be great.

Cheers,
Brad

On Fri, Jul 8, 2016 at 12:06 AM, Patrick Donnelly  wrote:
> On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
>  wrote:
>> Unfortunately, the other user application breaks ceph-fuse again (It is a
>> completely different application then in my previous test).
>>
>> We have tested it in 4 machines with 4 cores. The user is submitting 16
>> single core jobs which are all writing different output files (one per job)
>> to a common dir in cephfs. The first 4 jobs run happily and never break
>> ceph-fuse. But the remaining 12 jobs, running in the remaining 3 machines,
>> trigger a segmentation fault, which is completely different from the other
>> case.
>>
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> 1: (()+0x297fe2) [0x7f54402b7fe2]
>> 2: (()+0xf7e0) [0x7f543ecf77e0]
>> 3: (ObjectCacher::bh_write_scattered(std::list> std::allocator >&)+0x36) [0x7f5440268086]
>> 4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*,
>> std::chrono::time_point> std::chrono::duration > >, long*,
>> int*)+0x22c) [0x7f5440268a3c]
>> 5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
>> 6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
>> 7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
>> 8: (()+0x7aa1) [0x7f543ecefaa1]
>>  9: (clone()+0x6d) [0x7f543df6893d]
>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
>> interpret this.
>
> This one looks like a very different problem. I've created an issue
> here: http://tracker.ceph.com/issues/16610
>
> Thanks for the report and debug log!
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failing to Activate new OSD ceph-deploy

2016-07-07 Thread Scottix
I played with it enough to make it work.

Basically I created the directory it was going to put the data in:
mkdir /var/lib/ceph/osd/ceph-22

Then I ran ceph-deploy activate, which got a bit further with putting it
into the cluster, but it still didn't start because of permissions on
the journal.

Some of the permissions were set to ceph:ceph. I tried the new permissions
but it failed to start; from reading a mailing list thread, a reboot may have
fixed that.
Anyway, I ran chown -R root:root ceph-22 and after that it started.

I still need to fix the permissions, but I am happy I got it in at least.

--Scott



On Thu, Jul 7, 2016 at 2:54 PM Scottix  wrote:

> Hey,
> This is the first time I have had a problem with ceph-deploy
>
> I have attached the log but I can't seem to activate the osd.
>
> I am running
> ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>
> I did upgrade from Infernalis->Jewel
> I haven't changed ceph ownership but I do have the config option
> setuser_match_path = /var/lib/ceph/$type/$cluster-$id
>
> Any help would be appreciated,
> Scott
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failing to Activate new OSD ceph-deploy

2016-07-07 Thread Scottix
Hey,
This is the first time I have had a problem with ceph-deploy

I have attached the log but I can't seem to activate the osd.

I am running
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

I did upgrade from Infernalis->Jewel
I haven't changed ceph ownership but I do have the config option
setuser_match_path = /var/lib/ceph/$type/$cluster-$id

Any help would be appreciated,
Scott
Stat200:~/t-cluster$ ceph-deploy --overwrite-conf osd create tCeph203:/dev/sdl:/dev/sdc4
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/t/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.34): /usr/bin/ceph-deploy --overwrite-conf osd create tCeph203:/dev/sdl:/dev/sdc4
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  disk  : [('tCeph203', '/dev/sdl', '/dev/sdc4')]
[ceph_deploy.cli][INFO  ]  dmcrypt   : False
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  bluestore : None
[ceph_deploy.cli][INFO  ]  overwrite_conf: True
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 
[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  fs_type   : xfs
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  zap_disk  : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks tCeph203:/dev/sdl:/dev/sdc4
t@tceph203's password: 
[tCeph203][DEBUG ] connection detected need for sudo
t@tceph203's password: 
[tCeph203][DEBUG ] connected to host: tCeph203 
[tCeph203][DEBUG ] detect platform information from remote host
[tCeph203][DEBUG ] detect machine type
[tCeph203][DEBUG ] find the location of an executable
[tCeph203][INFO  ] Running command: sudo /sbin/initctl version
[tCeph203][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to tCeph203
[tCeph203][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host tCeph203 disk /dev/sdl journal /dev/sdc4 activate True
[tCeph203][DEBUG ] find the location of an executable
[tCeph203][INFO  ] Running command: sudo /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdl /dev/sdc4
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is /sys/dev/block/8:176/dm/uuid
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is /sys/dev/block/8:176/dm/uuid
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is /sys/dev/block/8:176/dm/uuid
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is /sys/dev/block/8:176/dm/uuid
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[tCeph203][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdc4 uuid path is /sys/dev/block/8:36/dm/uuid
[tCeph203][WARNIN] prepare_device: Journal /dev/sdc4 is a partition
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdc4 uuid path is /sys/dev/block/8:36/dm/uuid
[tCeph203][WARNIN] prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
[tCeph203][WARNIN] command: Running command: /sbin/blkid -o udev -p /dev/sdc4
[tCeph203][WARNIN] prepare_device: Journal /dev/sdc4 was not prepared with ceph-disk. Symlinking directly.
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is /sys/dev/block/8:176/dm/uuid
[tCeph203][WARNIN] set_data_partition: Creating osd partition on /dev/sdl
[tCeph203][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdl uuid path is 

Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread Zoltan Arnold Nagy
Hi Nick,

How large are the NVMe drives you are running per 12 disks?

In my current setup I have 4xP3700 per 36 disks but I feel like I could get by 
with 2… Just looking for community experience :-)

Cheers,
Zoltan

> On 07 Jul 2016, at 10:45, Nick Fisk  wrote:
> 
> Just to add if you really want to go with lots of HDD's to Journals then go
> NVME. They are not a lot more expensive than the equivalent SATA based
> 3700's, but the latency is low low low. Here is an example of a node I have
> just commissioned with 12 HDD's to one P3700
> 
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0.00 0.00   68.000.00  8210.00 0.00   241.47
> 0.263.853.850.00   2.09  14.20
> sdd   2.50 0.00  198.50   22.00 24938.00  9422.00   311.66
> 4.34   27.806.21  222.64   2.45  54.00
> sdc   0.00 0.00   63.000.00  7760.00 0.00   246.35
> 0.152.162.160.00   1.56   9.80
> sda   0.00 0.00   61.50   47.00  7600.00 22424.00   553.44
> 2.77   25.572.63   55.57   3.82  41.40
> nvme0n1   0.0022.502.00 2605.00 8.00 139638.00   107.13
> 0.140.050.000.05   0.03   6.60
> sdg   0.00 0.00   61.00   28.00  6230.00 12696.00   425.30
> 3.66   74.795.84  225.00   3.87  34.40
> sdf   0.00 0.00   34.50   47.00  4108.00 21702.00   633.37
> 3.56   43.751.51   74.77   2.85  23.20
> sdh   0.00 0.00   75.00   15.50  9180.00  4984.00   313.02
> 0.45   12.553.28   57.42   3.51  31.80
> sdi   1.50 0.50  142.00   48.50 18102.00 21924.00   420.22
> 3.60   18.924.99   59.71   2.70  51.40
> sdj   0.50 0.00   74.505.00  9362.00  1832.00   281.61
> 0.334.103.33   15.60   2.44  19.40
> sdk   0.00 0.00   54.000.00  6420.00 0.00   237.78
> 0.122.302.300.00   1.70   9.20
> sdl   0.00 0.00   21.001.50  2286.0016.00   204.62
> 0.32   18.13   13.81   78.67   6.67  15.00
> sde   0.00 0.00   98.000.00 12304.00 0.00   251.10
> 0.303.103.100.00   2.08  20.40
> 
> 50us latency at 2605 iops!!!
> 
> Compared to one of the other nodes with 2 100GB S3700's, 6 disks each
> 
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.0030.500.00  894.50 0.00 50082.00   111.98
> 0.360.410.000.41   0.20  17.80
> sdb   0.00 9.000.00  551.00 0.00 32044.00   116.31
> 0.230.420.000.42   0.19  10.40
> sdc   0.00 2.006.50   17.50   278.00  8422.00   725.00
> 1.08   44.92   18.46   54.74   8.08  19.40
> sdd   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> sde   0.00 2.50   27.50   21.50  2112.00  9866.00   488.90
> 0.59   12.046.91   18.60   6.53  32.00
> sdf   0.50 0.00   50.500.00  6170.00 0.00   244.36
> 0.184.634.630.00   2.10  10.60
> md1   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> md0   0.00 0.000.000.00 0.00 0.00 0.00
> 0.000.000.000.00   0.00   0.00
> sdg   0.00 1.50   32.00  386.50  3970.00 12188.0077.22
> 0.150.350.500.34   0.15   6.40
> sdh   0.00 0.006.000.0034.00 0.0011.33
> 0.07   12.67   12.670.00  11.00   6.60
> sdi   0.00 0.501.50   19.50 6.00  8862.00   844.57
> 0.96   45.71   33.33   46.67   6.57  13.80
> sdj   0.00 0.00   67.000.00  8214.00 0.00   245.19
> 0.172.512.510.00   1.88  12.60
> sdk   1.50 2.50   61.00   48.00  6216.00 21020.00   499.74
> 2.01   18.46   11.41   27.42   5.05  55.00
> sdm   0.00 0.00   30.500.00  3576.00 0.00   234.49
> 0.072.432.430.00   1.90   5.80
> sdl   0.00 4.50   25.00   23.50  2092.00 12648.00   607.84
> 1.36   19.425.60   34.13   4.04  19.60
> sdn   0.50 0.00   23.000.00  2670.00 0.00   232.17
> 0.072.962.960.00   2.43   5.60
> 
> Pretty much 10x the latency. I'm seriously impressed with these NVME things.
> 
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Christian Balzer
>> Sent: 07 July 2016 03:23
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] multiple journals on SSD
>> 
>> 
>> Hello,
>> 
>> I have a multitude of of problems with the benchmarks and conclusions
> here,
>> more below.
>> 
>> But firstly to address the question of the OP, definitely not filesystem
> based
>> 

[ceph-users] Ceph Social Media

2016-07-07 Thread Patrick McGarry
Hey cephers,

Just wanted to remind everyone that our Ceph social media channels are
for all upstream consumption. If you are doing something cool with
Ceph, have a new feature/integration to announce, or just some piece
of news that would be of-interest to the Ceph community, please send
it my way!

I'm happy to answer/RT anything that gets sent to Ceph channels
directly, or you can just email the information to me and I'll get it
out as soon as possible. What I won't do is simply advertise for
someone's product (which goes for Red Hat too), so make sure that it
has a technology/community angle and we'll be happy to promote it.
Current default promotion channels are:

Twitter (https://twitter.com/ceph)
Facebook (https://www.facebook.com/cephstorage)
Google+ (https://plus.google.com/communities/116115281445231966828)

Have had a couple of people ask me about it in recent history, so
wanted to reiterate for the broader community that these channels
exist to promote Ceph, not any single company (or group of companies)
within our ecosystem. Thanks!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what's the meaning of 'removed_snaps' of `ceph osd pool ls detail`?

2016-07-07 Thread Gregory Farnum
On Thu, Jul 7, 2016 at 1:07 AM, 秀才  wrote:
> Hi,All:)
>
> i have made a cache-tier,
> but i do not know message 'removed_snaps
> [1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]'.
> i have not snapped any thing yet.

When you take snapshots, it generally creates a lot of tracking data
throughout the cluster that needs to get deleted.
When you remove a snapshot, that is propagated by the OSDMap via removed_snaps.
removed_snaps is an "interval set", not a simple list/array/vector (to
keep the size down), consisting of an offset and the number of deleted
IDs following.
A cache tier's removed_snaps has to match that of its backing pool,
for a whole bunch of consistency reasons. And you'll note that here it
does. ;)
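
(As I read the notation, each entry is offset~length: "1~1" covers just snap
id 1, "3~6" covers snap ids 3 through 8, and so on.)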
-Greg

>
>
> ceph> osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 1 'volumes' replicated size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 11365 lfor 11365 flags
> hashpspool tiers 3 read_tier 3 write_tier 3 stripe_width 0
> removed_snaps
> [1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]
> pool 2 'test' replicated size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 100 pgp_num 100 last_change 2779 flags hashpspool
> stripe_width 0
> pool 3 'fast' replicated size 2 min_size 1 crush_ruleset 1 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 11376 flags
> hashpspool,incomplete_clones tier_of 1 cache_mode writeback target_bytes
> 1500 hit_set bloom{false_positive_probability: 0.05, target_size: 0,
> seed: 0} 3600s x1 min_read_recency_for_promote 1 stripe_width 0
> removed_snaps
> [1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]
>
> Regards,
> XiuCai.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD Watch Notify for snapshots

2016-07-07 Thread Nick Fisk
Hi All,

I have a RBD mounted to a machine via the kernel client and I wish to be able 
to take a snapshot and mount it to another machine
where it can be backed up.

The big issue is that I need to make sure that the process writing on the 
source machine is finished and the FS is sync'd before
taking the snapshot.

My question. Is there something I can do with Watch/Notify to trigger this 
checking/sync process on the source machine before the
snapshot is actually taken?

Thanks,
Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Gaurav Goyal
Hi Fran,

Here is my cinder.conf file. Please help me analyze it.

Do I need to create a volume group as mentioned in this link?
http://docs.openstack.org/liberty/install-guide-rdo/cinder-storage-install.html


[root@OSKVM1 ~]# grep -v "^#" /etc/cinder/cinder.conf|grep -v ^$

[DEFAULT]

rpc_backend = rabbit

auth_strategy = keystone

my_ip = 10.24.0.4

notification_driver = messagingv2

backup_ceph_conf = /etc/ceph/ceph.conf

backup_ceph_user = cinder-backup

backup_ceph_chunk_size = 134217728

backup_ceph_pool = backups

backup_ceph_stripe_unit = 0

backup_ceph_stripe_count = 0

restore_discard_excess_bytes = true

backup_driver = cinder.backup.drivers.ceph

glance_api_version = 2

enabled_backends = ceph

rbd_pool = volumes

rbd_user = cinder

rbd_ceph_conf = /etc/ceph/ceph.conf

rbd_flatten_volume_from_snapshot = false

rbd_secret_uuid = a536c85f-d660-4c25-a840-e321c09e7941

rbd_max_clone_depth = 5

rbd_store_chunk_size = 4

rados_connect_timeout = -1

volume_driver = cinder.volume.drivers.rbd.RBDDriver

[BRCD_FABRIC_EXAMPLE]

[CISCO_FABRIC_EXAMPLE]

[cors]

[cors.subdomain]

[database]

connection = mysql://cinder:cinder@controller/cinder

[fc-zone-manager]

[keymgr]

[keystone_authtoken]

auth_uri = http://controller:5000

auth_url = http://controller:35357

auth_plugin = password

project_domain_id = default

user_domain_id = default

project_name = service

username = cinder

password = cinder

[matchmaker_redis]

[matchmaker_ring]

[oslo_concurrency]

lock_path = /var/lib/cinder/tmp

[oslo_messaging_amqp]

[oslo_messaging_qpid]

[oslo_messaging_rabbit]

rabbit_host = controller

rabbit_userid = openstack

rabbit_password = 

[oslo_middleware]

[oslo_policy]

[oslo_reports]

[profiler]

On Thu, Jul 7, 2016 at 11:38 AM, Fran Barrera 
wrote:

> Hello,
>
> Have you configured these two parameters in cinder.conf?
>
> rbd_user
> rbd_secret_uuid
>
> Regards.
>
> 2016-07-07 15:39 GMT+02:00 Gaurav Goyal :
>
>> Hello Mr. Kees,
>>
>> Thanks for your response!
>>
>> My setup is
>>
>> Openstack Node 1 -> controller + network + compute1 (Liberty Version)
>> Openstack node 2 --> Compute2
>>
>> Ceph version Hammer
>>
>> I am using dell storage with following status
>>
>> DELL SAN storage is attached to both hosts as
>>
>> [root@OSKVM1 ~]# iscsiadm -m node
>>
>> 10.35.0.3:3260,1
>> iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1
>>
>> 10.35.0.8:3260,1
>> iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1
>>
>> 10.35.0.*:3260,-1
>> iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2
>>
>> 10.35.0.8:3260,1
>> iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2
>>
>> 10.35.0.*:3260,-1
>> iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3
>>
>> 10.35.0.8:3260,1
>> iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3
>>
>> 10.35.0.*:3260,-1
>> iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4
>> 10.35.0.8:3260,1
>> iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4
>>
>>
>> Since in my setup same LUNs are MAPPED to both hosts
>>
>> i choose 2 LUNS on Openstack Node 1 and 2 on Openstack Node 2
>>
>>
>> *Node1 has *
>>
>> /dev/sdc12.0T  3.1G  2.0T   1% /var/lib/ceph/osd/ceph-0
>>
>> /dev/sdd12.0T  3.8G  2.0T   1% /var/lib/ceph/osd/ceph-1
>>
>> *Node 2 has *
>>
>> /dev/sdd12.0T  3.4G  2.0T   1% /var/lib/ceph/osd/ceph-2
>>
>> /dev/sde12.0T  3.5G  2.0T   1% /var/lib/ceph/osd/ceph-3
>>
>> [root@OSKVM1 ~]# ceph status
>>
>> cluster 9f923089-a6c0-4169-ace8-ad8cc4cca116
>>
>>  health HEALTH_WARN
>>
>> mon.OSKVM1 low disk space
>>
>>  monmap e1: 1 mons at {OSKVM1=10.24.0.4:6789/0}
>>
>> election epoch 1, quorum 0 OSKVM1
>>
>>  osdmap e40: 4 osds: 4 up, 4 in
>>
>>   pgmap v1154: 576 pgs, 5 pools, 6849 MB data, 860 objects
>>
>> 13857 MB used, 8154 GB / 8168 GB avail
>>
>>  576 active+clean
>>
>> *Can you please help me to know if it is correct configuration as per my
>> setup?*
>>
>> After this setup, i am trying to configure Cinder and Glance to use RBD
>> for a backend.
>> Glance image is already stored in RBD.
>> Following this link http://docs.ceph.com/docs/master/rbd/rbd-openstack/
>>
>> I have managed to install glance image in rbd. But i am finding some
>> issue in cinder configuration. Can you please help me on this?
>> As per link, i need to configure these parameters under [ceph] but i do
>> not have different section for [ceph]. infact i could find all these
>> parameters under [DEFAULT]. Is it ok to configure them under [DEFAULT].
>> CONFIGURING CINDER
>> 
>>
>> OpenStack requires a driver to interact with Ceph block devices. You must
>> also specify the pool name for the block device. On your OpenStack node,
>> 

[ceph-users] radosgw live upgrade hammer -> jewel

2016-07-07 Thread Luis Periquito
Hi all,

I have (some) ceph clusters running hammer and they are serving S3 data.
There are a few radosgw serving requests, in a load balanced form
(actually OSPF anycast IPs).

Usually upgrades go smoothly whereby I upgrade a node at a time, and
traffic just gets redirected around the nodes that are running.

From my tests upgrading to Jewel this is no longer the case, as I have
to stop all the radosgw then run a script to migrate the pools, only
then starting the radosgw processes.

I tested using both Infernalis and Hammer and it seems the behaviour
is the same.

Is there a way to run this upgrade live, without any downtime from the
radosgw service? What would be the best upgrade strategy?

thanks,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] repomd.xml: [Errno 14] HTTP Error 404 - Not Found on download.ceph.com for rhel7

2016-07-07 Thread Martin Palma
Hi All,

it seems that the "rhel7" folder/symlink on
"download.ceph.com/rpm-hammer" does not exist anymore, and therefore
ceph-deploy fails to deploy a new cluster. I just tested it by setting
up a new lab environment.

We currently have the same issue on our production cluster, which
keeps us from updating it. A simple fix would be to change the URL to
"download.ceph.com/rpm-hammer/el7/..." in the repo files, I guess.

Any thoughts on that?

We are running on CentOS 7.2.

Best,
Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Fran Barrera
OK, I understand, so I'll create a new mon to permit me to stop mon.a.

Thanks,
Fran.

2016-07-07 17:46 GMT+02:00 Joao Eduardo Luis :

> On 07/07/2016 04:39 PM, Fran Barrera wrote:
>
>> Yes, this is the problem.
>>
>
> Well, you lose quorum once you stop A.
>
> As the docs clearly state, you cannot tolerate failures if you have just
> two monitors.
>
> If your cluster only has two monitors, you cannot form quorum with just
> one monitor: you need a majority up, running and able to communicate among
> themselves.
>
> Simply put, you need at least (n+1)/2 monitors up for a quorum to be
> formed, 'n' being the total number of monitors in the cluster (i.e., in the
> monmap).
>
> You either need A and B to be running to be able to use the quorum, or you
> need to add another monitor (call it C) so that you can stop A and still
> have the cluster working.
>
>   -Joao
>
>
>> 2016-07-07 17:34 GMT+02:00 Joao Eduardo Luis > >:
>>
>> On 07/07/2016 04:31 PM, Fran Barrera wrote:
>>
>> Hello,
>>
>> Yes I've added two monitors but the error persist. In the error
>> I see
>> only the IP of the first mon, why not appears the second?
>>
>>
>> The description you offered on the initial email appears to state
>> the following:
>>
>> - You initially had one monitor (let's call it A)
>> - You added a second monitor (let's call it B)
>> - Everything works while A and B are running
>> - Nothing works if you stop A
>>
>> Did I understand your problem correctly?
>>
>>-Joao
>>
>>
>> I had only one monitors before and running good because I have
>> installed
>> AIO.
>>
>> Thanks.
>>
>> 2016-07-07 17:22 GMT+02:00 Joao Eduardo Luis > 
>> >>:
>>
>>
>>
>>  On 07/07/2016 04:17 PM, Fran Barrera wrote:
>>
>>  Hi all,
>>
>>  I have a cluster setup AIO with only one monitor and
>> now I've
>>  created
>>  another monitor in other server following this doc
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
>> but
>>  my
>>  problem is if I stop the AIO monitor, the cluster stop
>> working.
>>  It seems
>>  like the ceph is not updated with the new mon or
>> something
>>
>>
>>  In the doc you quoted, one can read:
>>
>>  "Due to the nature of Paxos, Ceph requires a majority of
>> monitors
>>  running to establish a quorum (thus establishing consensus).
>>
>>  [...]
>>
>>  For instance, on a 2 monitor deployment, no failures can be
>>  tolerated in order to maintain a quorum; with 3 monitors, one
>>  failure can be tolerated; [...]"
>>
>>  And in a box beneath, you also see
>>
>>  "Note:  A majority of monitors in your cluster must be able
>> to reach
>>  each other in order to establish a quorum."
>>
>>
>>  So, say you have 2 monitors and you need a majority of them
>> to be
>>  up, running, and able to communicate with each other in
>> order to
>>  form quorum. What's a majority of 2? How many failures can
>> you tolerate?
>>
>> -Joao
>>
>>
>>
>>
>>  ___
>>  ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> > >
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Joao Eduardo Luis

On 07/07/2016 04:39 PM, Fran Barrera wrote:

Yes, this is the problem.


Well, you lose quorum once you stop A.

As the docs clearly state, you cannot tolerate failures if you have just 
two monitors.


If your cluster only has two monitors, you cannot form quorum with just 
one monitor: you need a majority up, running and able to communicate 
among themselves.


Simply put, you need at least (n+1)/2 monitors up for a quorum to be 
formed, 'n' being the total number of monitors in the cluster (i.e., in 
the monmap).


You either need A and B to be running to be able to use the quorum, or 
you need to add another monitor (call it C) so that you can stop A and 
still have the cluster working.
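
(For what it's worth, adding that third monitor is usually quick; with
ceph-deploy, assuming you use it, it is something along the lines of

ceph-deploy mon add <hostname>

or otherwise the manual steps in the add-or-rm-mons document linked earlier.)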


  -Joao



2016-07-07 17:34 GMT+02:00 Joao Eduardo Luis >:

On 07/07/2016 04:31 PM, Fran Barrera wrote:

Hello,

Yes I've added two monitors but the error persist. In the error
I see
only the IP of the first mon, why not appears the second?


The description you offered on the initial email appears to state
the following:

- You initially had one monitor (let's call it A)
- You added a second monitor (let's call it B)
- Everything works while A and B are running
- Nothing works if you stop A

Did I understand your problem correctly?

   -Joao


I had only one monitors before and running good because I have
installed
AIO.

Thanks.

2016-07-07 17:22 GMT+02:00 Joao Eduardo Luis 
>>:


 On 07/07/2016 04:17 PM, Fran Barrera wrote:

 Hi all,

 I have a cluster setup AIO with only one monitor and
now I've
 created
 another monitor in other server following this doc
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but
 my
 problem is if I stop the AIO monitor, the cluster stop
working.
 It seems
 like the ceph is not updated with the new mon or something


 In the doc you quoted, one can read:

 "Due to the nature of Paxos, Ceph requires a majority of
monitors
 running to establish a quorum (thus establishing consensus).

 [...]

 For instance, on a 2 monitor deployment, no failures can be
 tolerated in order to maintain a quorum; with 3 monitors, one
 failure can be tolerated; [...]"

 And in a box beneath, you also see

 "Note:  A majority of monitors in your cluster must be able
to reach
 each other in order to establish a quorum."


 So, say you have 2 monitors and you need a majority of them
to be
 up, running, and able to communicate with each other in
order to
 form quorum. What's a majority of 2? How many failures can
you tolerate?

-Joao




 ___
 ceph-users mailing list
ceph-users@lists.ceph.com 
>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Fran Barrera
Yes, this is the problem.

2016-07-07 17:34 GMT+02:00 Joao Eduardo Luis :

> On 07/07/2016 04:31 PM, Fran Barrera wrote:
>
>> Hello,
>>
>> Yes I've added two monitors but the error persist. In the error I see
>> only the IP of the first mon, why not appears the second?
>>
>
> The description you offered on the initial email appears to state the
> following:
>
> - You initially had one monitor (let's call it A)
> - You added a second monitor (let's call it B)
> - Everything works while A and B are running
> - Nothing works if you stop A
>
> Did I understand your problem correctly?
>
>   -Joao
>
>
>> I had only one monitors before and running good because I have installed
>> AIO.
>>
>> Thanks.
>>
>> 2016-07-07 17:22 GMT+02:00 Joao Eduardo Luis > >:
>>
>>
>> On 07/07/2016 04:17 PM, Fran Barrera wrote:
>>
>> Hi all,
>>
>> I have a cluster setup AIO with only one monitor and now I've
>> created
>> another monitor in other server following this doc
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
>> but
>> my
>> problem is if I stop the AIO monitor, the cluster stop working.
>> It seems
>> like the ceph is not updated with the new mon or something
>>
>>
>> In the doc you quoted, one can read:
>>
>> "Due to the nature of Paxos, Ceph requires a majority of monitors
>> running to establish a quorum (thus establishing consensus).
>>
>> [...]
>>
>> For instance, on a 2 monitor deployment, no failures can be
>> tolerated in order to maintain a quorum; with 3 monitors, one
>> failure can be tolerated; [...]"
>>
>> And in a box beneath, you also see
>>
>> "Note:  A majority of monitors in your cluster must be able to reach
>> each other in order to establish a quorum."
>>
>>
>> So, say you have 2 monitors and you need a majority of them to be
>> up, running, and able to communicate with each other in order to
>> form quorum. What's a majority of 2? How many failures can you
>> tolerate?
>>
>>-Joao
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Fran Barrera
Hello,

Have you configured these two parameters in cinder.conf?

rbd_user
rbd_secret_uuid

Regards.

2016-07-07 15:39 GMT+02:00 Gaurav Goyal :

> Hello Mr. Kees,
>
> Thanks for your response!
>
> My setup is
>
> Openstack Node 1 -> controller + network + compute1 (Liberty Version)
> Openstack node 2 --> Compute2
>
> Ceph version Hammer
>
> I am using dell storage with following status
>
> DELL SAN storage is attached to both hosts as
>
> [root@OSKVM1 ~]# iscsiadm -m node
>
> 10.35.0.3:3260,1
> iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1
>
> 10.35.0.8:3260,1
> iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1
>
> 10.35.0.*:3260,-1
> iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2
>
> 10.35.0.8:3260,1
> iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2
>
> 10.35.0.*:3260,-1
> iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3
>
> 10.35.0.8:3260,1
> iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3
>
> 10.35.0.*:3260,-1
> iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4
> 10.35.0.8:3260,1
> iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4
>
>
> Since in my setup same LUNs are MAPPED to both hosts
>
> i choose 2 LUNS on Openstack Node 1 and 2 on Openstack Node 2
>
>
> *Node1 has *
>
> /dev/sdc12.0T  3.1G  2.0T   1% /var/lib/ceph/osd/ceph-0
>
> /dev/sdd12.0T  3.8G  2.0T   1% /var/lib/ceph/osd/ceph-1
>
> *Node 2 has *
>
> /dev/sdd12.0T  3.4G  2.0T   1% /var/lib/ceph/osd/ceph-2
>
> /dev/sde12.0T  3.5G  2.0T   1% /var/lib/ceph/osd/ceph-3
>
> [root@OSKVM1 ~]# ceph status
>
> cluster 9f923089-a6c0-4169-ace8-ad8cc4cca116
>
>  health HEALTH_WARN
>
> mon.OSKVM1 low disk space
>
>  monmap e1: 1 mons at {OSKVM1=10.24.0.4:6789/0}
>
> election epoch 1, quorum 0 OSKVM1
>
>  osdmap e40: 4 osds: 4 up, 4 in
>
>   pgmap v1154: 576 pgs, 5 pools, 6849 MB data, 860 objects
>
> 13857 MB used, 8154 GB / 8168 GB avail
>
>  576 active+clean
>
> *Can you please help me to know if it is correct configuration as per my
> setup?*
>
> After this setup, i am trying to configure Cinder and Glance to use RBD
> for a backend.
> Glance image is already stored in RBD.
> Following this link http://docs.ceph.com/docs/master/rbd/rbd-openstack/
>
> I have managed to install glance image in rbd. But i am finding some issue
> in cinder configuration. Can you please help me on this?
> As per link, i need to configure these parameters under [ceph] but i do
> not have different section for [ceph]. infact i could find all these
> parameters under [DEFAULT]. Is it ok to configure them under [DEFAULT].
> CONFIGURING CINDER
> 
>
> OpenStack requires a driver to interact with Ceph block devices. You must
> also specify the pool name for the block device. On your OpenStack node,
> edit/etc/cinder/cinder.conf by adding:
>
> [DEFAULT]
> ...
> enabled_backends = ceph
> ...
> [ceph]
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> rbd_pool = volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> glance_api_version = 2
>
> I find following error in cinder service status
>
> systemctl status openstack-cinder-volume.service
>
> Jul 07 09:37:01 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:01.058
> 136259 ERROR cinder.service [-] Manager for service cinder-volume
> OSKVM1@ceph is reporting problems, not sending heartbeat. Service will
> appear "down".
>
> Jul 07 09:37:02 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:02.040
> 136259 WARNING cinder.volume.manager
> [req-561ddd3c-9560-4374-a958-7a2c103af7ee - - - - -] Update driver status
> failed: (config name ceph) is uninitialized.
>
> Jul 07 09:37:11 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:11.059
> 136259 ERROR cinder.service [-] Manager for service cinder-volume
> OSKVM1@ceph is reporting problems, not sending heartbeat. Service will
> appear "down".
>
>
>
> [root@OSKVM2 ~]# rbd -p images ls
>
> a8b45c8a-a5c8-49d8-a529-1e4088bdbf3f
>
> [root@OSKVM2 ~]# rados df
>
> pool name KB  objects   clones degraded
> unfound   rdrd KB   wrwr KB
>
> backups0000
> 00000
>
> images   7013377  86000
> 0 9486 7758 2580  7013377
>
> rbd0000
> 00000
>
> vms0000
> 0000

Re: [ceph-users] Monitor question

2016-07-07 Thread Joao Eduardo Luis

On 07/07/2016 04:31 PM, Fran Barrera wrote:

Hello,

Yes, I've added two monitors but the error persists. In the error I see
only the IP of the first mon; why does the second not appear?


The description you offered on the initial email appears to state the 
following:


- You initially had one monitor (let's call it A)
- You added a second monitor (let's call it B)
- Everything works while A and B are running
- Nothing works if you stop A

Did I understand your problem correctly?

  -Joao



I had only one monitor before and it was running fine because I have installed
AIO.

Thanks.

2016-07-07 17:22 GMT+02:00 Joao Eduardo Luis >:

On 07/07/2016 04:17 PM, Fran Barrera wrote:

Hi all,

I have a cluster setup AIO with only one monitor and now I've
created
another monitor in other server following this doc
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but
my
problem is if I stop the AIO monitor, the cluster stop working.
It seems
like the ceph is not updated with the new mon or something


In the doc you quoted, one can read:

"Due to the nature of Paxos, Ceph requires a majority of monitors
running to establish a quorum (thus establishing consensus).

[...]

For instance, on a 2 monitor deployment, no failures can be
tolerated in order to maintain a quorum; with 3 monitors, one
failure can be tolerated; [...]"

And in a box beneath, you also see

"Note:  A majority of monitors in your cluster must be able to reach
each other in order to establish a quorum."


So, say you have 2 monitors and you need a majority of them to be
up, running, and able to communicate with each other in order to
form quorum. What's a majority of 2? How many failures can you tolerate?

   -Joao




___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Fran Barrera
Hello,

Yes, I've added two monitors but the error persists. In the error I see only
the IP of the first mon; why does the second not appear?

I had only one monitor before and it was running fine because I have installed
AIO.

Thanks.

2016-07-07 17:22 GMT+02:00 Joao Eduardo Luis :

> On 07/07/2016 04:17 PM, Fran Barrera wrote:
>
>> Hi all,
>>
>> I have a cluster setup AIO with only one monitor and now I've created
>> another monitor in other server following this doc
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but my
>> problem is if I stop the AIO monitor, the cluster stop working. It seems
>> like the ceph is not updated with the new mon or something
>>
>
> In the doc you quoted, one can read:
>
> "Due to the nature of Paxos, Ceph requires a majority of monitors running
> to establish a quorum (thus establishing consensus).
>
> [...]
>
> For instance, on a 2 monitor deployment, no failures can be tolerated in
> order to maintain a quorum; with 3 monitors, one failure can be tolerated;
> [...]"
>
> And in a box beneath, you also see
>
> "Note:  A majority of monitors in your cluster must be able to reach each
> other in order to establish a quorum."
>
>
> So, say you have 2 monitors and you need a majority of them to be up,
> running, and able to communicate with each other in order to form quorum.
> What's a majority of 2? How many failures can you tolerate?
>
>   -Joao
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Joao Eduardo Luis

On 07/07/2016 04:17 PM, Fran Barrera wrote:

Hi all,

I have an AIO cluster setup with only one monitor and now I've created
another monitor on another server following this doc
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but my
problem is that if I stop the AIO monitor, the cluster stops working. It seems
like ceph is not updated with the new mon or something.


In the doc you quoted, one can read:

"Due to the nature of Paxos, Ceph requires a majority of monitors 
running to establish a quorum (thus establishing consensus).


[...]

For instance, on a 2 monitor deployment, no failures can be tolerated in 
order to maintain a quorum; with 3 monitors, one failure can be 
tolerated; [...]"


And in a box beneath, you also see

"Note:  A majority of monitors in your cluster must be able to reach 
each other in order to establish a quorum."



So, say you have 2 monitors and you need a majority of them to be up, 
running, and able to communicate with each other in order to form 
quorum. What's a majority of 2? How many failures can you tolerate?


  -Joao



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor question

2016-07-07 Thread Matyas Koszik

Hi,

That error message is normal, it just says your monitor is down (which it
is). If you have added the second monitor in your ceph.conf, then it'll
try contacting that, and if it's up and reachable, this will succeed, so
after that scary error message you should see the normal reply as well.

The important thing is to keep a consistent ceph.conf across the cluster.
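
For example (reusing the mon names and addresses from your ceph -s output purely
as an illustration), every node's ceph.conf would carry something like:

[global]
mon_initial_members = ceph-monitor, ceph-monitor-2
mon_host = 192.168.1.11,192.168.1.10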

Matyas

On Thu, 7 Jul 2016, Fran Barrera wrote:

> Hi all,
>
> I have a cluster setup AIO with only one monitor and now I've created
> another monitor in other server following this doc
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but my
> problem is if I stop the AIO monitor, the cluster stop working. It seems
> like the ceph is not updated with the new mon or something
>
> Here I can see two monitors:
> $ ceph -s
> cluster 0817ef6e-233d-41cc-801c-cfb90ed9597a
>  health HEALTH_OK
>  monmap e2: 2 mons at {ceph-monitor-2=
> 192.168.1.10:6789/0,ceph-monitor=192.168.1.11:6789/0}
> election epoch 40, quorum 0,1 ceph-monitor-2,ceph-monitor
>  osdmap e400: 4 osds: 4 up, 4 in
> flags sortbitwise
>   pgmap v832123: 684 pgs, 7 pools, 291 GB data, 38993 objects
> 303 GB used, 3420 GB / 3724 GB avail
>  684 active+clean
>
> But If I stop the ceph-monitor I can see this error:
>
> 2016-07-07 17:11:52.287879 7fc04c1fa700  0 -- 192.168.1.10:0/3737104056 >>
> pipe(0x7fc03c000cc0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fc03c002000).fault
>
> Any help?
>
> Thanks,
> Fran.
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitor question

2016-07-07 Thread Fran Barrera
Hi all,

I have an AIO cluster setup with only one monitor and now I've created
another monitor on another server following this doc
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ but my
problem is that if I stop the AIO monitor, the cluster stops working. It seems
like ceph is not updated with the new mon or something.

Here I can see two monitors:
$ ceph -s
cluster 0817ef6e-233d-41cc-801c-cfb90ed9597a
 health HEALTH_OK
 monmap e2: 2 mons at {ceph-monitor-2=
192.168.1.10:6789/0,ceph-monitor=192.168.1.11:6789/0}
election epoch 40, quorum 0,1 ceph-monitor-2,ceph-monitor
 osdmap e400: 4 osds: 4 up, 4 in
flags sortbitwise
  pgmap v832123: 684 pgs, 7 pools, 291 GB data, 38993 objects
303 GB used, 3420 GB / 3724 GB avail
 684 active+clean

But If I stop the ceph-monitor I can see this error:

2016-07-07 17:11:52.287879 7fc04c1fa700  0 -- 192.168.1.10:0/3737104056 >>
pipe(0x7fc03c000cc0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fc03c002000).fault

Any help?

Thanks,
Fran.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is it time already to move from hammer to jewel?

2016-07-07 Thread Alexandre DERUMIER
For a new cluster, it is still missing this udev rule:

https://github.com/ceph/ceph/commit/35004a628b2969d8b2f1c02155bb235165a1d809

but it's not a problem on an existing cluster, as the old udev rules still exist I
think.

Anyway, you can copy it manually.
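
For example, something along these lines should do it (assuming the rule file
added by that commit is 60-ceph-by-parttypeuuid.rules - check the commit if the
name differs):

wget -O /etc/udev/rules.d/60-ceph-by-parttypeuuid.rules \
  https://raw.githubusercontent.com/ceph/ceph/master/udev/60-ceph-by-parttypeuuid.rules
udevadm control --reload && udevadm trigger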


I've been running jewel without any problem for 3 weeks now, on 2 different clusters.




- Original Message -
From: "Shain Miley" 
To: "Zoltan Arnold Nagy" , "ceph-users" 

Sent: Thursday, 7 July 2016 16:32:51
Subject: Re: [ceph-users] is it time already to move from hammer to jewel?

+1 on looking for some thoughts on this. 

We are in the same boat and looking for some guidance as well. 

Thanks, 

Shain 

On 07/06/2016 01:47 PM, Zoltan Arnold Nagy wrote: 
> Hey, 
> 
> Those out there who are running production clusters: have you upgraded 
> already to Jewel? 
> I usually wait until .2 is out (which it is now for Jewel) but just looking 
> for largish deployment experiences in the field before I pull the trigger 
> over the weekend. It’s a largish upgrade going from ~460TB to 1.2PB and at 
> this point I’m just overly cautious. 
> 
> One of the main reasons I want to upgrade to is to get the new IO queuing 
> code so the recovery ops won’t hit the VM workloads (that much). (Yes, we 
> have it configured properly, but even one backfill is visible in our client 
> IO…) 
> Other than that I’m happy with hammer. 
> 
> Cheers, 
> Zoltan 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

-- 
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smi...@npr.org | 
202.513.3649 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to check consistency of File / Block Data

2016-07-07 Thread Venkata Manojawa Paritala
Hi,

Is there any way we can check/verify data consistency for block and file
data in Ceph? I need to develop a script for the same.

WRT object data, I am checking the consistency with the below method.

1. Create a file and calculate md5 checksum for it.
2. Push the file to a ceph pool.
3. Get the location of the object using the command "ceph osd map <pool>
<object>". This will give the pg ids in which the object is placed.
4. Go to the respective osd -> pg locations and calculate the md5 checksums
of the objects.
5. Compare the checksums generated in steps #1 & #4.
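
A rough sketch of these steps as a shell script (the pool name, object name and
on-disk paths are only placeholders, and it assumes filestore OSDs where each
object is stored as a plain file under the PG directory):

#!/bin/bash
POOL=mypool
OBJ=myfile
# 1. checksum of the source file
md5sum /tmp/${OBJ}
# 2. push the file into the pool
rados -p ${POOL} put ${OBJ} /tmp/${OBJ}
# 3. find the PG and the OSDs holding the object
ceph osd map ${POOL} ${OBJ}
# 4. on each OSD host listed above, checksum the on-disk copy
#    (<pgid> is the placement group id printed by step 3)
md5sum /var/lib/ceph/osd/ceph-*/current/<pgid>_head/${OBJ}__head_*
# 5. compare the sums from steps 1 and 4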

Can you also let me know if there is any better method to verify the
consistency of object data, instead of checksums?

Thanks & Regards,
Manoj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is it time already to move from hammer to jewel?

2016-07-07 Thread Shain Miley

+1 on looking for some thoughts on this.

We are in the same boat and looking for some guidance as well.

Thanks,

Shain

On 07/06/2016 01:47 PM, Zoltan Arnold Nagy wrote:

Hey,

Those out there who are running production clusters: have you upgraded already 
to Jewel?
I usually wait until .2 is out (which it is now for Jewel) but just looking for 
largish deployment experiences in the field before I pull the trigger over the 
weekend. It’s a largish upgrade going from ~460TB to 1.2PB and at this point 
I’m just overly cautious.

One of the main reasons I want to upgrade to is to get the new IO queuing code 
so the recovery ops won’t hit the VM workloads (that much). (Yes, we have it 
configured properly, but even one backfill is visible in our client IO…)
Other than that I’m happy with hammer.

Cheers,
Zoltan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smi...@npr.org | 
202.513.3649

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Patrick Donnelly
On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
 wrote:
> Unfortunately, the other user application breaks ceph-fuse again (It is a
> completely different application then in my previous test).
>
> We have tested it in 4 machines with 4 cores. The user is submitting 16
> single core jobs which are all writing different output files (one per job)
> to a common dir in cephfs. The first 4 jobs run happily and never break
> ceph-fuse. But the remaining 12 jobs, running in the remaining 3 machines,
> trigger a segmentation fault, which is completely different from the other
> case.
>
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> 1: (()+0x297fe2) [0x7f54402b7fe2]
> 2: (()+0xf7e0) [0x7f543ecf77e0]
> 3: (ObjectCacher::bh_write_scattered(std::list std::allocator >&)+0x36) [0x7f5440268086]
> 4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*,
> std::chrono::time_point std::chrono::duration > >, long*,
> int*)+0x22c) [0x7f5440268a3c]
> 5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
> 6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
> 7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
> 8: (()+0x7aa1) [0x7f543ecefaa1]
>  9: (clone()+0x6d) [0x7f543df6893d]
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.

This one looks like a very different problem. I've created an issue
here: http://tracker.ceph.com/issues/16610

Thanks for the report and debug log!

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2016-07-07 Thread Gaurav Goyal
Hello Mr. Kees,

Thanks for your response!

My setup is

Openstack Node 1 -> controller + network + compute1 (Liberty Version)
Openstack node 2 --> Compute2

Ceph version Hammer

I am using dell storage with following status

DELL SAN storage is attached to both hosts as

[root@OSKVM1 ~]# iscsiadm -m node

10.35.0.3:3260,1
iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1

10.35.0.8:3260,1
iqn.2001-05.com.equallogic:0-1cb196-07a83c107-4770018575af-vol1

10.35.0.*:3260,-1
iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2

10.35.0.8:3260,1
iqn.2001-05.com.equallogic:0-1cb196-20d83c107-729002157606-vol2

10.35.0.*:3260,-1
iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3

10.35.0.8:3260,1
iqn.2001-05.com.equallogic:0-1cb196-f0783c107-70a00245761a-vol3

10.35.0.*:3260,-1
iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4
10.35.0.8:3260,1
iqn.2001-05.com.equallogic:0-1cb196-fda83c107-92700275761a-vol4


Since in my setup the same LUNs are mapped to both hosts,

I chose 2 LUNs on Openstack Node 1 and 2 on Openstack Node 2.


*Node1 has *

/dev/sdc12.0T  3.1G  2.0T   1% /var/lib/ceph/osd/ceph-0

/dev/sdd12.0T  3.8G  2.0T   1% /var/lib/ceph/osd/ceph-1

*Node 2 has *

/dev/sdd12.0T  3.4G  2.0T   1% /var/lib/ceph/osd/ceph-2

/dev/sde12.0T  3.5G  2.0T   1% /var/lib/ceph/osd/ceph-3

[root@OSKVM1 ~]# ceph status

cluster 9f923089-a6c0-4169-ace8-ad8cc4cca116

 health HEALTH_WARN

mon.OSKVM1 low disk space

 monmap e1: 1 mons at {OSKVM1=10.24.0.4:6789/0}

election epoch 1, quorum 0 OSKVM1

 osdmap e40: 4 osds: 4 up, 4 in

  pgmap v1154: 576 pgs, 5 pools, 6849 MB data, 860 objects

13857 MB used, 8154 GB / 8168 GB avail

 576 active+clean

*Can you please confirm whether this is the correct configuration for my
setup?*

After this setup, I am trying to configure Cinder and Glance to use RBD as
a backend.
Glance image is already stored in RBD.
Following this link http://docs.ceph.com/docs/master/rbd/rbd-openstack/

I have managed to store the glance image in rbd, but I am finding some issues
in the cinder configuration. Can you please help me with this?
As per the link, I need to configure these parameters under [ceph], but I do not
have a separate [ceph] section; in fact, I could find all these parameters
under [DEFAULT]. Is it OK to configure them under [DEFAULT]?
CONFIGURING CINDER


OpenStack requires a driver to interact with Ceph block devices. You must
also specify the pool name for the block device. On your OpenStack node,
edit/etc/cinder/cinder.conf by adding:

[DEFAULT]
...
enabled_backends = ceph
...
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2

I find the following error in the cinder service status:

systemctl status openstack-cinder-volume.service

Jul 07 09:37:01 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:01.058
136259 ERROR cinder.service [-] Manager for service cinder-volume
OSKVM1@ceph is reporting problems, not sending heartbeat. Service will
appear "down".

Jul 07 09:37:02 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:02.040
136259 WARNING cinder.volume.manager
[req-561ddd3c-9560-4374-a958-7a2c103af7ee - - - - -] Update driver status
failed: (config name ceph) is uninitialized.

Jul 07 09:37:11 OSKVM1 cinder-volume[136247]: 2016-07-07 09:37:11.059
136259 ERROR cinder.service [-] Manager for service cinder-volume
OSKVM1@ceph is reporting problems, not sending heartbeat. Service will
appear "down".



[root@OSKVM2 ~]# rbd -p images ls

a8b45c8a-a5c8-49d8-a529-1e4088bdbf3f

[root@OSKVM2 ~]# rados df

pool name KB  objects   clones degraded
unfound   rdrd KB   wrwr KB

backups0000
  00000

images   7013377  86000
  0 9486 7758 2580  7013377

rbd0000
  00000

vms0000
  00000

volumes0000
  00000

  total used14190236  860

  total avail 8550637828

  total space 8564828064




[root@OSKVM2 ~]# ceph auth list

installed auth entries:


mds.OSKVM1

key: AQCK6XtXNBFdDBAAXmX73gBqK3lyakSxxP+XjA==

caps: [mds] allow

caps: [mon] allow profile mds

caps: [osd] allow rwx


Re: [ceph-users] layer3 network

2016-07-07 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Matyas Koszik
> Sent: 07 July 2016 14:01
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] layer3 network
> 
> 
> Setting 'osd addr' in the osd configuration section unfortunately also does 
> not influence source address selection, the outgoing
> interface IP is used like before.

How about using the "ip route add" command with the src parameter to set the 
source address?

No idea if there is something clever you can do with OSPF to set this, or if 
you need to manually set this on your default route.
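
For illustration (addresses made up), something along the lines of:

ip route replace default via 192.0.2.1 src 10.255.255.1

If the default route is installed by the routing daemon rather than by hand, I
believe some daemons (e.g. quagga's zebra with a "set src" route-map) can do the
equivalent, but I haven't verified that here.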

> 
> 
> 
> On Thu, 7 Jul 2016, Luis Periquito wrote:
> 
> > If, like me, you have several different networks, or they overlap for
> > whatever reason, I just have the options:
> >
> > mon addr = IP:port
> > osd addr = IP
> >
> > in the relevant sections. However I use puppet to deploy ceph, and all
> > files are "manually" created.
> >
> > So it becomes something like this:
> >
> > [mon.mon1]
> >   host = mon1
> >   mon addr = x.y.z.a:6789
> >   mon data = /var/lib/ceph/mon/internal-mon1 [osd.0]
> >   host = dskh1
> >   osd addr = x.y.z.a
> >   osd data = /var/lib/ceph/osd/osd-0
> >   osd journal = /var/lib/ceph/osd/journal/osd-0
> >   keyring = /var/lib/ceph/osd/osd-0/keyring
> >   osd max backfills = 1
> >   osd recovery max active = 1
> >   osd recovery op priority = 1
> >   osd client op priority = 63
> >   osd disk thread ioprio class = idle
> >   osd disk thread ioprio priority = 7
> > [osd.1]
> > 
> >
> > necessarily the host and addr parts are correct in our environment.
> >
> > On 7 July 2016 at 11:36, Nick Fisk  wrote:
> >
> > > > -Original Message-
> > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > > > Behalf
> > > Of Matyas Koszik
> > > > Sent: 07 July 2016 11:26
> > > > To: ceph-users@lists.ceph.com
> > > > Subject: [ceph-users] layer3 network
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > > My setup uses a layer3 network, where each node has two
> > > > connections
> > > (/31s), equipped with a loopback address and redundancy is
> > > > provided via OSPF. In this setup it is important to use the
> > > > loopback
> > > address as source for outgoing connections, since the interface
> > > > addresses are not protected from failure, but the loopback address is.
> > > >
> > > > So I set the public addr and the cluster addr to the desired ip,
> > > > but it
> > > seems that the outgoing connections do not use this as the source
> > > > address.
> > > > I'm using jewel; is this the expected behavior?
> > >
> > > Do your public/cluster networks overlap the physical connection
> > > IP's? From what I understand Ceph binds to the interface whose IP
> > > lies within the range specified in the conf file.
> > >
> > > So for example if public addr = 192.168.1.0/24
> > >
> > > Then your loopback should be in that range, but you must make sure
> > > the physical nics lie outside this range.
> > >
> > > I'm following this with interest as I am about to deploy something
> > > very similar.
> > >
> > > >
> > > > Matyas
> > > >
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> >
> >
> > --
> >
> > Luis Periquito
> >
> > Unix Team Lead
> >
> > 
> >
> > Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
> > Hatfield, Herts AL10 9NE
> >
> > --
> >
> >
> > Notice:  This email is confidential and may contain copyright material
> > of members of the Ocado Group. Opinions and views expressed in this
> > message may not necessarily reflect the opinions and views of the
> > members of the Ocado Group.
> >
> >
> >
> > If you are not the intended recipient, please notify us immediately
> > and delete all copies of this message. Please note that it is your
> > responsibility to scan this message for viruses.
> >
> >
> >
> > Fetch and Sizzle are trading names of Speciality Stores Limited, a
> > member of the Ocado Group.
> >
> >
> >
> > References to the “Ocado Group” are to Ocado Group plc (registered
> > in England and Wales with number 7098618) and its subsidiary
> > undertakings (as that expression is defined in the Companies Act 2006) from 
> > time to time.
> > The registered office of Ocado Group plc is Titan Court, 3 Bishops
> > Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.
> >
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] layer3 network

2016-07-07 Thread Matyas Koszik

Setting 'osd addr' in the osd configuration section unfortunately also
does not influence source address selection; the outgoing interface
IP is used as before.



On Thu, 7 Jul 2016, Luis Periquito wrote:

> If, like me, you have several different networks, or they overlap for
> whatever reason, I just have the options:
>
> mon addr = IP:port
> osd addr = IP
>
> in the relevant sections. However I use puppet to deploy ceph, and all
> files are "manually" created.
>
> So it becomes something like this:
>
> [mon.mon1]
>   host = mon1
>   mon addr = x.y.z.a:6789
>   mon data = /var/lib/ceph/mon/internal-mon1
> [osd.0]
>   host = dskh1
>   osd addr = x.y.z.a
>   osd data = /var/lib/ceph/osd/osd-0
>   osd journal = /var/lib/ceph/osd/journal/osd-0
>   keyring = /var/lib/ceph/osd/osd-0/keyring
>   osd max backfills = 1
>   osd recovery max active = 1
>   osd recovery op priority = 1
>   osd client op priority = 63
>   osd disk thread ioprio class = idle
>   osd disk thread ioprio priority = 7
> [osd.1]
> 
>
> necessarily the host and addr parts are correct in our environment.
>
> On 7 July 2016 at 11:36, Nick Fisk  wrote:
>
> > > -Original Message-
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Matyas Koszik
> > > Sent: 07 July 2016 11:26
> > > To: ceph-users@lists.ceph.com
> > > Subject: [ceph-users] layer3 network
> > >
> > >
> > >
> > > Hi,
> > >
> > > My setup uses a layer3 network, where each node has two connections
> > (/31s), equipped with a loopback address and redundancy is
> > > provided via OSPF. In this setup it is important to use the loopback
> > address as source for outgoing connections, since the
> > interface
> > > addresses are not protected from failure, but the loopback address is.
> > >
> > > So I set the public addr and the cluster addr to the desired ip, but it
> > seems that the outgoing connections do not use this as the
> > source
> > > address.
> > > I'm using jewel; is this the expected behavior?
> >
> > Do your public/cluster networks overlap the physical connection IP's? From
> > what I understand Ceph binds to the interface whose IP
> > lies within the range specified in the conf file.
> >
> > So for example if public addr = 192.168.1.0/24
> >
> > Then your loopback should be in that range, but you must make sure the
> > physical nics lie outside this range.
> >
> > I'm following this with interest as I am about to deploy something very
> > similar.
> >
> > >
> > > Matyas
> > >
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
>
> Luis Periquito
>
> Unix Team Lead
>
> 
>
> Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
> Hatfield, Herts AL10 9NE
>
> --
>
>
> Notice:  This email is confidential and may contain copyright material of
> members of the Ocado Group. Opinions and views expressed in this message
> may not necessarily reflect the opinions and views of the members of the
> Ocado Group.
>
>
>
> If you are not the intended recipient, please notify us immediately and
> delete all copies of this message. Please note that it is your
> responsibility to scan this message for viruses.
>
>
>
> Fetch and Sizzle are trading names of Speciality Stores Limited, a member
> of the Ocado Group.
>
>
>
> References to the “Ocado Group” are to Ocado Group plc (registered in
> England and Wales with number 7098618) and its subsidiary undertakings (as
> that expression is defined in the Companies Act 2006) from time to time.
> The registered office of Ocado Group plc is Titan Court, 3 Bishops Square,
> Hatfield Business Park, Hatfield, Herts. AL10 9NE.
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD - Deletion / Discard - IO Impact

2016-07-07 Thread Nick Fisk
> -Original Message-
> From: Anand Bhat [mailto:anand.b...@gmail.com]
> Sent: 07 July 2016 13:46
> To: n...@fisk.me.uk
> Cc: ceph-users 
> Subject: Re: [ceph-users] RBD - Deletion / Discard - IO Impact
> 
> These are known problem.
> 
> Are you doing mkfs.xfs on SSD? If so, please check SSD data sheets whether 
> UNMAP is supported. To avoid unmap during mkfs, use
> mkfs.xfs -K


Thanks for your reply

The RBD's are on normal spinners (+SSD Journals)

> 
> Regards,
> Anand
> 
> On Thu, Jul 7, 2016 at 5:23 PM, Nick Fisk  wrote:
> Hi All,
> 
> Does anybody else see a massive (ie 10x) performance impact when either 
> deleting a RBD or running something like mkfs.xfs against
> an existing RBD, which would zero/discard all blocks?
> 
> In the case of deleting a 4TB RBD, I’m seeing latency in some cases rise up 
> to 10s.
> 
> It looks like it the XFS deletions on the OSD which are potentially 
> responsible for the massive drop in performance as I see random
> OSD’s in turn peak to 100% utilisation.
> 
> I’m not aware of any throttling than can be done to reduce this impact, but 
> would be interested to here from anyone else that may
> experience this.
> 
> Nick
> 
> 
> 
> ___
> ceph-users mailing list
> mailto:ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> --
> 
> Never say never.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread Nick Fisk
Hi Christian,

> -Original Message-
> From: Christian Balzer [mailto:ch...@gol.com]
> Sent: 07 July 2016 12:57
> To: ceph-users@lists.ceph.com
> Cc: Nick Fisk 
> Subject: Re: [ceph-users] multiple journals on SSD
> 
> 
> Hello Nick,
> 
> On Thu, 7 Jul 2016 09:45:58 +0100 Nick Fisk wrote:
> 
> > Just to add if you really want to go with lots of HDD's to Journals
> > then go NVME. They are not a lot more expensive than the equivalent
> > SATA based 3700's, but the latency is low low low. Here is an example
> > of a node I have just commissioned with 12 HDD's to one P3700
> >
> > Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> > avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sdb   0.00 0.00   68.000.00  8210.00 0.00
> > 241.47 0.263.853.850.00   2.09  14.20
> > sdd   2.50 0.00  198.50   22.00 24938.00  9422.00
> > 311.66 4.34   27.806.21  222.64   2.45  54.00
> > sdc   0.00 0.00   63.000.00  7760.00 0.00
> > 246.35 0.152.162.160.00   1.56   9.80
> > sda   0.00 0.00   61.50   47.00  7600.00 22424.00
> > 553.44 2.77   25.572.63   55.57   3.82  41.40
> > nvme0n1   0.0022.502.00 2605.00 8.00 139638.00
> > 107.13 0.140.050.000.05   0.03   6.60
> > sdg   0.00 0.00   61.00   28.00  6230.00 12696.00
> > 425.30 3.66   74.795.84  225.00   3.87  34.40
> > sdf   0.00 0.00   34.50   47.00  4108.00 21702.00
> > 633.37 3.56   43.751.51   74.77   2.85  23.20
> > sdh   0.00 0.00   75.00   15.50  9180.00  4984.00
> > 313.02 0.45   12.553.28   57.42   3.51  31.80
> > sdi   1.50 0.50  142.00   48.50 18102.00 21924.00
> > 420.22 3.60   18.924.99   59.71   2.70  51.40
> > sdj   0.50 0.00   74.505.00  9362.00  1832.00
> > 281.61 0.334.103.33   15.60   2.44  19.40
> > sdk   0.00 0.00   54.000.00  6420.00 0.00
> > 237.78 0.122.302.300.00   1.70   9.20
> > sdl   0.00 0.00   21.001.50  2286.0016.00
> > 204.62 0.32   18.13   13.81   78.67   6.67  15.00
> > sde   0.00 0.00   98.000.00 12304.00 0.00
> > 251.10 0.303.103.100.00   2.08  20.40
> >
> Is that a live sample from iostat or the initial/one-shot summary?

First of all, apologies for the formatting; that looked really ugly above, fixed
now. Iostat had been running for a while and I just copied one of the sections, so
yes, it's a live sample.

> 
> > 50us latency at 2605 iops!!!
> >
> At less than 5% IOPS or 14% bandwidth capacity running more than twice as 
> slow than the spec sheet says. ^o^ Fast, very much so.
> But not mindnumbingly so.
> 
> The real question here is, how much of that latency improvement do you see in 
> the Ceph clients, VMs?
> 
> I'd venture not so much, given that most latency happens in Ceph.

Admittedly not much, but it's very hard to tell as it's only 1/5th of the
cluster. Looking at graphs in graphite, I can see the filestore journal latency 
is massively lower. The subop latency is somewhere between a 1/2 to 3/4 of the 
older nodes. At higher queue depths the NVME device is always showing at least 
1ms lower latency, so it must be having a positive effect.

My new cluster which should be going live in a couple of weeks, will be 
comprised of just these node types so I will have a better idea then. Also they 
will have 4x3.9Ghz CPU's which go a long way to reducing latency as well. I'm 
aiming for ~1ms at the client for a 4kb write.

> 
> That all said, I'd go for a similar setup as well, if I had a dozen storage 
> nodes or more.
> But at my current cluster sizes that's too many eggs in one basket for me.

Yeah, I'm only at 5 nodes, but decided that having a cold spare on hand 
justified the risk for the intended use (backups)

> My "largest" cluster is now up to node 5, going from 4 journal SSDs for 8 
> HDDs to 2 journal SSDs for 12 HDDs. Woo-Woo!
> 
> > Compared to one of the other nodes with 2 100GB S3700's, 6 disks each
> >
> Well, that's not really fair, is it?
> 
> Those SSDs have a 5 times lower bandwidth, triple the write latency and the 
> SATA bus instead of the PCIe zipway when compared to
> the smallest P 3700.
> 
> And 6 disk are a bit much for that SSD, 4 would be pushing it.
> Whereas 12 HDDs for the P model are a good match, overkill really.

No, good point, but it demonstrates the change in my mindset from 2 years ago 
that I think most newbies to Ceph also go through. Back then I was like "SSD, 
wow fast, they will never be a problem", then I started to understand the effects
of latency serialisation. The S3700's have never been above 50% utilisation as 
my workload is lots of very small IO's, but I quite regularly see their latency 
above 1ms. I guess my point was it's not just a case of trying to make sure MB/s
= # Disks, there are other important factors. 


Re: [ceph-users] RBD - Deletion / Discard - IO Impact

2016-07-07 Thread Anand Bhat
These are known problems.

Are you doing mkfs.xfs on an SSD? If so, please check the SSD data sheet to see whether
UNMAP is supported. To avoid unmap during mkfs, use mkfs.xfs -K.
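
For example (the device path is only an illustration), -K tells mkfs.xfs not to
discard blocks at mkfs time:

mkfs.xfs -K /dev/rbd0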

Regards,
Anand

On Thu, Jul 7, 2016 at 5:23 PM, Nick Fisk  wrote:

> Hi All,
>
>
>
> Does anybody else see a massive (ie 10x) performance impact when either
> deleting a RBD or running something like mkfs.xfs against an existing RBD,
> which would zero/discard all blocks?
>
>
>
> In the case of deleting a 4TB RBD, I’m seeing latency in some cases rise
> up to 10s.
>
>
>
> It looks like it the XFS deletions on the OSD which are potentially
> responsible for the massive drop in performance as I see random OSD’s in
> turn peak to 100% utilisation.
>
>
>
> I’m not aware of any throttling than can be done to reduce this impact,
> but would be interested to here from anyone else that may experience this.
>
>
>
> Nick
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Never say never.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] layer3 network

2016-07-07 Thread Luis Periquito
If, like me, you have several different networks, or they overlap for
whatever reason, I just have the options:

mon addr = IP:port
osd addr = IP

in the relevant sections. However I use puppet to deploy ceph, and all
files are "manually" created.

So it becomes something like this:

[mon.mon1]
  host = mon1
  mon addr = x.y.z.a:6789
  mon data = /var/lib/ceph/mon/internal-mon1
[osd.0]
  host = dskh1
  osd addr = x.y.z.a
  osd data = /var/lib/ceph/osd/osd-0
  osd journal = /var/lib/ceph/osd/journal/osd-0
  keyring = /var/lib/ceph/osd/osd-0/keyring
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1
  osd client op priority = 63
  osd disk thread ioprio class = idle
  osd disk thread ioprio priority = 7
[osd.1]


necessarily the host and addr parts are correct in our environment.

On 7 July 2016 at 11:36, Nick Fisk  wrote:

> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Matyas Koszik
> > Sent: 07 July 2016 11:26
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] layer3 network
> >
> >
> >
> > Hi,
> >
> > My setup uses a layer3 network, where each node has two connections
> (/31s), equipped with a loopback address and redundancy is
> > provided via OSPF. In this setup it is important to use the loopback
> address as source for outgoing connections, since the
> interface
> > addresses are not protected from failure, but the loopback address is.
> >
> > So I set the public addr and the cluster addr to the desired ip, but it
> seems that the outgoing connections do not use this as the
> source
> > address.
> > I'm using jewel; is this the expected behavior?
>
> Do your public/cluster networks overlap the physical connection IP's? From
> what I understand Ceph binds to the interface whose IP
> lies within the range specified in the conf file.
>
> So for example if public addr = 192.168.1.0/24
>
> Then your loopback should be in that range, but you must make sure the
> physical nics lie outside this range.
>
> I'm following this with interest as I am about to deploy something very
> similar.
>
> >
> > Matyas
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Luis Periquito

Unix Team Lead



Head Office, Titan Court, 3 Bishop Square, Hatfield Business Park,
Hatfield, Herts AL10 9NE

-- 


Notice:  This email is confidential and may contain copyright material of 
members of the Ocado Group. Opinions and views expressed in this message 
may not necessarily reflect the opinions and views of the members of the 
Ocado Group. 

 

If you are not the intended recipient, please notify us immediately and 
delete all copies of this message. Please note that it is your 
responsibility to scan this message for viruses. 

 

Fetch and Sizzle are trading names of Speciality Stores Limited, a member 
of the Ocado Group.

 

References to the “Ocado Group” are to Ocado Group plc (registered in 
England and Wales with number 7098618) and its subsidiary undertakings (as 
that expression is defined in the Companies Act 2006) from time to time.  
The registered office of Ocado Group plc is Titan Court, 3 Bishops Square, 
Hatfield Business Park, Hatfield, Herts. AL10 9NE.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD - Deletion / Discard - IO Impact

2016-07-07 Thread Nick Fisk
Hi All,

 

Does anybody else see a massive (ie 10x) performance impact when either 
deleting a RBD or running something like mkfs.xfs against an
existing RBD, which would zero/discard all blocks?

 

In the case of deleting a 4TB RBD, I'm seeing latency in some cases rise up to 
10s.

 

It looks like it is the XFS deletions on the OSDs which are potentially responsible
for the massive drop in performance, as I see random
OSDs in turn peak at 100% utilisation.

 

I'm not aware of any throttling that can be done to reduce this impact, but
would be interested to hear from anyone else that may
experience this.

 

Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] layer3 network

2016-07-07 Thread George Shuklin
I found no options for the source IP in ceph. You could probably try using
network namespaces to isolate the ceph services with the desired interfaces.
This would require a bit more setup though. You would need to create a
namespace and add some kind of patch (veth?) interface between the namespace
and the host, but after that all traffic will definitely pass through the
loopback interface at least once.
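
A rough sketch of that kind of setup (names and addresses are placeholders only):

ip netns add ceph
ip link add veth-host type veth peer name veth-ceph
ip link set veth-ceph netns ceph
ip addr add 192.0.2.1/30 dev veth-host
ip link set veth-host up
ip netns exec ceph ip addr add 192.0.2.2/30 dev veth-ceph
ip netns exec ceph ip link set lo up
ip netns exec ceph ip link set veth-ceph up
ip netns exec ceph ip route add default via 192.0.2.1
# then start the ceph daemons inside the namespace, e.g.
ip netns exec ceph ceph-osd -i 0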


On 07/07/2016 01:25 PM, Matyas Koszik wrote:


Hi,

My setup uses a layer3 network, where each node has two connections
(/31s), equipped with a loopback address and redundancy is provided via
OSPF. In this setup it is important to use the loopback address as source
for outgoing connections, since the interface addresses are not protected
from failure, but the loopback address is.

So I set the public addr and the cluster addr to the desired ip, but it
seems that the outgoing connections do not use this as the source address.
I'm using jewel; is this the expected behavior?

Matyas


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread Christian Balzer

Hello Nick,

On Thu, 7 Jul 2016 09:45:58 +0100 Nick Fisk wrote:

> Just to add if you really want to go with lots of HDD's to Journals then
> go NVME. They are not a lot more expensive than the equivalent SATA based
> 3700's, but the latency is low low low. Here is an example of a node I
> have just commissioned with 12 HDD's to one P3700
> 
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdb   0.00 0.00   68.000.00  8210.00 0.00
> 241.47 0.263.853.850.00   2.09  14.20
> sdd   2.50 0.00  198.50   22.00 24938.00  9422.00
> 311.66 4.34   27.806.21  222.64   2.45  54.00
> sdc   0.00 0.00   63.000.00  7760.00 0.00
> 246.35 0.152.162.160.00   1.56   9.80
> sda   0.00 0.00   61.50   47.00  7600.00 22424.00
> 553.44 2.77   25.572.63   55.57   3.82  41.40
> nvme0n1   0.0022.502.00 2605.00 8.00 139638.00
> 107.13 0.140.050.000.05   0.03   6.60
> sdg   0.00 0.00   61.00   28.00  6230.00 12696.00
> 425.30 3.66   74.795.84  225.00   3.87  34.40
> sdf   0.00 0.00   34.50   47.00  4108.00 21702.00
> 633.37 3.56   43.751.51   74.77   2.85  23.20
> sdh   0.00 0.00   75.00   15.50  9180.00  4984.00
> 313.02 0.45   12.553.28   57.42   3.51  31.80
> sdi   1.50 0.50  142.00   48.50 18102.00 21924.00
> 420.22 3.60   18.924.99   59.71   2.70  51.40
> sdj   0.50 0.00   74.505.00  9362.00  1832.00
> 281.61 0.334.103.33   15.60   2.44  19.40
> sdk   0.00 0.00   54.000.00  6420.00 0.00
> 237.78 0.122.302.300.00   1.70   9.20
> sdl   0.00 0.00   21.001.50  2286.0016.00
> 204.62 0.32   18.13   13.81   78.67   6.67  15.00
> sde   0.00 0.00   98.000.00 12304.00 0.00
> 251.10 0.303.103.100.00   2.08  20.40
> 
Is that a live sample from iostat or the initial/one-shot summary?

> 50us latency at 2605 iops!!!
>
At less than 5% IOPS or 14% bandwidth capacity it is running more than twice as
slow as the spec sheet says. ^o^
Fast, very much so. But not mindnumbingly so. 

The real question here is, how much of that latency improvement do you see
in the Ceph clients, VMs?

I'd venture not so much, given that most latency happens in Ceph.

That all said, I'd go for a similar setup as well, if I had a dozen
storage nodes or more. 
But at my current cluster sizes that's too many eggs in one basket for me.
My "largest" cluster is now up to node 5, going from 4 journal SSDs for 8
HDDs to 2 journal SSDs for 12 HDDs. Woo-Woo!

> Compared to one of the other nodes with 2 100GB S3700's, 6 disks each
> 
Well, that's not really fair, is it?

Those SSDs have a 5 times lower bandwidth, triple the write latency and the
SATA bus instead of the PCIe zipway when compared to the smallest P 3700.
 
And 6 disk are a bit much for that SSD, 4 would be pushing it.
Whereas 12 HDDs for the P model are a good match, overkill really.

Incidentally the NVMes also are 5 times more power hungry than the SSDs,
must be the PCIe stuff.

Christian

> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda   0.0030.500.00  894.50 0.00 50082.00
> 111.98 0.360.410.000.41   0.20  17.80
> sdb   0.00 9.000.00  551.00 0.00 32044.00
> 116.31 0.230.420.000.42   0.19  10.40
> sdc   0.00 2.006.50   17.50   278.00  8422.00
> 725.00 1.08   44.92   18.46   54.74   8.08  19.40
> sdd   0.00 0.000.000.00 0.00 0.00
> 0.00 0.000.000.000.00   0.00   0.00
> sde   0.00 2.50   27.50   21.50  2112.00  9866.00
> 488.90 0.59   12.046.91   18.60   6.53  32.00
> sdf   0.50 0.00   50.500.00  6170.00 0.00
> 244.36 0.184.634.630.00   2.10  10.60
> md1   0.00 0.000.000.00 0.00 0.00
> 0.00 0.000.000.000.00   0.00   0.00
> md0   0.00 0.000.000.00 0.00 0.00
> 0.00 0.000.000.000.00   0.00   0.00
> sdg   0.00 1.50   32.00  386.50  3970.00 12188.00
> 77.22 0.150.350.500.34   0.15   6.40
> sdh   0.00 0.006.000.0034.00 0.00
> 11.33 0.07   12.67   12.670.00  11.00   6.60
> sdi   0.00 0.501.50   19.50 6.00  8862.00
> 844.57 0.96   45.71   33.33   46.67   6.57  13.80
> sdj   0.00 0.00   67.000.00  8214.00 0.00
> 245.19 0.172.512.510.00   1.88  12.60
> sdk   1.50 2.50   61.00   48.00  6216.00 21020.00
> 499.74 2.01   18.46   11.41   27.42   5.05  55.00
> sdm   0.00 0.00   30.500.00  3576.00 

[ceph-users] Calamari doesn't detect a running cluster despite of connected ceph servers

2016-07-07 Thread Pieroth.N
Hello,
I know there are lots of users with the same problem; if this is the wrong
mailing list, please tell me. I've tried to fix this, but calamari is
driving me nuts.

Problem: a running Ceph cluster in a healthy state.
The calamari GUI once tried to connect to the ceph nodes, but after the 120 sec.
timeout the message "no cluster created yet" comes up. After changing
software versions and configs for long hours I'm close to giving up.

Actual state:
ceph seems OK (rudimentary installation with 3 nodes). The nodes are running
OSDs and MONs. One admin node with calamari and salt-master.

cluster 10c29f99-caf8-4057-8cc7-1f94359418f2
 health HEALTH_OK
 monmap e1: 3 mons at 
{wmaiz-feink06=172.23.65.26:6789/0,wmaiz-feink07=172.23.65.27:6789/0,wmaiz-feink08=172.23.65.28:6789/0}
election epoch 34, quorum 0,1,2 
wmaiz-feink06,wmaiz-feink07,wmaiz-feink08
 osdmap e66: 3 osds: 3 up, 3 in
flags sortbitwise
  pgmap v265: 64 pgs, 1 pools, 0 bytes data, 0 objects
102 MB used, 698 GB / 698 GB avail
  64 active+clean
Software:
Ubuntu 14.04 with standard Kernel 3.13
ceph: 10.2.2-1trusty
salt: 2014.1.13+ds-1trusty1 (minions and masters with the same version)
diamond: 3.4.67
libgraphite: 1.3.6-1ubuntu0.14.04.1
calamari: 1.3.1.1-1trusty

I also tried the current Ubuntu versions of salt, but the ceph docs mention
salt 2014.7. I tried 2014.7 as well, but that is also not working.
Now I'm running a version below that.

Salt-keys are accepted:
Accepted Keys:
wmaiz-feink05.dbc.zdf.de
wmaiz-feink06.dbc.zdf.de
wmaiz-feink07.dbc.zdf.de
wmaiz-feink08.dbc.zdf.de
Unaccepted Keys:
Rejected Keys:

There are keys from the minion on the calamari host, but I don't think this
should be the problem.
I can uninstall or deactivate the minion and the errors stay the same.

root@wmaiz-feink05:/home/deploy# salt \* test.ping
wmaiz-feink07.dbc.zdf.de:
True
wmaiz-feink08.dbc.zdf.de:
True
wmaiz-feink06.dbc.zdf.de:
True
wmaiz-feink05.dbc.zdf.de:
True

salt \* ceph.get_heartbeats shows something like this:

cluster_heartbeat[fsid] = cluster_status(cluster_handle, 
fsid_names[fsid])
  File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in 
cluster_status
mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
wmaiz-feink08.dbc.zdf.de:
The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 809, in 
_thread_return
return_data = func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/ceph.py", line 498, in 
get_heartbeats
cluster_heartbeat[fsid] = cluster_status(cluster_handle, 
fsid_names[fsid])
  File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in 
cluster_status
mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'
wmaiz-feink07.dbc.zdf.de:
The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 809, in 
_thread_return
return_data = func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/ceph.py", line 498, in 
get_heartbeats
cluster_heartbeat[fsid] = cluster_status(cluster_handle, 
fsid_names[fsid])
  File "/var/cache/salt/minion/extmods/modules/ceph.py", line 566, in 
cluster_status
mds_epoch = status['mdsmap']['epoch']
KeyError: 'mdsmap'

which results from an exception in the salt-minions on the ceph nodes.
Depending on the salt version I'm running, it's nearly always the case that
the minions are throwing exceptions or other python errors are popping up
(unfortunately I
don't have the error dumps because I reinstalled the scenario so often).
I've set the master (the calamari host) in /etc/salt/minion.d/calamari.conf.
I've set the ceph-deploy-calamari master also to the calamarihost.
The REST Api shows only the calamarihost on running 
wmaiz-feink05.dbc.zdf.de/v2/api/server section.
When I try to get Infos including the cluster fsid i get errors (404) with the 
hint that
the cluster with this FSID is not found.

Config:

ceph.conf:
[global]
fsid = 10c29f99-caf8-4057-8cc7-1f94359418f2
mon_initial_members = wmaiz-feink06, wmaiz-feink07, wmaiz-feink08
mon_host = 172.23.65.26,172.23.65.27,172.23.65.28
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_defautl_size = 2
public_network = 172.23.65.0/24

Could it be something with the authentication? Perhaps deactivating cephx?
I don't know where to look right now. Help is appreciated...

kind regards
pir
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] layer3 network

2016-07-07 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Matyas Koszik
> Sent: 07 July 2016 11:26
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] layer3 network
> 
> 
> 
> Hi,
> 
> My setup uses a layer3 network, where each node has two connections (/31s), 
> equipped with a loopback address and redundancy is
> provided via OSPF. In this setup it is important to use the loopback address 
> as source for outgoing connections, since the
interface
> addresses are not protected from failure, but the loopback address is.
> 
> So I set the public addr and the cluster addr to the desired ip, but it seems 
> that the outgoing connections do not use this as the
source
> address.
> I'm using jewel; is this the expected behavior?

Do your public/cluster networks overlap the physical connection IP's? From what 
I understand Ceph binds to the interface whose IP
lies within the range specified in the conf file.

So for example if public addr = 192.168.1.0/24

Then your loopback should be in that range, but you must make sure the physical 
nics lie outside this range.

I'm following this with interest as I am about to deploy something very similar.

> 
> Matyas
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread George Shuklin

There are three problems I found so far:

1) You cannot alter the partition table if it is in use. That means you
need to stop all ceph-osd daemons that use journals on the given device to change
anything on it. Worse: you can change it, but you cannot force the kernel to
reread the partition table.
2) I found a udev bug with detection of the 5th and later partitions. Basically,
after you create 4 GPT-based partitions and then create a 5th, udev
does not create /dev/sdx5 (6, 7, and so on).
3) When I tried to automate this process (OSD creation) with ansible,
I found that it is very prone to timing errors, like 'partition busy',
or too many partitions created in a row with not every one visible at the
next stage. Worse: even if I add a blockdev --rereadpt step, it fails
with a 'device busy' message. I spent a whole day trying to do it right, but
at the end of the day it was still a ~50% failure rate when creating 8+ OSDs in
a row. (And I can't do it 'one by one' - see para. 1.)


The next day I remade the playbook using LVM. It took just 1 hr (with
debugging) and it works perfectly - not a single race condition. And the whole
playbook shrank ~3 times:


All steps:
- Configure udev to change LV owner to ceph
- Create volume group for journals
- Create logical volumes for journals
- Create data partition
- Create XFS filesystem
- Create directory
- temporal mount
- chown for directory
- Create OSD filesystem
- Create symlink for journal
- Add OSD to ceph
- Add auth in ceph
- unmount temp. mount
- Activate OSD via GPT

And that's all.
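
Outside of ansible, the journal-related part of those steps might look roughly
like this (device names, sizes, VG/LV names and the OSD id are placeholders, and
the udev rule assumes the OSDs run as the ceph user):

# udev: hand the journal LVs to the ceph user
echo 'ENV{DM_VG_NAME}=="journals", OWNER="ceph", GROUP="ceph", MODE="0660"' \
  > /etc/udev/rules.d/90-ceph-journals.rules
udevadm control --reload && udevadm trigger

# one VG on the SSD, one LV per OSD journal
vgcreate journals /dev/sdb
lvcreate -L 10G -n journal-0 journals

# point the OSD at its journal
ln -s /dev/journals/journal-0 /var/lib/ceph/osd/ceph-0/journal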

About the performance impact of LVM: I think it is negligible (if we do
not play with copy-on-write snapshots and other strange things). For
HDD OSDs with journals on SSD the main concern is not IOPS or latency on the
journal (the HDD will give big latency anyway), but throughput. A single SSD is
capable of 300-500MB/s of linear writing, and the ~10 HDDs behind it can
take up to 1.5GB/s.


Device mapper is a pretty fast thing if it is just doing remapping.


On 07/07/2016 05:22 AM, Christian Balzer wrote:

Hello,

I have a multitude of problems with the benchmarks and conclusions
here, more below.

But firstly to address the question of the OP, definitely not filesystem
based journals.
Another layer of overhead and delays, something I'd be willing to ignore
if we're talking about a full SSD as OSD with an inline journal, but not
with journal SSDs.
Similar with LVM, though with a lower impact.

Partitions really are your best bet.

On Wed, 6 Jul 2016 18:20:43 +0300 George Shuklin wrote:


Yes.

On my lab (not production yet) with 9 7200 SATA (OSD) and one INTEL
SSDSC2BB800G4 (800G, 9 journals)

First and foremost, a DC 3510 with 1 DWPD endurance is not my idea of good
journal device, even if it had the performance.
If you search in the ML archives there is at least one case where somebody
lost a full storage node precisely because their DC S3500s were worn out:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg28083.html

Unless you have a read-mostly cluster, a 400GB DC S3610 (same or lower
price) would be a better deal, at 50% more endurance and only slightly
lower sequential write speed.

And depending on your expected write volume (which you should
know/estimate as close as possible before buying HW), a 400GB DC S3710
might be the best deal when it comes to TBW/$.
It's 30% more expensive than your 3510, but has the same speed and an
endurance that's 5 times greater.
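
As a rough back-of-the-envelope check (assuming the usual 5 year warranty
period): 1 DWPD on an 800GB drive works out to about 800GB x 365 x 5 = ~1.4PB
written over its life, so a drive with 5 times that endurance at only ~30%
higher price clearly wins on TBW/$.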


during random write I got ~90%
utilization of 9 HDD with ~5% utilization of SSD (2.4k IOPS). With
linear writing it somehow worse: I got 250Mb/s on SSD, which translated
to 240Mb of all OSD combined.


This test shows us a lot of things, mostly the failings of filestore.
But only partially if a SSD is a good fit for journals or not.

How are you measuring these things on the storage node, iostat, atop?
At 250MB/s (Mb would be mega-bit) your 800 GB DC S3500 should register
about/over 50% utilization, given that its top speed is 460MB/s.

With Intel DC SSDs you can pretty much take the sequential write speed
from their specifications page and roughly expect that to be the speed of
your journal.

For example a 100GB DC S3700 (200MB/s) doing journaling for 2 plain SATA
HDDs will give us this when running "ceph tell osd.nn bench" in
parallel against 2 OSDs that share a journal SSD:
---
Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
sdd   0.00 2.000.00  409.50 0.00 191370.75   934.66   
146.52  356.460.00  356.46   2.44 100.00
sdl   0.0085.500.50  120.50 2.00 49614.00   820.10 
2.25   18.510.00   18.59   8.20  99.20
sdk   0.0089.501.50  119.00 6.00 49348.00   819.15 
2.04   16.910.00   17.13   8.23  99.20
---

Where sdd is the journal SSD and sdl/sdk are the OSD HDDs.
And the SSD is nearly at 200MB/s (and 100%).
For the record, that bench command is good for testing, but the result:
---
# ceph tell osd.30 bench

[ceph-users] layer3 network

2016-07-07 Thread Matyas Koszik


Hi,

My setup uses a layer3 network, where each node has two connections
(/31s), equipped with a loopback address and redundancy is provided via
OSPF. In this setup it is important to use the loopback address as source
for outgoing connections, since the interface addresses are not protected
from failure, but the loopback address is.

So I set the public addr and the cluster addr to the desired IP, but it
seems that the outgoing connections do not use it as the source address.
I'm using Jewel; is this the expected behavior?
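
What I mean is, per daemon, something along these lines in ceph.conf (the
address below is just a placeholder for the loopback /32 that OSPF announces):
---
[osd.0]
    host = node1
    public addr = 192.0.2.11
    cluster addr = 192.0.2.11
---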

Matyas




Re: [ceph-users] multiple journals on SSD

2016-07-07 Thread Nick Fisk
Just to add: if you really want to go with lots of HDDs per journal device then go
NVMe. They are not a lot more expensive than the equivalent SATA-based
3700s, but the latency is low low low. Here is an example of a node I have
just commissioned with 12 HDDs to one P3700:

Device: rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sdb       0.00     0.00   68.00     0.00  8210.00      0.00   241.47
0.26    3.85    3.85    0.00   2.09  14.20
sdd       2.50     0.00  198.50    22.00 24938.00   9422.00   311.66
4.34   27.80    6.21  222.64   2.45  54.00
sdc       0.00     0.00   63.00     0.00  7760.00      0.00   246.35
0.15    2.16    2.16    0.00   1.56   9.80
sda       0.00     0.00   61.50    47.00  7600.00  22424.00   553.44
2.77   25.57    2.63   55.57   3.82  41.40
nvme0n1   0.00    22.50    2.00  2605.00     8.00 139638.00   107.13
0.14    0.05    0.00    0.05   0.03   6.60
sdg       0.00     0.00   61.00    28.00  6230.00  12696.00   425.30
3.66   74.79    5.84  225.00   3.87  34.40
sdf       0.00     0.00   34.50    47.00  4108.00  21702.00   633.37
3.56   43.75    1.51   74.77   2.85  23.20
sdh       0.00     0.00   75.00    15.50  9180.00   4984.00   313.02
0.45   12.55    3.28   57.42   3.51  31.80
sdi       1.50     0.50  142.00    48.50 18102.00  21924.00   420.22
3.60   18.92    4.99   59.71   2.70  51.40
sdj       0.50     0.00   74.50     5.00  9362.00   1832.00   281.61
0.33    4.10    3.33   15.60   2.44  19.40
sdk       0.00     0.00   54.00     0.00  6420.00      0.00   237.78
0.12    2.30    2.30    0.00   1.70   9.20
sdl       0.00     0.00   21.00     1.50  2286.00     16.00   204.62
0.32   18.13   13.81   78.67   6.67  15.00
sde       0.00     0.00   98.00     0.00 12304.00      0.00   251.10
0.30    3.10    3.10    0.00   2.08  20.40

50us latency at 2605 iops!!!

Compared to one of the other nodes with two 100GB S3700s, 6 disks each:

Device: rrqm/s   wrqm/s     r/s      w/s    rkB/s     wkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sda       0.00    30.50    0.00   894.50     0.00  50082.00   111.98
0.36    0.41    0.00    0.41   0.20  17.80
sdb       0.00     9.00    0.00   551.00     0.00  32044.00   116.31
0.23    0.42    0.00    0.42   0.19  10.40
sdc       0.00     2.00    6.50    17.50   278.00   8422.00   725.00
1.08   44.92   18.46   54.74   8.08  19.40
sdd       0.00     0.00    0.00     0.00     0.00      0.00     0.00
0.00    0.00    0.00    0.00   0.00   0.00
sde       0.00     2.50   27.50    21.50  2112.00   9866.00   488.90
0.59   12.04    6.91   18.60   6.53  32.00
sdf       0.50     0.00   50.50     0.00  6170.00      0.00   244.36
0.18    4.63    4.63    0.00   2.10  10.60
md1       0.00     0.00    0.00     0.00     0.00      0.00     0.00
0.00    0.00    0.00    0.00   0.00   0.00
md0       0.00     0.00    0.00     0.00     0.00      0.00     0.00
0.00    0.00    0.00    0.00   0.00   0.00
sdg       0.00     1.50   32.00   386.50  3970.00  12188.00    77.22
0.15    0.35    0.50    0.34   0.15   6.40
sdh       0.00     0.00    6.00     0.00    34.00      0.00    11.33
0.07   12.67   12.67    0.00  11.00   6.60
sdi       0.00     0.50    1.50    19.50     6.00   8862.00   844.57
0.96   45.71   33.33   46.67   6.57  13.80
sdj       0.00     0.00   67.00     0.00  8214.00      0.00   245.19
0.17    2.51    2.51    0.00   1.88  12.60
sdk       1.50     2.50   61.00    48.00  6216.00  21020.00   499.74
2.01   18.46   11.41   27.42   5.05  55.00
sdm       0.00     0.00   30.50     0.00  3576.00      0.00   234.49
0.07    2.43    2.43    0.00   1.90   5.80
sdl       0.00     4.50   25.00    23.50  2092.00  12648.00   607.84
1.36   19.42    5.60   34.13   4.04  19.60
sdn       0.50     0.00   23.00     0.00  2670.00      0.00   232.17
0.07    2.96    2.96    0.00   2.43   5.60

Pretty much 10x the latency. I'm seriously impressed with these NVMe things.
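
In case anyone wants to copy this layout: there is nothing clever going on,
each OSD simply gets its own journal partition on the NVMe device, e.g. via
ceph-disk (device names below are just examples):
---
# prepare an OSD on sdb with its journal on a new partition of the NVMe drive
ceph-disk prepare /dev/sdb /dev/nvme0n1
ceph-disk activate /dev/sdb1
---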


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer
> Sent: 07 July 2016 03:23
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] multiple journals on SSD
> 
> 
> Hello,
> 
> I have a multitude of of problems with the benchmarks and conclusions
here,
> more below.
> 
> But firstly to address the question of the OP, definitely not filesystem
based
> journals.
> Another layer of overhead and delays, something I'd be willing to ignore
if
> we're talking about a full SSD as OSD with an inline journal, but not with
> journal SSDs.
> Similar with LVM, though with a lower impact.
> 
> Partitions really are your best bet.
> 
> On Wed, 6 Jul 2016 18:20:43 +0300 George Shuklin wrote:
> 
> > Yes.
> >
> > On my lab (not production yet) with 9 7200 SATA (OSD) and one INTEL
> > SSDSC2BB800G4 (800G, 9 

[ceph-users] what's the meaning of 'removed_snaps' of `ceph osd pool ls detail`?

2016-07-07 Thread XiuCai
Hi, all :)


I have made a cache tier,
but I do not understand the message 'removed_snaps
[1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]'.
I have not taken any snapshots yet.




ceph> osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'volumes' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 11365 lfor 11365 flags hashpspool 
tiers 3 read_tier 3 write_tier 3 stripe_width 0
removed_snaps 
[1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]
pool 2 'test' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 100 pgp_num 100 last_change 2779 flags hashpspool stripe_width 0
pool 3 'fast' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 128 pgp_num 128 last_change 11376 flags hashpspool,incomplete_clones 
tier_of 1 cache_mode writeback target_bytes 1500 hit_set 
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 
min_read_recency_for_promote 1 stripe_width 0
removed_snaps 
[1~1,3~6,b~6,13~c,21~4,26~1,28~1a,4e~4,53~5,5c~5,63~1,65~4,6b~4]


Regards,
XiuCai.


Re: [ceph-users] (no subject)

2016-07-07 Thread Kees Meijs
Hi Gaurav,

Unfortunately I'm not completely sure about your setup, but I guess it
makes sense to configure Cinder and Glance to use RBD as a backend. It
seems to me you're trying to store VM images directly on an OSD filesystem.

Please refer to http://docs.ceph.com/docs/master/rbd/rbd-openstack/ for
details.
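
As a rough sketch of what that boils down to (the pool and user names below
are just the usual examples from that page, adjust them to your environment),
cinder.conf gets an RBD backend:
---
[DEFAULT]
enabled_backends = ceph

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <uuid of your libvirt secret>
---
and glance-api.conf stores images in RBD as well:
---
[glance_store]
default_store = rbd
stores = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
---
If you also want Nova's ephemeral disks on RBD rather than on a local
filesystem, nova.conf has images_type = rbd (and related options) in its
[libvirt] section.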

Regards,
Kees

On 06-07-16 23:03, Gaurav Goyal wrote:
>
> I am installing ceph hammer and integrating it with openstack Liberty
> for the first time.
>
> My local disk has only 500 GB but i need to create 600 GB VM. SO i
> have created a soft link to ceph filesystem as
>
> lrwxrwxrwx 1 root root 34 Jul 6 13:02 instances ->
> /var/lib/ceph/osd/ceph-0/instances [root@OSKVM1 nova]# pwd
> /var/lib/nova [root@OSKVM1 nova]#
>
> now when i am trying to create an instance it is giving the following
> error as checked from nova-compute.log
> I need your help to fix this issue.
>



Re: [ceph-users] Can't configure ceph with dpdk

2016-07-07 Thread 席智勇
Very much appreciated~

2016-07-07 14:18 GMT+08:00 Haomai Wang :

> Previously dpdk plugin only support cmake.
>
> Currently I'm working on split that PR into multi clean PR to let
> merge. So previous PR isn't on my work list. plz move on the following
> changes
>
> On Thu, Jul 7, 2016 at 1:25 PM, 席智勇  wrote:
> > I copy rte_config.h to /usr/include/ and it can pass the ./configure,
> when
> > did 'make', meet the error of these:
> >
> >   CXXLDlibcommon_crc.la
> > ../libtool: line 6000: cd: yes/lib: No such file or directory
> > libtool: link: cannot determine absolute directory name of `yes/lib'
> > Makefile:13645: recipe for target 'libcommon_crc.la' failed
> > make[3]: *** [libcommon_crc.la] Error 1
> > make[3]: *** Waiting for unfinished jobs
> >
> >
> >
> > 2016-07-07 9:04 GMT+08:00 席智勇 :
> >>
> >> Hi haomai:
> >>
> >> I noticed your PR about support DPDK by Ceph:
> >>
> >> https://github.com/ceph/ceph/pull/9230
> >>
> >> It's great job for Ceph.
> >>
> >> I want to do some test base on the PR, but can not use it still.First I
> >> can not find the package for dpdk on debian/ubuntu, So I download the
> source
> >> code of dpdk and compile it, and I can run the example within DPDK
> although
> >> there has been some problem. Then I compile Ceph, configure it with
> option
> >> '--with-dpdk', it aways report some errors about can not find header
> files
> >> like 'rte_config.h', actually it is in '/usr/include/dpdk/'.
> >>
> >> Whether it is the problem of compile and install of dpdk,so ceph cannot
> >> find it or problem about ceph with dpdk.
> >>
> >> Looking forward your tips or advice by anyone else.
> >>
> >>
> >> best regards~
> >>
> >> zhiyong
> >
> >
>


Re: [ceph-users] Can't configure ceph with dpdk

2016-07-07 Thread Haomai Wang
Previously the dpdk plugin only supported cmake.

Currently I'm working on splitting that PR into multiple clean PRs to get it
merged, so the previous PR isn't on my work list. Please move on to the
following changes.
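
In other words, build through cmake rather than autotools for now. Roughly
(assuming the option is called WITH_DPDK; check the PR branch for the exact
name):
---
git submodule update --init --recursive
mkdir build && cd build
cmake -DWITH_DPDK=ON ..
make -j$(nproc)
---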

On Thu, Jul 7, 2016 at 1:25 PM, 席智勇  wrote:
> I copy rte_config.h to /usr/include/ and it can pass the ./configure, when
> did 'make', meet the error of these:
>
>   CXXLDlibcommon_crc.la
> ../libtool: line 6000: cd: yes/lib: No such file or directory
> libtool: link: cannot determine absolute directory name of `yes/lib'
> Makefile:13645: recipe for target 'libcommon_crc.la' failed
> make[3]: *** [libcommon_crc.la] Error 1
> make[3]: *** Waiting for unfinished jobs
>
>
>
> 2016-07-07 9:04 GMT+08:00 席智勇 :
>>
>> Hi haomai:
>>
>> I noticed your PR about support DPDK by Ceph:
>>
>> https://github.com/ceph/ceph/pull/9230
>>
>> It's great job for Ceph.
>>
>> I want to do some test base on the PR, but can not use it still.First I
>> can not find the package for dpdk on debian/ubuntu, So I download the source
>> code of dpdk and compile it, and I can run the example within DPDK although
>> there has been some problem. Then I compile Ceph, configure it with option
>> '--with-dpdk', it aways report some errors about can not find header files
>> like 'rte_config.h', actually it is in '/usr/include/dpdk/'.
>>
>> Whether it is the problem of compile and install of dpdk,so ceph cannot
>> find it or problem about ceph with dpdk.
>>
>> Looking forward your tips or advice by anyone else.
>>
>>
>> best regards~
>>
>> zhiyong
>
>


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Goncalo Borges
My previous email did not go through because of its size. Here goes a 
new attempt:


Cheers
Goncalo

--- * ---

Hi Patrick, Brad...

Unfortunately, the other user application breaks ceph-fuse again (it is
a completely different application than in my previous test).


We have tested it on 4 machines with 4 cores each. The user is submitting 16
single-core jobs which all write different output files (one per
job) to a common dir in CephFS. The first 4 jobs run happily and never
break ceph-fuse, but the remaining 12 jobs, running on the remaining 3
machines, trigger a segmentation fault which is completely different
from the other case.


 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x297fe2) [0x7f54402b7fe2]
 2: (()+0xf7e0) [0x7f543ecf77e0]
 3: (ObjectCacher::bh_write_scattered(std::list >&)+0x36) [0x7f5440268086]
 4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*, std::chrono::time_point >, long*, int*)+0x22c) [0x7f5440268a3c]
 5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
 6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
 7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
 8: (()+0x7aa1) [0x7f543ecefaa1]
 9: (clone()+0x6d) [0x7f543df6893d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
 to interpret this.


The full log (with debug client = 20) for a segfault in a client with IP
Y.Y.Y.255 is available here:


https://dl.dropboxusercontent.com/u/2946024/nohup.out.2

(for privacy reasons, I've substituted client IPs with Y.Y.Y.(...) and
Ceph infrastructure host IPs with X.X.X.(...))


Well... further help is welcome.

Cheers
Goncalo