Re: [ceph-users] How to monitor health and connectivity of OSD

2016-02-08 Thread Gregory Farnum
On Mon, Feb 8, 2016 at 3:25 AM, Mariusz Gronczewski
 wrote:
> Is there an equivalent of 'ceph health', but for OSDs?
>
> Like warning about slowness or troubles with communication between OSDs?
>
> I've spent a good amount of time debugging what looked like stuck PGs,
> but it turned out to be a bad NIC, and it was only apparent once I saw
> OSD logs like
>
> 2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
> reply from osd.14 ever on either front or back, first ping sent 2016-02-08 
> 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
> 2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
> reply from osd.15 ever on either front or back, first ping sent 2016-02-08 
> 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
> 2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
> reply from osd.14 ever on either front or back, first ping sent 2016-02-08 
> 03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)
>
> (it turned out to be a bad Emulex NIC)
>
> Is there anything that could dump things like "failed heartbeats in the
> last 10 minutes" or similar stats?

I don't think that's exposed anywhere — if it happens enough then the
OSD will get killed. We could maybe add some tracking structures and
an admin socket command to dump them from the OSD; you should create a
feature request at tracker.ceph.com. :)
-Greg
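
In the meantime, a rough stopgap is to scan the OSD log yourself for recent
heartbeat_check failures. A minimal sketch in Python, assuming the default log
path and the timestamp format shown in the excerpt above (an illustration, not
a supported tool):

#!/usr/bin/env python
# Rough sketch: count heartbeat_check failures per peer OSD over the last
# N minutes, parsed out of a single OSD's log file. Assumes the default log
# location and "YYYY-MM-DD HH:MM:SS.ffffff" timestamps; adjust as needed.
import re
import sys
from collections import Counter
from datetime import datetime, timedelta

LOG = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-osd.9.log"
WINDOW = timedelta(minutes=10)
cutoff = datetime.now() - WINDOW
pattern = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ .* "
    r"heartbeat_check: no reply from (osd\.\d+)")

failures = Counter()
with open(LOG) as f:
    for line in f:
        m = pattern.match(line)
        if m and datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") >= cutoff:
            failures[m.group(2)] += 1

for peer, count in failures.most_common():
    print("%s: %d failed heartbeats in the last %d minutes"
          % (peer, count, WINDOW.seconds // 60))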

>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczew...@efigence.com
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-08 Thread Gregory Farnum
On Fri, Feb 5, 2016 at 10:19 PM, Michael Metz-Martini | SpeedPartner
GmbH  wrote:
> Hi,
>
> Am 06.02.2016 um 07:15 schrieb Yan, Zheng:
>>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH 
>>>  wrote:
>>> Am 04.02.2016 um 15:38 schrieb Yan, Zheng:
> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH 
>  wrote:
> Am 04.02.2016 um 09:43 schrieb Yan, Zheng:
>> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner
>> GmbH  wrote:
>>> Am 03.02.2016 um 15:55 schrieb Yan, Zheng:
> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH 
>  wrote:
> Am 03.02.2016 um 12:11 schrieb Yan, Zheng:
>>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH 
>>>  wrote:
>>> Am 03.02.2016 um 10:26 schrieb Gregory Farnum:
 On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | 
 SpeedPartner
> 2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
> [WRN] : 7 slow requests, 6 included below; oldest blocked for >
> 62.125785 secs
> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
> 14:41:23.455812: client_request(client.10199855:1313157 getattr
> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
> rdlock, waiting
 This seems like dirty page writeback is too slow.  Is there any hung 
 OSD request in /sys/kernel/debug/ceph/xxx/osdc?
> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)
That's quite a lot of requests. Could you pick some requests in osdc and
check how long these requests last?
>>> After stopping load/access to cephfs there are a few requests left:
>>> 330   osd87   5.72c3bf71   100826d5cdc.0002   write
>>> 508   osd87   5.569ad068   100826d5d18.   write
>>> 668   osd87   5.3db54b00   100826d5d4d.0001   write
>>> 799   osd87   5.65f8c4e0   100826d5d79.   write
>>> 874   osd87   5.d238da71   100826d5d98.   write
>>> 1023  osd87   5.705950e0   100826d5e2d.   write
>>> 1277  osd87   5.33673f71   100826d5f2a.   write
>>> 1329  osd87   5.e81ab868   100826d5f5e.   write
>>> 1392  osd87   5.aea1c771   100826d5f9c.   write
>>>
>>> osd.87 is near full and currently has some PGs with backfill_toofull;
>>> can this be the reason?
>>
>> Yes, it’s likely.
> But "why"?
> I thought that reads/writes would still be possible, just not replicated,
> with objects degraded.

As long as all the PGs are "active" they'll still accept reads/writes,
but it's possible that osd 87 is just so busy that the clients are all
stuck waiting for it.
-Greg
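
For what it's worth, a quick way to see whether a single OSD (osd.87 here) is
holding most of the in-flight requests is to count the second column of the
osdc file. A minimal sketch, assuming the "tid  osdN  pg  object  op" layout
quoted above (the exact columns vary by kernel version):

#!/usr/bin/env python
# Rough sketch: count in-flight requests per OSD from the kernel client's
# osdc debugfs file(s). Assumes the column layout quoted above; the exact
# format varies by kernel version. Needs access to /sys/kernel/debug.
import glob
import sys
from collections import Counter

paths = sys.argv[1:] or glob.glob("/sys/kernel/debug/ceph/*/osdc")
per_osd = Counter()
for path in paths:
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1].startswith("osd"):
                per_osd[fields[1]] += 1

for osd, count in per_osd.most_common():
    print("%-8s %d in-flight requests" % (osd, count))
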
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Need help on benchmarking new erasure coding

2016-02-08 Thread Syed Hussain
Hi,

I've been developing a new array-type erasure code.

I'd be glad if you could send me a few pointers on the following two items:
(1) The CRUSH map required for an array code, e.g. a RAID-DP or MSR erasure
code. It is different from a normal RS(n, k) or LRC code. For example, for a
RAID-DP or RDP(n, k) erasure code, the codeword has k*(k+m) blocks. The
question is how to place these blocks on n devices mapped to OSDs.
Could you please suggest an approach?

(2) How to measure the recovery time of a failed OSD in a cluster. Is there
any script available that will perform this and report the final result?
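
A rough sketch of one way to time (2) is below: mark an OSD out, then poll
"ceph pg stat" until the degraded/backfill/recovery states clear. It assumes
the ceph CLI is usable on the host and that the pg stat summary names those
states while recovery runs; treat it as an illustration, not a finished tool.

#!/usr/bin/env python
# Rough sketch: time how long re-replication takes after an OSD is marked
# out. Assumes the "ceph" CLI works from this host and that "ceph pg stat"
# mentions degraded/recovery/backfill/peering states while recovery runs.
# The 5-second poll interval may need tuning for very fast recoveries.
import subprocess
import sys
import time

osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"   # OSD to take out
BUSY = ("degraded", "recover", "backfill", "peering")

subprocess.check_call(["ceph", "osd", "out", osd_id])
start = time.time()

seen_busy = False
while True:
    time.sleep(5)
    stat = subprocess.check_output(["ceph", "pg", "stat"]).decode()
    busy = any(state in stat for state in BUSY)
    if busy:
        seen_busy = True
    elif seen_busy:          # recovery started and has now finished
        break

print("osd.%s: recovery took %.0f seconds" % (osd_id, time.time() - start))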

Thanks,
Syed Abid Hussain
NetApp India
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Increasing time to save RGW objects

2016-02-08 Thread Kris Jurka


I've been testing the performance of ceph by storing objects through 
RGW.  This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 
4 RGW instances.  Initially the storage time was holding reasonably 
steady, but it has started to rise recently as shown in the attached chart.


The test repeatedly saves 100k objects of 55 kB size using multiple 
threads (50) against multiple RGW gateways (4).  It uses a sequential 
identifier as the object key and shards the bucket name using id % 100. 
 The buckets have index sharding enabled with 64 index shards per bucket.
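
For reference, a minimal reconstruction of what one pass of such a test loop
might look like (an illustrative sketch, not the actual harness; the endpoint,
credentials, and bucket prefix are placeholders, and the 50 worker threads are
omitted for brevity), using boto against the RGW S3 API:

#!/usr/bin/env python
# Illustrative sketch of the described workload: sequential ids as object
# keys, bucket chosen by id % 100, 55 kB bodies. Endpoint, credentials, and
# bucket names are placeholders; threading is omitted for brevity.
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    host="rgw.example.com",
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

payload = "x" * 55 * 1024                           # 55 kB object body
buckets = [conn.create_bucket("test-bucket-%02d" % i) for i in range(100)]

for obj_id in range(100000):                        # 100k objects per pass
    bucket = buckets[obj_id % 100]                  # shard bucket by id % 100
    bucket.new_key(str(obj_id)).set_contents_from_string(payload)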


ceph status doesn't appear to show any issues.  Is there something I 
should be looking at here?



# ceph status
cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
 health HEALTH_OK
 monmap e2: 5 mons at 
{condor=192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0}
election epoch 18, quorum 0,1,2,3,4 
condor,eagle,falcon,shark,duck

 osdmap e674: 40 osds: 40 up, 40 in
  pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
4784 GB used, 69499 GB / 74284 GB avail
3128 active+clean
  client io 268 kB/s rd, 1100 kB/s wr, 493 op/s


Kris Jurka
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CentOS 7 iscsi gateway using lrbd

2016-02-08 Thread Nick Fisk
Hi Mike,

Thanks for the update. I will keep a keen eye on the progress. Once you get to 
the point you think you have fixed the stability problems, let me know if you 
need somebody to help test.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mike Christie
> Sent: 21 January 2016 03:12
> To: Nick Fisk ; 'Василий Ангапов' ;
> 'Ilya Dryomov' 
> Cc: 'Dominik Zalewski' ; 'ceph-users'  us...@lists.ceph.com>
> Subject: Re: [ceph-users] CentOS 7 iscsi gateway using lrbd
> 
> On 01/20/2016 06:07 AM, Nick Fisk wrote:
> > Thanks for your input Mike, a couple of questions if I may
> >
> > 1. Are you saying that this rbd backing store is not in mainline and is
> > only in SUSE kernels? I.e., can I use this lrbd on Debian/Ubuntu/CentOS?
> 
> The target_core_rbd backing store is not upstream and only in SUSE kernels.
> 
> lrbd is the management tool that basically distributes the configuration info
> to the nodes you want to run LIO on. In that README you see it uses the
> target_core_rbd module by default, but last I looked there is code to support
> iblock too. So you should be able to use this with other distros that do not
> have target_core_rbd.
> 
> Once I was done porting my code to an iblock-based approach, I was going to
> test out the lrbd iblock support and fix it up if it needed anything.
> 
> > 2. Does this have any positive effect on the abort/reset death loop a
> number of us were seeing when using LIO+krbd and ESXi?
> 
> The old code and my new approach do not really help. However, on
> Monday Ilya and I were talking about this problem, and he gave me some
> hints on how to add code to cancel/clean up commands, so we will be able to
> handle aborts/resets properly and will not fall into that problem.
> 
> 
> > 3. Can you still use something like bcache over the krbd?
> 
> Not initially. I had been doing active/active across nodes by default, so you
> cannot use bcache and krbd as is like that.
> 
> 
> 
> 
> >
> >
> >
> >> -Original Message-
> >> From: Mike Christie [mailto:mchri...@redhat.com]
> >> Sent: 19 January 2016 21:34
> >> To: Василий Ангапов ; Ilya Dryomov
> >> 
> >> Cc: Nick Fisk ; Tyler Bishop
> >> ; Dominik Zalewski
> >> ; ceph-users 
> >> Subject: Re: [ceph-users] CentOS 7 iscsi gateway using lrbd
> >>
> >> Everyone is right - sort of :)
> >>
> >> It is that target_core_rbd module that I made that was rejected
> >> upstream, along with modifications from SUSE which added persistent
> >> reservations support. I also made some modifications to rbd so
> >> target_core_rbd and krbd could share code. target_core_rbd uses rbd
> >> like a lib. And it is also modifications to the targetcli related
> >> tool and libs, so you can use them to control the new rbd backend.
> >> SUSE's lrbd then handles setup/management of across multiple
> targets/gatways.
> >>
> >> I was going to modify targetcli more and have the user just pass in
> >> the rbd info there, but did not get finished. That is why in that
> >> suse stuff you still make the krbd device like normal. You then pass
> >> that to the target_core_rbd module with targetcli and that is how
> >> that module knows about the rbd device.
> >>
> >> The target_core_rbd module was rejected upstream, so I stopped
> >> development and am working on the approach suggested by those
> >> reviewers which instead of going from lio->target_core_rbd->krbd goes
> >> lio->target_core_iblock->linux block layer->krbd. With this approach
> >> lio->you
> >> just use the normal old iblock driver and krbd and then I am
> >> modifying them to just work and do the right thing.
> >>
> >>
> >> On 01/19/2016 05:45 AM, Василий Ангапов wrote:
> >>> So is it a different approach that was used here by Mike Christie:
> >>> http://www.spinics.net/lists/target-devel/msg10330.html ?
> >>> It seems to be a confusion because it also implements
> >>> target_core_rbd module. Or not?
> >>>
> >>> 2016-01-19 18:01 GMT+08:00 Ilya Dryomov :
>  On Tue, Jan 19, 2016 at 10:34 AM, Nick Fisk  wrote:
> > But interestingly enough, if you look down to where they run the
> >> targetcli ls, it shows a RBD backing store.
> >
> > Maybe it's using the krbd driver to actually do the Ceph side of
> > the
> >> communication, but lio plugs into this rather than just talking to a
> >> dumb block device???
> 
>  It does use krbd driver.
> 
>  Thanks,
> 
>  Ilya
> >
> >
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] How to monitor health and connectivity of OSD

2016-02-08 Thread Mariusz Gronczewski
Is there an equivalent of 'ceph health', but for OSDs?

Like warning about slowness or troubles with communication between OSDs?

I've spent a good amount of time debugging what looked like stuck PGs,
but it turned out to be a bad NIC, and it was only apparent once I saw
OSD logs like

2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
reply from osd.14 ever on either front or back, first ping sent 2016-02-08 
03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
reply from osd.15 ever on either front or back, first ping sent 2016-02-08 
03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no 
reply from osd.14 ever on either front or back, first ping sent 2016-02-08 
03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)

(it turned out to be a bad Emulex NIC)

Is there anything that could dump things like "failed heartbeats in the
last 10 minutes" or similar stats?

-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?

2016-02-08 Thread Christian Balzer

Hello,

I'm quite concerned by this (and the silence from the devs); however, there
are a number of people doing similar things (at least with Hammer), and
you'd think they would have been bitten by this if it were a systemic bug.

More below.

On Sat, 6 Feb 2016 11:31:51 +0100 Udo Waechter wrote:

> Hello,
> 
> I am experiencing totally weird filesystem corruptions with the
> following setup:
> 
> * Ceph infernalis on Debian8
Hammer here, might be a regression.
> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)
> 
So you're running your cache tier host with replication of 1, I presume?
What kind of SSDs/FS/other relevant configuration options?
Could there simply be some corruption on the SSDs that is of course then
presented to the RBD clients eventually?

> The SSDs are new in my setup and I am trying to setup a Cache tier.
> 
> Now, with the spinning disks Ceph is running since about a year without
> any major issues. Replacing disks and all that went fine.
> 
> Ceph is used by rbd+libvirt+kvm with
> 
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
> 
> Also, in libvirt, I have
> 
> cachemode=writeback enabled.
> 
> So far so good.
> 
> Now, I've added the SSD-Cache tier to the picture with "cache-mode
> writeback"
> 
> The SSD-Machine also has "deadline" scheduler enabled.
> 
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed".
> Trying to reboot the machines ends in "No bootable drive"
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.
> 
Did turning the cache explicitly off (both Ceph and qemu) fix this?

> testdisk finds the proper ones; e2fsck repairs the filesystem beyond
> usability afterwards.
> 
> This does not happen to all machines. It happens to those that actually
> do some or most of the IO:
> 
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
> 
> Or so I thought: yesterday one of my ldap-servers died, and that one is not
> doing IO.
> 
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
> 
> I've read through some older mails on the list, where people had similar
> problems and suspected something like that.
> 
Any particular references (URLs, Message-IDs)?

Regards,

Christian

> What are the proper/right settings for rbd/qemu/libvirt?
> 
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD-tier: cachemode: writeback
> 
> ?
> 
> Thanks for any help,
> udo.
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Tips for faster openstack instance boot

2016-02-08 Thread Vickey Singh
Hello Community

I need some guidance on how I can reduce OpenStack instance boot time using
Ceph.

We are using Ceph storage with OpenStack (Cinder, Glance and Nova). All
OpenStack images and instances are stored on Ceph in different pools, the
glance and nova pools respectively.

I assume that Ceph by default uses COW RBD clones, so for example if an
instance is launched from a Glance image (which is stored on Ceph), Ceph
should take a COW snapshot of the Glance image and map it as an RBD disk for
the instance. This whole process should be very quick.

In our case, the instance launch is taking 90 seconds. Is this normal? (I
know this really depends on one's infra, but still.)

Is there any way I can utilize Ceph's power and launch instances even
faster?

- From a Ceph point of view, does COW work cross-pool, i.e. image in the
glance pool ---> (cow) --> instance disk in the nova pool?
- Would a single pool for glance and nova, instead of separate pools, help
here?
- Is there any tunable parameter on the Ceph or OpenStack side that should be
set?

Regards
Vickey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing time to save RGW objects

2016-02-08 Thread Gregory Farnum
On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka  wrote:
>
> I've been testing the performance of ceph by storing objects through RGW.
> This is on Debian with Hammer using 40 magnetic OSDs, 5 mons, and 4 RGW
> instances.  Initially the storage time was holding reasonably steady, but it
> has started to rise recently as shown in the attached chart.
>
> The test repeatedly saves 100k objects of 55 kB size using multiple threads
> (50) against multiple RGW gateways (4).  It uses a sequential identifier as
> the object key and shards the bucket name using id % 100.  The buckets have
> index sharding enabled with 64 index shards per bucket.
>
> ceph status doesn't appear to show any issues.  Is there something I should
> be looking at here?
>
>
> # ceph status
> cluster 3fc86d01-cf9c-4bed-b130-7a53d7997964
>  health HEALTH_OK
>  monmap e2: 5 mons at
> {condor=192.168.188.90:6789/0,duck=192.168.188.140:6789/0,eagle=192.168.188.100:6789/0,falcon=192.168.188.110:6789/0,shark=192.168.188.118:6789/0}
> election epoch 18, quorum 0,1,2,3,4
> condor,eagle,falcon,shark,duck
>  osdmap e674: 40 osds: 40 up, 40 in
>   pgmap v258756: 3128 pgs, 10 pools, 1392 GB data, 27282 kobjects
> 4784 GB used, 69499 GB / 74284 GB avail
> 3128 active+clean
>   client io 268 kB/s rd, 1100 kB/s wr, 493 op/s

It's probably a combination of your bucket indices getting larger and
your PGs getting split into subfolders on the OSDs. If you keep
running tests and things get slower, it's the former; if they speed
partway back up again, it's the latter.
Other things to check:
* you can look at your OSD stores and how the object files are divvied up.
* you can look at the rgw admin socket and/or logs to see what
operations are the ones taking time
* you can check dump_historic_ops on the OSDs to see if there are
any notably slow ops (a rough sketch follows below)
-Greg
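
For the last point, a quick sketch of pulling the slowest recent ops from the
OSD admin sockets on a host (the field names follow the usual Hammer-era
dump_historic_ops output and may differ between releases):

#!/usr/bin/env python
# Rough sketch: print the slowest recent ops from every OSD admin socket on
# this host. The field names ("Ops"/"ops", "duration", "description") follow
# the usual Hammer-era dump_historic_ops output and may differ by release.
import glob
import json
import subprocess

def duration(op):
    try:
        return float(op.get("duration", 0))
    except (TypeError, ValueError):
        return 0.0

for sock in sorted(glob.glob("/var/run/ceph/ceph-osd.*.asok")):
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", sock, "dump_historic_ops"])
    data = json.loads(out.decode())
    ops = data.get("Ops") or data.get("ops") or []
    ops.sort(key=duration, reverse=True)
    print("== %s ==" % sock)
    for op in ops[:5]:
        print("  %8.3fs  %s" % (duration(op), op.get("description", "?")))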

>
>
> Kris Jurka
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] plain upgrade hammer to infernalis?

2016-02-08 Thread Gregory Farnum
On Mon, Feb 8, 2016 at 10:00 AM, Dzianis Kahanovich
 wrote:
> I want to know about a plain (no systemd, no deployment tools, only my own
> simple "start-stop-daemon" scripts under Gentoo) upgrade from hammer to
> infernalis, and I see no recommendations. Can I simply restart mon+mds+osds
> node by node, or is some strict global per-service restart order needed?
>
> PS "setuser match path = /var/lib/ceph/$type/$cluster-$id" added to config.

All the upstream upgrade testing upgrades the monitors first, then the
OSDs, then RGW/clients/MDSes. Other than that you should be good just
restarting processes, yes.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] K is for Kraken

2016-02-08 Thread Sage Weil
I didn't find any other good K names, but I'm not sure anything would top 
kraken anyway, so I didn't look too hard.  :)

For L, the options I found were

luminous (flying squid)
longfin (squid)
long barrel (squid)
liliput (octopus)

Any other suggestions?

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] plain upgrade hammer to infernalis?

2016-02-08 Thread Dzianis Kahanovich
I want to know about a plain (no systemd, no deployment tools, only my own simple
"start-stop-daemon" scripts under Gentoo) upgrade from hammer to infernalis, and I see
no recommendations. Can I simply restart mon+mds+osds node by node, or is some
strict global per-service restart order needed?

PS "setuser match path = /var/lib/ceph/$type/$cluster-$id" added to config.

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-08 Thread Lionel Bouton
On 08/02/2016 20:09, Robert LeBlanc wrote:
> Too bad K isn't an LTS. It would be fun to release the Kraken many times.

Kraken is an awesome release name !
How I will miss being able to say/write to our clients that we just
released the Kraken on their infra :-/

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-08 Thread Jeff Bailey
Your glance images need to be raw, also.  A QCOW image will be 
copied/converted.


On 2/8/2016 3:33 PM, Jason Dillaman wrote:

If Nova and Glance are properly configured, it should only require a quick 
clone of the Glance image to create your Nova ephemeral image.  Have you 
double-checked your configuration against the documentation [1]?  What version 
of OpenStack are you using?

To answer your questions:


- From Ceph point of view. does COW works cross pool i.e. image from glance
pool ---> (cow) --> instance disk on nova pool

Yes, cloning copy-on-write images works across pools


- Will a single pool for glance and nova instead of separate pool . will help
here ?

Should be no change -- the creation of the clone is extremely lightweight (add 
the image to a directory, create a couple metadata objects)


- Is there any tunable parameter from Ceph or OpenStack side that should be
set ?

I'd double-check your OpenStack configuration.  Perhaps Glance isn't configured with 
"show_image_direct_url = True", or Glance is configured to cache your RBD 
images, or you have an older OpenStack release that requires patches to fully support 
Nova+RBD.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-08 Thread Jason Dillaman
If Nova and Glance are properly configured, it should only require a quick 
clone of the Glance image to create your Nova ephemeral image.  Have you 
double-checked your configuration against the documentation [1]?  What version 
of OpenStack are you using?

To answer your questions:

> - From Ceph point of view. does COW works cross pool i.e. image from glance
> pool ---> (cow) --> instance disk on nova pool
Yes, cloning copy-on-write images works across pools

> - Will a single pool for glance and nova instead of separate pool . will help
> here ?
Should be no change -- the creation of the clone is extremely lightweight (add 
the image to a directory, create a couple metadata objects)

> - Is there any tunable parameter from Ceph or OpenStack side that should be
> set ?
I'd double-check your OpenStack configuration.  Perhaps Glance isn't configured 
with "show_image_direct_url = True", or Glance is configured to cache your RBD 
images, or you have an older OpenStack release that requires patches to fully 
support Nova+RBD. 

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/
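
One quick way to verify the clone path end to end is to check whether an
instance's RBD image has a parent pointing back at the Glance image. A small
sketch (the "vms" pool and the "<uuid>_disk" naming are common defaults and
may differ in your deployment):

#!/usr/bin/env python
# Rough sketch: report whether a Nova instance disk is a copy-on-write clone
# of a Glance image. The "vms" pool and "<instance uuid>_disk" image name are
# common defaults, not guaranteed; JSON field names may also vary by release.
import json
import subprocess
import sys

if len(sys.argv) < 2:
    sys.exit("usage: check_clone.py <image-name> [pool]")
image = sys.argv[1]                                  # e.g. "<uuid>_disk"
pool = sys.argv[2] if len(sys.argv) > 2 else "vms"

info = json.loads(subprocess.check_output(
    ["rbd", "info", "--format", "json", "%s/%s" % (pool, image)]).decode())

parent = info.get("parent")
if parent:
    print("COW clone of %s/%s@%s" % (parent["pool"], parent["image"],
                                     parent["snapshot"]))
else:
    print("not a clone; the image was fully copied (check that the Glance "
          "image is raw and show_image_direct_url is enabled)")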

-- 

Jason Dillaman 


- Original Message - 

> From: "Vickey Singh" 
> To: ceph-users@lists.ceph.com, "ceph-users" 
> Sent: Monday, February 8, 2016 9:10:59 AM
> Subject: [ceph-users] Tips for faster openstack instance boot

> Hello Community

> I need some guidance how can i reduce openstack instance boot time using Ceph

> We are using Ceph Storage with openstack ( cinder, glance and nova ). All
> OpenStack images and instances are being stored on Ceph in different pools
> glance and nova pool respectively.

> I assume that Ceph by default uses COW rbd , so for example if an instance is
> launched using glance image (which is stored on Ceph) , Ceph should take COW
> snapshot of glance image and map it as RBD disk for instance. And this whole
> process should be very quick.

> In our case , the instance launch is taking 90 seconds. Is this normal ? ( i
> know this really depends one's infra , but still )

> Is there any way , i can utilize Ceph's power and can launch instances ever
> faster.

> - From Ceph point of view. does COW works cross pool i.e. image from glance
> pool ---> (cow) --> instance disk on nova pool
> - Will a single pool for glance and nova instead of separate pool . will help
> here ?
> - Is there any tunable parameter from Ceph or OpenStack side that should be
> set ?

> Regards
> Vickey

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-08 Thread Heath Albritton
I'm not sure what's normal, but I'm on OpenStack Juno with Ceph 0.94.5, using
separate pools for nova, glance, and cinder.  It takes 16 seconds to start an
instance (el7 minimal).

Everything is on 10GbE and I'm using cache tiering, which I'm sure speeds
things up.  I can personally verify that COW is working: I recently killed
my images pool as a result of a bug and user error, and had to recreate the
base image and re-associate the VM images with the parent id of the new base
image before I could get all my VMs working again.

On Mon, Feb 8, 2016 at 6:10 AM, Vickey Singh 
wrote:

> Hello Community
>
> I need some guidance how can i reduce openstack instance boot time using
> Ceph
>
> We are using Ceph Storage with openstack ( cinder, glance and nova ). All
> OpenStack images and instances are being stored on Ceph in different pools
> glance and nova pool respectively.
>
> I assume that Ceph by default uses COW rbd , so for example if an instance
> is launched using glance image (which is stored on Ceph) , Ceph should take
> COW snapshot of glance image and map it as RBD disk for instance. And this
> whole process should be very quick.
>
> In our case , the instance launch is taking 90 seconds. Is this normal ? (
> i know this really depends one's infra , but still )
>
> Is there any way , i can utilize Ceph's power and can launch instances
> ever faster.
>
> - From Ceph point of view. does COW works cross pool i.e. image from
> glance pool ---> (cow) --> instance disk on nova pool
> - Will a single pool for glance and nova instead of separate pool . will
> help here ?
> - Is there any tunable parameter from Ceph or OpenStack side that should
> be set ?
>
> Regards
> Vickey
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-08 Thread Karol Mroz
On Mon, Feb 08, 2016 at 01:36:57PM -0500, Sage Weil wrote:
> I didn't find any other good K names, but I'm not sure anything would top 
> kraken anyway, so I didn't look too hard.  :)
> 
> For L, the options I found were
> 
>   luminous (flying squid)
>   longfin (squid)
>   long barrel (squid)
>   liliput (octopus)

Kraken is awesome.

Perhaps we can add 'Loligo' (https://en.wikipedia.org/wiki/Loligo) to the L 
list?

-- 
Regards,
Karol


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-08 Thread Mark Nelson

I like Luminous. :)

Mark

On 02/08/2016 12:36 PM, Sage Weil wrote:

I didn't find any other good K names, but I'm not sure anything would top
kraken anyway, so I didn't look too hard.  :)

For L, the options I found were

luminous (flying squid)
longfin (squid)
long barrel (squid)
liliput (octopus)

Any other suggestions?

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-08 Thread Sage Weil
On Mon, 8 Feb 2016, Karol Mroz wrote:
> On Mon, Feb 08, 2016 at 01:36:57PM -0500, Sage Weil wrote:
> > I didn't find any other good K names, but I'm not sure anything would top 
> > kraken anyway, so I didn't look too hard.  :)
> > 
> > For L, the options I found were
> > 
> > luminous (flying squid)
> > longfin (squid)
> > long barrel (squid)
> > liliput (octopus)
> 
> Kraken is awesome.
> 
> Perhaps we can add 'Loligo' (https://en.wikipedia.org/wiki/Loligo) to the L 
> list?

Yep!

http://pad.ceph.com/p/l
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] K is for Kraken

2016-02-08 Thread Robert LeBlanc

Too bad K isn't an LTS. It would be fun to release the Kraken many times.

I like liliput

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Feb 8, 2016 at 11:36 AM, Sage Weil  wrote:
> I didn't find any other good K names, but I'm not sure anything would top
> kraken anyway, so I didn't look too hard.  :)
>
> For L, the options I found were
>
> luminous (flying squid)
> longfin (squid)
> long barrel (squid)
> liliput (octopus)
>
> Any other suggestions?
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com