Re: [openstack-dev] [Cinder] Static Ceph mon connection info prevents VM restart

2015-05-12 Thread Josh Durgin

On 05/12/2015 12:06 AM, Arne Wiebalck wrote:

Here’s Dan’s answer for the exact procedure (he replied, but it bounced):


We have two clusters with mons behind two DNS aliases:

  cephmon.cern.ch: production cluster with five mons A, B, C, D, E

  cephmond.cern.ch: testing cluster with three mons X, Y, Z


The procedure was:

  1. Stop mon on host X. Remove from DNS alias cephmond. Remove from mon map.

  2. Stop mon on host A. Remove from DNS alias cephmon. Remove from mon map.

  3. Add mon on host X to cephmon cluster. mkfs the new mon, start the ceph-mon 
process; after quorum add it to the cephmon alias.

  4. Add mon on host A to cephmond cluster. mkfs the new mon, start the 
ceph-mon process; after quorum add it to the cephmond alias.

  5. Repeat for B/Y and C/Z.



In the end, three of the hosts which were previously running cephmon mon's were
then running cephmond mon's. Hence when a client comes with a config pointing
to an old mon, it gets authentication denied and stops there; it doesn't try
the next IP in the list of mons. As a workaround we moved all the cephmond
mon's to port 6790, so that the Cinder clients fail over to one of the two
cephmon mon's which have not changed.


Thanks, it all makes sense now.

Josh



On 12 May 2015, at 01:46, Josh Durgin  wrote:


On 05/08/2015 12:41 AM, Arne Wiebalck wrote:

Hi Josh,

In our case adding the monitor hostnames (alias) would have made only a
slight difference: as we moved the servers to another cluster, the client
received an authorisation failure rather than a connection failure and did
not try to fail over to the next IP in the list. So, adding the alias to the
list would have improved the chances of hitting a good monitor, but it would
not have eliminated the problem.


Could you provide more details on the procedure you followed to move
between clusters? I missed the separate clusters part initially, and
thought you were simply replacing the monitor nodes.


I’m not sure storing IPs in the nova database is a good idea in general.
Replacing (not adding) these by the hostnames is probably better. Another
approach may be to generate this part of connection_info (and hence the XML)
dynamically from the local ceph.conf when the connection is created. I think
a mechanism like this is for instance used to select a free port for the vnc
console when the instance is started.


Yes, with different clusters only using the hostnames is definitely
the way to go. I agree that keeping the information in nova's db may
not be the best idea. It is handy to allow nova to use different
clusters from cinder, so I'd prefer not generating the connection info
locally. The qos_specs are also part of connection_info, and if changed
they would have a similar problem of not applying the new value to
existing instances, even after reboot. Maybe nova should simply refresh
the connection info each time it uses a volume.

Josh



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Cinder] Static Ceph mon connection info prevents VM restart

2015-05-11 Thread Josh Durgin

On 05/08/2015 12:41 AM, Arne Wiebalck wrote:

Hi Josh,

In our case adding the monitor hostnames (alias) would have made only a
slight difference: as we moved the servers to another cluster, the client
received an authorisation failure rather than a connection failure and did
not try to fail over to the next IP in the list. So, adding the alias to the
list would have improved the chances of hitting a good monitor, but it would
not have eliminated the problem.


Could you provide more details on the procedure you followed to move
between clusters? I missed the separate clusters part initially, and
thought you were simply replacing the monitor nodes.


I’m not sure storing IPs in the nova database is a good idea in general.
Replacing (not adding) these by the hostnames is probably better. Another
approach may be to generate this part of connection_info (and hence the XML)
dynamically from the local ceph.conf when the connection is created. I think
a mechanism like this is for instance used to select a free port for the vnc
console when the instance is started.


Yes, with different clusters only using the hostnames is definitely
the way to go. I agree that keeping the information in nova's db may
not be the best idea. It is handy to allow nova to use different
clusters from cinder, so I'd prefer not generating the connection info
locally. The qos_specs are also part of connection_info, and if changed
they would have a similar problem of not applying the new value to
existing instances, even after reboot. Maybe nova should simply refresh
the connection info each time it uses a volume.
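
As a rough sketch of that idea (the object and method names below are
illustrative stand-ins, not the actual Nova code paths), refreshing would
just mean asking Cinder again instead of trusting the cached copy:

    def refresh_connection_info(volume_api, context, bdm, connector):
        # Ask Cinder for current connection details; initialize_connection()
        # is the same call made at attach time, so a second call returns the
        # monitors as they are now rather than as they were recorded.
        new_info = volume_api.initialize_connection(
            context, bdm.volume_id, connector)
        bdm.connection_info = new_info  # hypothetical BDM update + save
        bdm.save()
        return new_info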

Josh


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Cinder] Static Ceph mon connection info prevents VM restart

2015-05-07 Thread Josh Durgin

Hey folks, thanks for filing a bug for this:

https://bugs.launchpad.net/cinder/+bug/1452641

Nova stores the volume connection info in its db, so updating that
would be a workaround to allow restart/migration of vms to work.
Otherwise running vms shouldn't be affected, since they'll notice any
new or deleted monitors through their existing connection to the
monitor cluster.

Perhaps the most general way to fix this would be for cinder to return
any monitor hosts listed in ceph.conf (as they are listed, so they may
be hostnames or ips) in addition to the ips from the current monmap
(the current behavior).

That way an out of date ceph.conf is less likely to cause problems,
and multiple clusters could still be used with the same nova node.
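
A rough sketch of that combination (assuming a simple ini-style ceph.conf
that Python's configparser can read; the real driver code would differ):

    import configparser

    def mon_hosts_from_ceph_conf(path='/etc/ceph/ceph.conf'):
        # Return the "mon host" entries exactly as written (names or IPs).
        cfg = configparser.ConfigParser()
        cfg.read(path)
        for key in ('mon host', 'mon_host'):
            if cfg.has_option('global', key):
                return [h.strip() for h in cfg.get('global', key).split(',')]
        return []

    def connection_hosts(monmap_ips):
        # ceph.conf entries first, then the live monmap IPs, de-duplicated.
        merged = []
        for host in mon_hosts_from_ceph_conf() + list(monmap_ips):
            if host not in merged:
                merged.append(host)
        return merged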

Josh

On 05/06/2015 12:46 PM, David Medberry wrote:

Hi Arne,

We've had this EXACT same issue.

I don't know of a way to force an update as you are basically pulling
the rug out from under a running instance. I don't know if it is
possible/feasible to update the virsh xml in place and then migrate to
get it to actually use that data. (I think we tried that to no avail.)
dumpxml=>massage cephmons=>import xml
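
Roughly what we tried, using the libvirt python bindings (the mon list and
instance name below are placeholders; and as said, the redefined XML only
seems to matter once the guest is restarted or migrated):

    import xml.etree.ElementTree as ET
    import libvirt

    NEW_MONS = [('cephmon.example.com', '6789')]  # placeholder mon list

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')  # example instance
    root = ET.fromstring(dom.XMLDesc(0))

    for disk in root.findall(".//devices/disk[@type='network']"):
        source = disk.find('source')
        if source is not None and source.get('protocol') == 'rbd':
            for host in source.findall('host'):  # drop stale mon entries
                source.remove(host)
            for name, port in NEW_MONS:
                ET.SubElement(source, 'host', {'name': name, 'port': port})

    # Redefine so the massaged XML is used the next time the domain starts.
    conn.defineXML(ET.tostring(root, encoding='unicode'))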

If you find a way, let me know, and that's part of the reason I'm
replying so that I stay on this thread. NOTE: We did this on Icehouse.
Haven't tried since upgrading to Juno but I don't note any change
therein that would mitigate this. So I'm guessing Liberty/post-Liberty
for a real fix.



On Wed, May 6, 2015 at 12:57 PM, Arne Wiebalck <arne.wieba...@cern.ch> wrote:

Hi,

As we swapped a fraction of our Ceph mon servers between the pre-production
and production cluster (something we considered to be transparent, as the
Ceph config points to the mon alias), we ended up in a situation where VMs
with volumes attached were not able to boot (with a probability that matched
the fraction of the servers moved between the Ceph instances).

We found that the reason for this is the connection_info in
block_device_mapping, which contains the IP addresses of the mon servers as
extracted by the rbd driver in initialize_connection() at the moment the
connection is established. From what we see, however, this information is
not updated as long as the connection exists, and will hence be re-applied
without checking even when the XML is recreated.
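
For reference, the stored data looks roughly like this (field names and
values are only illustrative):

    connection_info = {
        'driver_volume_type': 'rbd',
        'data': {
            'name': 'volumes/volume-3fa85f64-...',   # pool/image
            'hosts': ['128.142.x.y', '128.142.x.z'],  # mon IPs captured at
            'ports': ['6789', '6789'],                # initialize_connection()
            'auth_enabled': True,
            'auth_username': 'cinder',
        },
    }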

The idea to extract the mon servers by IP from the mon map was probably to
get all mon servers (rather than only one from a load-balancer or an alias),
but while our current scenario may be special, we will face a similar
problem the day the Ceph mons need to be replaced. And that makes it a more
general issue.

For our current problem:
Is there a user-transparent way to force an update of that connection
information? (Apart from fiddling with the database entries, of course.)

For the general issue:
Would it be possible to simply use the information from the ceph.conf file
directly (an alias in our case) throughout the whole stack to avoid
hard-coding IPs that will be obsolete one day?

Thanks!
  Arne

—
Arne Wiebalck
CERN IT




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [OpenStack-Dev] [libvirt] Block Devices and the "discard" option

2014-10-10 Thread Josh Durgin

On 10/10/2014 08:44 PM, John Griffith wrote:

Hi,

So I've been beating on this for a good part of the day and I'm not
having much luck so I thought I'd ask on the ML if anybody has had any
success with the following.

We have some applications that have been migrated off of our physical
machines and into our OpenStack Cloud.  The trouble is that these apps
and our storage take advantage of fstrim which returns the error:

   "fstrim: /: FITRIM ioctl failed: Operation not supported"

I thought... oh, easy, I'll work up a quick patch to add this to Cinder
and Nova; but I don't seem to be having any luck getting this to work.

I also came across the Nova patch to add this to local disk here: [1]
and I seem to get the same error there as well.

Testing fstrim on the devices from the compute node works fine, just not
the device passed in to the instance.

The XML I'm sending looks like this [2]

I'm running on 14.04 with RC1 builds.

I'm not sure what I'm missing, or if anybody has been able to make this
work, or if it should work.  Any insight would be greatly appreciated.

Thanks,
John


[1]: https://review.openstack.org/#/c/112977/12
[2]: https://gist.github.com/j-griffith/3341ad287c5d684f02b5


Hey John,

I'm not sure if it's the only issue, but the virtio bus doesn't support
discard. You need to use virtio-scsi, scsi, or ide.
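
If it helps, a hedged example of the image properties that usually steer the
libvirt driver towards a discard-capable bus (exact property support depends
on the release, so please double-check):

    # Ask nova's libvirt driver for virtio-scsi instead of virtio-blk.
    discard_friendly_props = {
        'hw_scsi_model': 'virtio-scsi',
        'hw_disk_bus': 'scsi',
    }
    # e.g. with python-glanceclient (v1-style API):
    #   glance.images.update(image_id, properties=discard_friendly_props)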

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support

2014-03-13 Thread Josh Durgin

On 03/13/2014 12:48 PM, Russell Bryant wrote:

On 03/13/2014 03:04 PM, Josh Durgin wrote:

These reverts are still confusing me. The use of glance's v2 api
is very limited and easy to protect from errors.

These patches use the v2 glance api for exactly one call - to get
image locations. This has been available and used by other
features in nova and cinder since 2012.

Jay's patch fixed the one issue that was found, and added tests for
several other cases. No other calls to glance v2 are used. The method
Jay fixed is the only one that accesses the response from glanceclient.
Furthermore, it's trivial to guard against more incompatibilities and
fall back to downloading normally if any errors occur. This already
happens if glance does not expose image locations.


There was some use of the v2 API, but not by default.  These patches
changed that, and it was broken.  We went from not requiring the v2 API
to requiring it, without a complete view for what that means, including
a severe lack of testing of that API.


That's my point - these patches did not need to require the v2 API. They
could easily try it and fall back, or detect when only the default
handler was enabled and not even try the v2 API in that case.

There is no hard requirement on the v2 API.
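
The safety valve would be little more than this kind of guard (names here
are illustrative, not the actual nova interfaces):

    def fetch_image(image_service, context, image_id, dest, direct_fetch=None):
        locations = []
        if direct_fetch is not None:
            try:
                # the single glance v2 call the patches rely on
                locations = image_service.get_locations(context, image_id)
            except Exception:
                locations = []  # v2 unavailable or incompatible: fall back

        for loc in locations:
            if direct_fetch(loc, dest):  # e.g. an rbd clone for rbd:// URLs
                return

        # default behaviour, unchanged from before the feature
        image_service.download(context, image_id, dest)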


I think it's the right call to block any non-optional use of the API
until it's properly tested, and ideally, supported more generally in nova.


Can we consider adding this safety valve and un-reverting these patches?


No.  We're already well into the freeze and we can't afford risk or
distraction.  The time it took to deal with and discuss the issue this
caused is exactly why we're hesitant to approve FFEs at all.  It's a
distraction during critical time as we work toward the RC.


FWIW the patch that caused the issue was merged before FF.


The focus right now has to be on high/critical priority bugs and
regressions.  We can revisit this properly in Juno.


Ok.

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support

2014-03-13 Thread Josh Durgin

On 03/12/2014 04:54 PM, Matt Riedemann wrote:



On 3/12/2014 6:32 PM, Dan Smith wrote:

I'm confused as to why we arrived at the decision to revert the commits
since Jay's patch was accepted. I'd like some details about this
decision, and what new steps we need to take to get this back in for
Juno.


Jay's fix resolved the immediate problem that was reported by the user.
However, after realizing why the bug manifested itself and why it didn't
occur during our testing, all of the core members involved recommended a
revert as the least-risky course of action at this point. If it took
almost no time for that change to break a user that wasn't even using
the feature, we're fearful about what may crop up later.

We talked with the patch author (zhiyan) in IRC for a while after making
the decision to revert about what the path forward for Juno is. The
tl;dr as I recall is:

  1. Full Glance v2 API support merged
  2. Tests in tempest and nova that exercise Glance v2, and the new
 feature
  3. Push the feature patches back in

--Dan

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Those are essentially the steps as I remember them too.  Sean changed
the dependencies in the blueprints so the nova glance v2 blueprint is
the root dependency, then multiple images and then the other download
handler blueprints at the top.  I haven't checked but the blueprints
should be marked as not complete (not sure what that would be now) and
marked for next; the v2 glance root blueprint should be marked as high
priority too so we get the proper focus when Juno opens up.


These reverts are still confusing me. The use of glance's v2 api
is very limited and easy to protect from errors.

These patches use the v2 glance api for exactly one call - to get
image locations. This has been available and used by other
features in nova and cinder since 2012.

Jay's patch fixed the one issue that was found, and added tests for
several other cases. No other calls to glance v2 are used. The method
Jay fixed is the only one that accesses the response from glanceclient.
Furthermore, it's trivial to guard against more incompatibilities and
fall back to downloading normally if any errors occur. This already
happens if glance does not expose image locations.

Can we consider adding this safety valve and un-reverting these patches?

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] Blueprint cinder-rbd-driver-qos

2014-03-07 Thread Josh Durgin

On 03/05/2014 07:24 AM, git harry wrote:

Hi,

https://blueprints.launchpad.net/cinder/+spec/cinder-rbd-driver-qos

I've been looking at this blueprint with a view to contributing to it, assuming
I can take it. I am unclear as to whether or not it is still valid. I can see 
that it was registered around a year ago and it appears the functionality is 
essentially already supported by using multiple backends.

Looking at the existing drivers that have QoS support, it appears IOPS etc.
are available for control/customisation. As I understand it, Ceph has no QoS
control built in, and creating pools using different hardware is as granular
as it gets. The two don't quite seem comparable to me, so I was hoping to get
some feedback as to whether or not this is still useful/appropriate before
attempting to do any work.


Ceph does not currently have any qos support, but relies on QEMU's io
throttling, which Cinder and Nova can configure.
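
For example, the usual front-end qos-specs look something like this (key
names should be checked against the release in use); nova translates them
into libvirt/QEMU io throttling on the attached disk:

    from cinderclient.v2 import client as cinder_client

    cinder = cinder_client.Client('user', 'password', 'project',
                                  'http://keystone.example.com:5000/v2.0')

    specs = {
        'consumer': 'front-end',        # enforced by QEMU on the compute node
        'read_iops_sec': '500',
        'write_iops_sec': '500',
        'total_bytes_sec': '104857600', # 100 MB/s
    }
    qos = cinder.qos_specs.create('rbd-throttled', specs)
    # then associate it with a volume type:
    #   cinder.qos_specs.associate(qos, volume_type_id)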

There is interest in adding better throttling to Ceph itself though,
since writes from QEMU may be combined before writing to Ceph when
caching is used. There was a session on this at the Ceph developer
summit earlier this week:

https://wiki.ceph.com/Planning/CDS/CDS_Giant_%28Mar_2014%29#rbd_qos

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] FFE Request: Ephemeral RBD image support

2014-03-06 Thread Josh Durgin

On 03/06/2014 05:37 PM, Andrew Woodward wrote:

Matt,

I'd love to see this too; however, I'm not seasoned enough to even know
much about how to start implementing that. I'd love some direction,
and maybe some support after you guys are done with the pending
release.


We're working on setting up CI with Ceph starting with Cinder.
Jay Pipes' recent blog posts explaining this process are great:

http://www.joinfu.com/2014/02/setting-up-an-external-openstack-testing-system/

Josh


As others have illustrated here, the current RBD support in nova is
effectively useless and I'd love to see that second sponsor so we Ceph
users don't have to run a hand-patched nova for another release.

On Thu, Mar 6, 2014 at 3:30 PM, Matt Riedemann wrote:



On 3/6/2014 2:20 AM, Andrew Woodward wrote:


I'd like to request an FFE for the remaining patches in the Ephemeral
RBD image support chain:

https://review.openstack.org/#/c/59148/
https://review.openstack.org/#/c/59149/

are still open after their dependency
https://review.openstack.org/#/c/33409/ was merged.

These should be low risk as:
1. We have been testing with this code in place.
2. It's nearly all contained within the RBD driver.

This is needed as it implements essential functionality that has been
missing in the RBD driver, and this will be the second release in which we
have attempted to get it merged.

Andrew
Mirantis
Ceph Community

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



What would be awesome in Juno is some CI around RBD/Ceph.  I'd feel a lot
more comfortable with this code if we had CI running Tempest against that
type of configuration, just like how we are now requiring 3rd party CI for
virt drivers.

I realize this is tangential but it would make moving these blueprints
through faster so you're not working on it over multiple releases.

Having said that, I'm not signing up for sponsoring this, sorry. :)

--

Thanks,

Matt Riedemann



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Ceph and OpenStack unconference on Friday

2013-11-04 Thread Josh Durgin

We're having an unconference session at the OpenStack design summit
on Friday at 2:20pm.

If you're interested in Ceph integration with OpenStack, please join us
to discuss future development.

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Cinder] Propose Jay Bryant for core

2013-10-29 Thread Josh Durgin

On 10/29/2013 01:54 PM, John Griffith wrote:

Hey,

I wanted to propose Jay Bryant (AKA jsbryant, AKA jungleboy, AKA
:) ) for core membership on the Cinder team.  Jay has been working on
Cinder for a while now and has really shown some dedication and
provided much-needed help with quality reviews.  In addition to his
review activity he's also been very active in IRC and in Cinder
development.

I think he'd be a good add to the core team.


+1

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] FFE Request: Make RBD Usable for Ephemeral Storage

2013-09-18 Thread Josh Durgin

On 09/18/2013 07:14 AM, Thierry Carrez wrote:

Mike Perez wrote:

Currently in Havana development, RBD as ephemeral storage has serious
stability and performance issues that make the Ceph cluster a bottleneck for
using an image as a source.
[...]


This comes up a bit late, and the current RC bugs curves[1] really do
not encourage me to add more distraction for core reviewers.

The only way I could be fine with this would be for the performance
issue to actually be considered a bug (being so slow you can't really
use it without the fix), *and* the review being very advanced and
consensual that the distraction is minimal.

Could you quantify the performance issue, and address Zhi Yan Liu's
comments?


In terms of performance, consider an image that's already cached on
the hypervisor host. With libvirt_images_type=rbd, the locally
cached copy is imported via the 'rbd import' command. This is a very
slow operation, which ended up taking upwards of 15 minutes in my
tests.

Using libvirt_images_type=qcow2, this takes roughly 0.2 seconds.

If the image was not already cached locally, it would have to be
downloaded from glance, which would take the same amount of time
regardless of libvirt_images_type or glance backend.

The proposed patch avoids downloading from glance and uploading to
rbd, and results in similar times to qcow2 without requiring the image
to be cached locally.
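
Conceptually the patch replaces the download plus 'rbd import' with an
in-cluster copy-on-write clone, along these lines (pool names and the image
id are placeholders; the glance rbd store keeps a protected snapshot named
'snap' on each image):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        images = cluster.open_ioctx('images')   # glance pool
        vms = cluster.open_ioctx('vms')         # nova ephemeral pool
        try:
            # clone the image's snapshot instead of copying any data
            rbd.RBD().clone(images, 'GLANCE_IMAGE_ID', 'snap',
                            vms, 'instance-00000001_disk',
                            features=rbd.RBD_FEATURE_LAYERING)
        finally:
            images.close()
            vms.close()
    finally:
        cluster.shutdown()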

Josh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] FFE Request: Make RBD Usable for Ephemeral Storage

2013-09-18 Thread Josh Durgin

On 09/17/2013 07:33 PM, Zhi Yan Liu wrote:

On Wed, Sep 18, 2013 at 6:16 AM, Mike Perez  wrote:

Folks,

Currently in Havana development, RBD as ephemeral storage has serious
stability and performance issues that make the Ceph cluster a bottleneck for
using an image as a source.

Nova currently has to communicate with the external service Glance, which
has to talk to the separate Ceph storage backend to fetch path information.
The entire image is then downloaded to local disk, and then imported from
local disk to RBD. This creates a stability concern for successfully
creating the instance, especially with large images, due to unnecessary data
pulling and pushing for backends like RBD.

Because we have to do an import from local disk to RBD, performance can be
even slower than with a normal backend filesystem, since the import is
single-threaded.

This can be eliminated by instead having Nova's RBD image backend utility
communicate directly with the Ceph backend to do a copy-on-write clone of
the image. Not only does this greatly improve stability, but performance is
drastically improved by not having to do a full copy of the image. A lot of
the code to make this happen came from the RBD Cinder driver, which has been
stable and merged for quite a while.

Bug: https://code.launchpad.net/bugs/1226351
Patch: https://review.openstack.org/#/c/46879/1

Thanks,
Mike Perez

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Hi Mike Perez, folks,

I absolutely agree that using a zero-copy approach, such as CoW, to prepare
the template image is a good idea. But after checking your patch I have some
concerns about the current implementation.

Actually I had prepared some dedicated BPs [1][2] and a patch [3] to cover
such requirements and problems around zero-copy (aka your 'direct_fetch')
image preparation. It is implemented as a framework that allows other people
to write such plug-ins for a particular image storage backend/location. So
I'd very much like to invite you (and Josh Durgin) to take a look at them; I
believe (and welcome) that your work within #46879 around RBD image handling
can be implemented as an RBDImageHandler plug-in under my framework.


I like the plugin framework you've created there, but it was not
merged for Havana, which is why I did not use it in this patch. I
didn't want to include more new code than necessary. I simply used the
same approach that cinder uses to minimize the risk of regression.


I consider the above implementation better, since the framework code within
#33409 can handle most of the common logic, such as plug-in loading, image
handler selection based on image location, multiple image location support,
etc. Each particular image handler can then just implement its special
methods easily and doesn't need to rebuild the existing (and tested) parts.


I'm fine with that approach, and the direct fetch method in my patch
can be easily refactored into an RBDImageHandler like you suggest with
a little bit more information passed to the image handlers. I'm happy
to help with this for Icehouse.
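
To make that concrete, here is a purely hypothetical skeleton of how it
could look under the framework (the real base class and method names are
whatever #33409 defines, not these):

    class ImageHandler(object):            # stand-in for the framework base
        def supports_location(self, location):
            raise NotImplementedError

        def fetch_image(self, context, location, dest):
            raise NotImplementedError

    class RBDImageHandler(ImageHandler):
        """Handle rbd:// locations by cloning inside the Ceph cluster."""

        def supports_location(self, location):
            return location.startswith('rbd://')

        def fetch_image(self, context, location, dest):
            # parse rbd://fsid/pool/image/snap and do a copy-on-write clone,
            # as in the direct_fetch path of #46879
            ...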

Thanks!
Josh


Of course, as new handlers are produced we will probably need to add more
interfaces and pass more context data to the ImageHandler base class as
needed; we can talk about this on IRC.

[1] https://blueprints.launchpad.net/nova/+spec/image-multiple-location
[2] 
https://blueprints.launchpad.net/nova/+spec/effective-template-base-image-preparing
[3] https://review.openstack.org/#/c/33409/

thanks,
zhiyan




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev