Re: [openstack-dev] [cinder] No middle-man - when does/will Nova directly connect iSCSI volumes?
On 06/23/2016 Daniel Berrange wrote (attribution lost in thread):

> Our long term goal is that 100% of all network storage will be connected
> to directly by QEMU. We already have the ability to partially do this with
> iSCSI, but it is lacking support for multipath. As & when that gap is
> addressed though, we'll stop using the host OS for any iSCSI stuff.
>
> So if you're requiring access to host iSCSI volumes, it'll work in the
> short-medium term, but in the medium-long term we're not going to use
> that so plan accordingly.

On 06/23/2016 10:09 AM, Walter A. Boring IV wrote:

> We regularly fix issues with iSCSI attaches in the release cycles of
> OpenStack, because it's all done in python using existing linux packages.
> How often are QEMU releases done and upgraded on customer deployments vs.
> python packages (os-brick)?
>
> I don't see a compelling reason for reinventing the wheel, and it seems
> like a major step backwards.

On Thu, Jun 23, 2016 at 12:07:43PM -0600, Chris Friesen wrote:

> This is an interesting point.
>
> Unless there's a significant performance benefit to connecting directly
> from qemu, it seems to me that we would want to leverage the existing
> work done by the kernel and other "standard" iSCSI initiators.

On Thu, Jun 23, 2016 at 1:28 PM, Sean McGinnis wrote:

> I'm curious to find out this as well. Is this for a performance gain? If
> so, do we have any metrics showing that gain is significant enough to
> warrant making a change like this?
>
> The host OS is still going to be involved. AFAIK, this just cuts out the
> software iSCSI initiator from the picture. So we would be moving from a
> piece of software dedicated to one specific function to a different piece
> of software whose main reason for existence has nothing to do with I/O
> path management.
>
> I'm not saying I'm completely opposed to this. If there is a reason for
> doing it then it could be worth it.
> But so far I haven't seen anything explaining why this would be better
> than what we have today.

First, I have not taken any measurements, so please take everything I say
with a grain of salt. :)

Very generally, if you take out unnecessary layers, you can often improve
performance and reliability. Not in every case, but often.

Volume connections routed through the Linux kernel *might* lose performance
from the extra layer (measurements are needed), and they have to be
managed. That last point is easy to underestimate: Nova has to manage
Linux's knowledge of volume connections. In the strictest sense, the
nova-compute host's Linux does not *need* to know about volumes attached to
Nova instances.

The hairiest part of the problem: what to do when the nova-compute host's
table of attached volumes gets out of sync? My guess is there are error
cases in this area not yet well handled in Nova, and that Nova could be
somewhat simpler if all volumes were directly attached to QEMU. (I am
cheating a bit in mentioning the out-of-sync case, as I got bitten by it a
couple of times in testing. It happens.)

But ... as mentioned earlier, I suspect you cannot get to 100% direct to
QEMU if there is specialized hardware that has to tie into the nova-compute
host's Linux. It seems unlikely you would get consensus, as this impacts
major vendors. Which means you have to keep managing the host map of
volumes. Which means you cannot simplify Nova. (If someone knows how to use
the specialized hardware with less footprint in the host Linux, this answer
could change.)

Where this will land, I do not know. I do not know the performance
measures. Can OpenStack allow for specialized hardware, without routing
through the host Linux? (Probably not, but I would be happy to be wrong.)

And again, as an outsider, I could be wrong about everything. :)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
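Nova's bookkeeping aside, the host's own view of those attachments can be
inspected directly, which is one way to spot the out-of-sync case described
above. A hedged sketch, assuming open-iscsi's iscsiadm (only present on a
configured initiator host, so it falls back gracefully):

```shell
#!/bin/sh
# Sketch (not Nova code): dump the compute host's own view of iSCSI
# attachments - the bookkeeping that can drift out of sync with Nova.
OUT=/tmp/host_iscsi_view.txt
if command -v iscsiadm >/dev/null 2>&1; then
    # -P 3 prints targets down to the attached scsi disks
    iscsiadm -m session -P 3 2>/dev/null | \
        grep -E 'Target:|Attached scsi disk' > "$OUT" || \
        echo "no active iSCSI sessions" > "$OUT"
else
    echo "iscsiadm not installed on this host" > "$OUT"
fi
cat "$OUT"
```

Comparing that output against Nova's block-device mappings is left as an
exercise; the point is only that the host-side state exists and must be
kept consistent.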
Re: [openstack-dev] [cinder] No middle-man - when does/will Nova directly connect iSCSI volumes?
Daniel,

Thanks. I am looking for a sense of direction. Clearly there is some range
of opinion, as Walter indicates. :)

I am not sure you can get to 100% direct connection to QEMU. When there is
dedicated hardware to do off-board processing of the connection to storage,
you might(?) be stuck routing through the nova-compute host Linux. (I am
not an expert in this area, so I could easily be wrong.) This sort of
hardware tends to be associated with higher-end "enterprise" storage and
hosts (and higher cost). The storage folk call these "HBAs" (host bus
adapters).

I *suspect* that higher cost will drive most cloud deployments away from
that sort of specialized hardware, in which case the direct-connect to QEMU
model should (mostly?) work. (My non-expert guess.)

On Thu, Jun 23, 2016 at 9:09 AM, Walter A. Boring IV wrote:

>>> volumes connected to QEMU instances eventually become directly
>>> connected?
>>
>> Our long term goal is that 100% of all network storage will be connected
>> to directly by QEMU. We already have the ability to partially do this
>> with iSCSI, but it is lacking support for multipath. As & when that gap
>> is addressed though, we'll stop using the host OS for any iSCSI stuff.
>>
>> So if you're requiring access to host iSCSI volumes, it'll work in the
>> short-medium term, but in the medium-long term we're not going to use
>> that so plan accordingly.
>
> What is the benefit of this largely monolithic approach? It seems that
> moving everything into QEMU is diametrically opposed to the unix model
> itself, and is just a re-implementation of what already exists in the
> linux world outside of QEMU.
>
> Does QEMU support hardware initiators? iSER?
>
> We regularly fix issues with iSCSI attaches in the release cycles of
> OpenStack, because it's all done in python using existing linux packages.
> How often are QEMU releases done and upgraded on customer deployments vs.
> python packages (os-brick)?
> I don't see a compelling reason for reinventing the wheel, and it seems
> like a major step backwards.
>
>>> Xiao's unanswered query (below) presents another question. Is this a
>>> site choice? Could I require my customers to configure their OpenStack
>>> clouds to always route iSCSI connections through the nova-compute host?
>>> (I am not a fan of this approach, but I have to ask.)
>>
>> In the short term that'll work, but long term we're not intending to
>> support that once QEMU gains multi-path. There's no timeframe on when
>> that will happen though.
>>
>> Regards,
>> Daniel
Re: [openstack-dev] [nova] consistency and exposing quiesce in the Nova API
Comments inline.

On Thu, Jun 16, 2016 at 10:13 AM, Matt Riedemann
<mrie...@linux.vnet.ibm.com> wrote:

> On 6/16/2016 6:12 AM, Preston L. Bannister wrote:
>
>> I am hoping support for instance quiesce in the Nova API makes it into
>> OpenStack. To my understanding, this is existing function in Nova, just
>> not-yet exposed in the public API. (I believe Cinder uses this via a
>> private Nova API.)
>
> I'm assuming you're thinking of the os-assisted-volume-snapshots admin
> API in Nova that is called from the Cinder RemoteFSSnapDrivers
> (glusterfs, scality, virtuozzo and quobyte). I started a separate thread
> about that yesterday, mainly around the lack of CI testing / status, so
> we even have an idea of whether this is working consistently and we don't
> regress it.

Yes, I believe we are talking about the same thing. Also, I saw your other
message. :)

>> Much of the discussion is around disaster recovery (DR) and NFV - which
>> is not wrong, but might be muddling the discussion? Forget DR and NFV,
>> for the moment.
>>
>> My interest is simply in collecting high quality backups of applications
>> (instances) running in OpenStack. (Yes, customers are deploying
>> applications into OpenStack that need backup - and at large scale. They
>> told us, *very* clearly.) Ideally, I would like to give the application
>> a chance to properly quiesce, so the on-disk state is most-consistent,
>> before collecting the backup.
>
> We already attempt to quiesce an active volume-backed instance before
> doing a volume snapshot:
>
> https://github.com/openstack/nova/blob/11bd0052bdd660b63ecca53c5b6fe68f81bdf9c3/nova/compute/api.py#L2266

The problem, from my point of view: if the instance has more than one
volume (and many do), then quiescing the instance more than once is not
very nice.

>> The existing function in Nova should be at least a good start, it just
>> needs to be exposed in the public Nova API. (At least, this is my
>> understanding.)
>> Of course, good backups (however collected) allow you to build DR
>> solutions. My immediate interest is simply to collect high-quality
>> backups.
>>
>> The part in the blueprint about an atomic operation on a list of
>> instances ... this might be over-doing things. First, if you have a set
>> of related instances, very likely there is a logical order in which they
>> should be quiesced. Some could be quiesced concurrently. Others might
>> need to be sequential.
>>
>> Assuming the quiesce API *starts* the operation, and there is some means
>> to check for completion, then a single-instance quiesce API should be
>> sufficient. An API that is synchronous (waits for completion before
>> returning) would also be usable. (I am not picky - just want to collect
>> better backups for customers.)
>
> As noted above, we already attempt to quiesce when doing a volume-backed
> instance snapshot.
>
> The problem comes in with the chaining and orchestration around a list of
> instances. That requires additional state management and overhead within
> Nova, and while we're actively trying to redo parts of the code base to
> make things less terrible, adding more complexity on top at the same time
> doesn't help.

I agree with your concern. To be clear, what I am hoping for is the
simplest possible version - an API to quiesce/unquiesce a single instance,
similar to the existing pause/unpause APIs. Handling of lists of instances
(and response to state changes) I would expect to be implemented on the
caller side. There are application-specific semantics, so a
single-instance API has merit from my perspective.

> I'm also not sure what something like multiattach volumes will throw into
> the mix with this, but that's another DR/HA requirement.
>
> So I get that lots of people want lots of things that aren't in Nova
> right now.
> We have that coming from several different projects (cinder for
> multiattach volumes, neutron for vlan-aware-vms and routed networks), and
> several different groups (NFV, ops).
>
> We also have a lot of people that just want the basic IaaS layer to work
> for the compute service in an OpenStack cloud, like being able to scale
> that out better and track resource usage for accurate scheduling.
>
> And we have a lot of developers that want to be able to actually
> understand what it is the code is doing, and a much smaller number of
> core maintainers / reviewers that don't want to have to keep piling
> technical debt into the project while we're trying to fix some of what's
> already built up over the years - and actually have this stuff backed
> with integration testing.
>
> So, I get it. We all have
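The caller-side orchestration argued for above could be as simple as the
following sketch. To be clear, the `openstack server quiesce`/`unquiesce`
commands shown in the comments do NOT exist today; they stand in for the
single-instance API being requested, and the instance names and ordering
are placeholders for application-specific choices:

```shell
#!/bin/sh
# Hypothetical caller-side plan: quiesce in application order, snapshot,
# then unquiesce in reverse. Only echoes the plan; the real calls are
# shown in comments and are hypothetical.
PLAN=/tmp/quiesce_plan.txt
: > "$PLAN"
INSTANCES="db-vm app-vm web-vm"          # application-specific order
for vm in $INSTANCES; do
    echo "quiesce $vm" >> "$PLAN"        # hypothetical: openstack server quiesce "$vm"
done
for vm in $INSTANCES; do
    echo "snapshot $vm" >> "$PLAN"       # e.g. snapshot each attached volume
done
# unquiesce in reverse order, so the most critical instance thaws last
for vm in $(printf '%s\n' $INSTANCES | tac); do
    echo "unquiesce $vm" >> "$PLAN"      # hypothetical: openstack server unquiesce "$vm"
done
cat "$PLAN"
```

The point of the sketch: all the list handling, ordering, and error
recovery lives outside Nova, which only needs the two single-instance
primitives.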
Re: [openstack-dev] [nova] consistency and exposing quiesce in the Nova API
I am hoping support for instance quiesce in the Nova API makes it into
OpenStack. To my understanding, this is existing function in Nova, just
not-yet exposed in the public API. (I believe Cinder uses this via a
private Nova API.)

Much of the discussion is around disaster recovery (DR) and NFV - which is
not wrong, but might be muddling the discussion? Forget DR and NFV, for the
moment.

My interest is simply in collecting high quality backups of applications
(instances) running in OpenStack. (Yes, customers are deploying
applications into OpenStack that need backup - and at large scale. They
told us, *very* clearly.) Ideally, I would like to give the application a
chance to properly quiesce, so the on-disk state is most-consistent, before
collecting the backup.

The existing function in Nova should be at least a good start, it just
needs to be exposed in the public Nova API. (At least, this is my
understanding.)

Of course, good backups (however collected) allow you to build DR
solutions. My immediate interest is simply to collect high-quality backups.

The part in the blueprint about an atomic operation on a list of instances
... this might be over-doing things. First, if you have a set of related
instances, very likely there is a logical order in which they should be
quiesced. Some could be quiesced concurrently. Others might need to be
sequential.

Assuming the quiesce API *starts* the operation, and there is some means to
check for completion, then a single-instance quiesce API should be
sufficient. An API that is synchronous (waits for completion before
returning) would also be usable. (I am not picky - just want to collect
better backups for customers.)

On Sun, May 29, 2016 at 7:24 PM, joehuang wrote:

> Hello,
>
> This spec[1] was to expose the quiesce/unquiesce API, which had been
> approved in Mitaka, but the code was not merged in time.
> The major consideration for this spec is to enable application-level
> consistency snapshots, so that a backup of the snapshot at a remote site
> can be recovered correctly in case of disaster recovery. Currently there
> is only single-VM-level consistency snapshot (through create image from
> VM), but that is not enough.
>
> First, disaster recovery is mainly an action at the infrastructure level
> in case of catastrophic failures (flood, earthquake, propagating software
> fault): the cloud service provider recovers the infrastructure and the
> applications without help from each application owner. You cannot just
> recover OpenStack and then send a notification to all application owners
> asking them to restore their applications on their own. As the cloud
> service provider, they should be responsible for both infrastructure and
> application recovery in case of disaster.
>
> Second, this requirement is not about making OpenStack bend to NFV.
> Although it was first raised by OPNFV, it is a general requirement to
> have application-level consistency snapshots. For example, using
> OpenStack itself as the application running in the cloud, we can deploy a
> different DB for each service, i.e. Nova has its own mysql server
> nova-db-VM, and Neutron has its own mysql server neutron-db-VM. In fact,
> I have seen production deployments divide the DBs for Nova/Cinder/Neutron
> across different DB servers for scalability. We know that there is
> interaction between Nova and Neutron when booting a new VM; during the
> boot period, some data will be in the memory cache of the
> nova-db-VM/neutron-db-VM. If we just create snapshots of the volumes of
> nova-db-VM/neutron-db-VM in Cinder, the data which has not been flushed
> to disk will not be in the snapshot of the volumes.
> We can't be sure when the data in the memory cache will be flushed, so
> there is a random possibility that the data in the snapshot is not
> consistent with what happened in the virtual machines of
> nova-db-VM/neutron-db-VM. In this case, Nova/Neutron may boot at the
> disaster recovery site successfully, but some port information may be
> corrupted, not having been flushed to the neutron-db-VM when the snapshot
> was taken; in the severe situation, the VM may not even be able to
> recover successfully. Although there is a project called Dragon[2],
> Dragon can't guarantee the consistency of the application snapshot
> through the OpenStack API either.
>
> Third, for those applications which can decide what data and checkpoints
> should be replicated to the disaster recovery site, this is the third
> option discussed and described in our analysis:
> https://git.opnfv.org/cgit/multisite/tree/docs/requirements/multisite-vnf-gr-requirement.rst.
> But unfortunately in Cinder, after volume replication V2.1 was developed,
> tenant-granularity volume replication is still being discussed, and is
> still not at the single-volume level. And just as mentioned in the first
> point, both application level and infrastructure level are
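The flush-before-snapshot step described above is essentially what an
fsfreeze does (and quiesce via the qemu guest agent amounts to the same
thing). A hedged sketch of the step done by hand inside a guest -
/mnt/dbdata is a placeholder mount point, and since fsfreeze needs root and
a real mount, an unprivileged run falls back to a plain sync (which flushes
dirty pages but does not block new writes):

```shell
#!/bin/sh
# Flush dirty pages before a volume snapshot so the on-disk state is
# consistent. Placeholder mount point; fallback path for unprivileged runs.
MNT=/mnt/dbdata
if [ "$(id -u)" = 0 ] && command -v fsfreeze >/dev/null 2>&1 \
        && mountpoint -q "$MNT" 2>/dev/null; then
    fsfreeze --freeze "$MNT"
    echo "frozen: take the Cinder snapshot now" > /tmp/quiesce_state.txt
    fsfreeze --unfreeze "$MNT"
else
    sync
    echo "fallback: sync only (no freeze)" > /tmp/quiesce_state.txt
fi
cat /tmp/quiesce_state.txt
```

Note the sync-only fallback leaves exactly the consistency window joehuang
describes: writes issued after the sync can still be missing from the
snapshot.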
Re: [openstack-dev] [cinder] No middle-man - when does/will Nova directly connect iSCSI volumes?
QEMU has the ability to directly connect to iSCSI volumes. Running the
iSCSI connections through the nova-compute host *seems* somewhat
inefficient. There is a spec/blueprint and implementation that landed in
Kilo:

https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/qemu-built-in-iscsi-initiator.html
https://blueprints.launchpad.net/nova/+spec/qemu-built-in-iscsi-initiator

From looking at the OpenStack Nova sources ... I am not entirely clear on
when this behavior is invoked (just for Ceph?), and how it might change in
future. I am looking for a general sense of where this is headed. (If
anyone knows...) If there is some problem with QEMU and directly attached
iSCSI volumes, that would explain why this is not the default. Or is this
simple inertia?

I have a concrete concern. I work for a company (EMC) that offers backup
products, and we now have backup for instances in OpenStack. To make this
efficient, we need to collect changed-block information from instances.

1) We could put an intercept in the Linux kernel of the nova-compute host
to track writes at the block layer. This has the merit of working for
containers, and potentially bare-metal instance deployments. But it is not
guaranteed to work for instances, if the iSCSI volumes are directly
attached to QEMU.

2) We could use the QEMU support for incremental backup (the first bit
landed in QEMU 2.4). This has the merit of working with any storage, but
only for virtual machines under QEMU. Our customers are (so far) only
asking about virtual machine backup, so I long ago settled on (2) as most
promising.

What I cannot clearly determine is where (1) will fail. Will all iSCSI
volumes connected to QEMU instances eventually become directly connected?

Xiao's unanswered query (below) presents another question. Is this a site
choice? Could I require my customers to configure their OpenStack clouds
to always route iSCSI connections through the nova-compute host? (I am not
a fan of this approach, but I have to ask.)
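For reference, the QEMU incremental-backup machinery in (2) is driven over
QMP. A hedged sketch of the commands involved - the node name, bitmap name,
and target path are placeholders, and the exact command set varies by QEMU
version, since the feature landed piecemeal starting with 2.4:

```json
{"execute": "block-dirty-bitmap-add",
 "arguments": {"node": "drive0", "name": "bitmap0"}}

{"execute": "drive-backup",
 "arguments": {"device": "drive0", "sync": "incremental",
               "bitmap": "bitmap0", "target": "/backup/inc0.qcow2",
               "format": "qcow2"}}
```

The first command starts tracking writes; the second writes out only the
blocks dirtied since the bitmap was created (or last used), which is the
changed-block information we need.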
To answer Xiao's question: can a site configure their cloud to *always*
directly connect iSCSI volumes to QEMU?

On Tue, Feb 16, 2016 at 4:54 AM, Xiao Ma (xima2) wrote:

> Hi, All
>
> I want to make qemu communicate with the iscsi target using libiscsi
> directly, and I followed https://review.openstack.org/#/c/135854/ to add
> 'volume_drivers = iscsi=nova.virt.libvirt.volume.LibvirtNetVolumeDriver'
> in nova.conf and then restarted nova services and cinder services, but
> still the volume configuration of the vm is as below:
>
>   <source dev='/dev/disk/by-path/ip-10.75.195.205:3260-iscsi-iqn.2010-10.org.openstack:volume-076bb429-67fd-4c0c-9ddf-0dc7621a975a-lun-0'/>
>   <serial>076bb429-67fd-4c0c-9ddf-0dc7621a975a</serial>
>   <address ... function='0x0'/>
>
> I use centos7 and the Liberty version of OpenStack.
> Could anybody tell me how I can achieve it?
>
> Thanks.
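For contrast with the host-block-device XML Xiao quotes above: when QEMU's
built-in initiator is in use, libvirt describes the disk as type "network"
with protocol "iscsi" rather than pointing at a /dev/disk/by-path device. A
sketch - host, port, and IQN are copied from the example above; the other
attributes are illustrative, not taken from a real Nova-generated domain:

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='iscsi'
          name='iqn.2010-10.org.openstack:volume-076bb429-67fd-4c0c-9ddf-0dc7621a975a/0'>
    <host name='10.75.195.205' port='3260'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

So a quick `virsh dumpxml` of the guest shows immediately which path is in
effect: `type='block'` with a by-path device means the host initiator,
`type='network'` means direct QEMU attach.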
Re: [openstack-dev] [nova][cinder] Limits on volume read throughput?
In my case Centos 7 is using QEMU 1.5.3 ... which is *ancient*. This is on
a node with a packstack install of OpenStack. If you have a different
result, I would like to know why...

Got a bit further in my reading and testing. Also got my raw volume read
performance in an instance from ~300MB/s (with some tweaking) up to the
current ~800MB/s. Given the host raw volume read rate is ~1.2GB/s, and
there are substantial improvements in the software stack (Linux, iSCSI,
QEMU) in later versions ... this is a good result.

Found the main bottleneck was the iSCSI target in the physical Linux host.
(Not, in my current test case, in QEMU.) From online presentations on
QEMU/iSCSI/Linux, there are known large improvements in more recent
versions. Punt. Need to re-test on top of Ubuntu Trusty LTS (what most
customers seem headed toward). Will re-base my testing, at some point.

My testing (for simplicity) is on an all-in-one node. Curious what other
folk are getting with very-fast iSCSI targets. What is the upper range?

On Mon, Mar 7, 2016 at 7:59 AM, Chris Friesen <chris.frie...@windriver.com>
wrote:

> Just a heads-up that the 3.10 kernel in CentOS/RHEL is *not* a stock 3.10
> kernel. It has had many things backported from later kernels, though they
> may not have backported the specific improvements you're looking for.
>
> I think CentOS is using qemu 2.3, which is pretty new. Not sure how new
> their libiscsi is though.
>
> Chris
>
> On 03/07/2016 12:25 AM, Preston L. Bannister wrote:
>
>> Should add that the physical host of the moment is Centos 7 with a
>> packstack install of OpenStack. The instance is Ubuntu Trusty. Centos 7
>> has a relatively old 3.10 Linux kernel.
>>
>> From the last week (or so) of digging, I found there were substantial
>> claimed improvements in *both* flash support in Linux and the block I/O
>> path in QEMU - in more recent versions. How much that impacts the
>> current measures, I do not (yet) know.
>> Which suggests a bit of tension. Redhat folk are behind much of these
>> improvements, but RHEL (and Centos) are rather far behind. Existing RHEL
>> customers want and need careful, conservative changes. Folk deploying
>> OpenStack need more aggressive release content, for which Ubuntu is
>> currently the best base.
>>
>> Will we see a "Redhat Cloud Base" as an offering with RHEL support
>> levels, and more aggressive QEMU and Linux kernel inclusion?
>>
>> At least for now, building OpenStack clouds on Ubuntu might be a much
>> better bet.
>>
>> Are those claimed improvements in QEMU and the Linux kernel going to
>> make a difference in my measured result? I do not know. Still reading,
>> building tests, and collecting measures...
>>
>> On Thu, Mar 3, 2016 at 11:28 AM, Chris Friesen
>> <chris.frie...@windriver.com> wrote:
>>
>>> On 03/03/2016 01:13 PM, Preston L. Bannister wrote:
>>>
>>>> Scanning the same volume from within the instance still gets the same
>>>> ~450MB/s that I saw before.
>>>>
>>>> Hmmm, with iSCSI in between that could be the TCP memcpy limitation.
>>>>
>>>> Measuring iSCSI in isolation is next on my list. Both on the physical
>>>> host, and in the instance. (Now to find that link to the iSCSI test,
>>>> again...)
>>>
>>> Based on earlier comments it appears that you're using the qemu
>>> built-in iSCSI initiator.
>>>
>>> Assuming that's the case, maybe it would make sense to do a test run
>>> with the in-kernel iSCSI code and take qemu out of the picture?
>> Chris
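A side note on the read-ahead tweak mentioned earlier in the thread: the
per-device setting is also visible in sysfs, and settable as root with
`blockdev --setra N /dev/vdX` or `hdparm -a N /dev/vdX`. A sketch that only
reads the current values (device names vary by system):

```shell
#!/bin/sh
# Report the current read-ahead setting, in KiB, for each block device.
# Read-only; changing the value needs root (blockdev --setra / hdparm -a).
OUT=/tmp/readahead.txt
: > "$OUT"
for q in /sys/block/*/queue/read_ahead_kb; do
    [ -e "$q" ] || continue
    dev=$(basename "$(dirname "$(dirname "$q")")")
    printf '%s: %s KiB read-ahead\n' "$dev" "$(cat "$q")" >> "$OUT"
done
cat "$OUT"
```

Worth checking before and after any hdparm tweak, since the kernel can also
adjust read-ahead automatically.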
Re: [openstack-dev] [nova][cinder] Limits on volume read throughput?
Should add that the physical host of the moment is Centos 7 with a
packstack install of OpenStack. The instance is Ubuntu Trusty. Centos 7 has
a relatively old 3.10 Linux kernel.

From the last week (or so) of digging, I found there were substantial
claimed improvements in *both* flash support in Linux and the block I/O
path in QEMU - in more recent versions. How much that impacts the current
measures, I do not (yet) know.

Which suggests a bit of tension. Redhat folk are behind much of these
improvements, but RHEL (and Centos) are rather far behind. Existing RHEL
customers want and need careful, conservative changes. Folk deploying
OpenStack need more aggressive release content, for which Ubuntu is
currently the best base.

Will we see a "Redhat Cloud Base" as an offering with RHEL support levels,
and more aggressive QEMU and Linux kernel inclusion?

At least for now, building OpenStack clouds on Ubuntu might be a much
better bet.

Are those claimed improvements in QEMU and the Linux kernel going to make a
difference in my measured result? I do not know. Still reading, building
tests, and collecting measures...

On Thu, Mar 3, 2016 at 11:28 AM, Chris Friesen <chris.frie...@windriver.com>
wrote:

> On 03/03/2016 01:13 PM, Preston L. Bannister wrote:
>
>> Scanning the same volume from within the instance still gets the same
>> ~450MB/s that I saw before.
>>
>> Hmmm, with iSCSI in between that could be the TCP memcpy limitation.
>>
>> Measuring iSCSI in isolation is next on my list. Both on the physical
>> host, and in the instance. (Now to find that link to the iSCSI test,
>> again...)
>
> Based on earlier comments it appears that you're using the qemu built-in
> iSCSI initiator.
>
> Assuming that's the case, maybe it would make sense to do a test run with
> the in-kernel iSCSI code and take qemu out of the picture?
> Chris
Re: [openstack-dev] [nova][cinder] Limits on volume read throughput?
Note that my end goal is to benchmark an application that runs in an
instance and does primarily large sequential full-volume reads. On this
path I ran into unexpectedly poor performance within the instance. If this
is a common characteristic of OpenStack, then this becomes a question of
concern to OpenStack developers.

Until recently, ~450MB/s would be (and still is, for many cases)
*outstanding* throughput. Most similar questions on the web are happy with
saturating a couple of gigabit links, or a few spinning disks. So that
few(?) folk (to now) have asked questions at this level of performance ...
is not a surprise. But with flash displacing spinning disks, much higher
throughput is possible. If there is an unnecessary bottleneck, this might
be a good time to call attention.

From general notions to current specifics... :)

On Wed, Mar 2, 2016 at 10:10 PM, Philipp Marek wrote:

>> The benchmark scripts are in:
>> https://github.com/pbannister/openstack-bootstrap
>
> In case that might help, here are a few notes and hints about doing
> benchmarks for the DRBD block device driver:
> http://blogs.linbit.com/p/897/benchmarking-drbd/
>
> Perhaps there's something interesting for you.

Found this earlier. :)

>> Found that if I repeatedly scanned the same 8GB volume from the physical
>> host (with 1/4TB of memory), the entire volume was cached in (host)
>> memory (very fast scan times).
>
> If the iSCSI target (or QEMU, for direct access) is set up to use buffer
> cache, yes. Whether you really want that is up for discussion - it might
> be much more beneficial to move that RAM from the hypervisor to the VM,
> which should then be able to do more efficient caching of the filesystem
> contents that it should operate on.

You are right, but my aim was a bit different. Doing a bit of
divide-and-conquer. In essence, this test was to see if reducing the
host-side volume-read time to (practically) zero would have *any* impact on
performance. Given the *huge* introduced latency (somewhere), I did not
expect a notable difference - and that is what the measure shows. This
further supports the theory that host-side Linux is *not* the issue.

>> Scanning the same volume from within the instance still gets the same
>> ~450MB/s that I saw before.
>
> Hmmm, with iSCSI in between that could be the TCP memcpy limitation.

Measuring iSCSI in isolation is next on my list. Both on the physical host,
and in the instance. (Now to find that link to the iSCSI test, again...)

>> The "iostat" numbers from the instance show ~44 %iowait, and ~50 %idle.
>> (Which to my reading might explain the ~50% loss of performance.) Why so
>> much idle/latency?
>>
>> The in-instance "dd" CPU use is ~12%. (Not very interesting.)
>
> Because your "dd" test case will be single-threaded, io-depth 1. And that
> means synchronous access; each IO has to wait for the preceding one to
> finish...

Given that the Linux kernel read-ahead parameter has a noticeable impact on
performance, I believe that "dd" does not need to wait (much?) for I/O.
Note also the large difference between host and instance with "dd".

>> Not sure from where the (apparent) latency comes. The host iSCSI target?
>> The QEMU iSCSI initiator? Onwards...
>
> Thread scheduling, inter-CPU cache thrashing (if the iSCSI target is on a
> different physical CPU package/socket than the VM), ...
>
> Benchmarking is a dark art.

This physical host has an absurd number of CPUs (at 40), so what you
mention is possible. At these high rates, if only losing 10-20% of the
throughput, I might consider such causes. But losing 60% ... my guess ...
the cause is much less esoteric.
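Philipp's io-depth-1 point can be put in rough numbers with Little's law:
throughput is bounded by (request size x queue depth) / per-request
latency. A back-of-envelope sketch - the 4 ms latency figure is
illustrative, not measured:

```shell
# With a single outstanding 2 MiB read and ~4 ms of added per-request
# latency, throughput caps near 500 MB/s no matter how fast the device is.
awk 'BEGIN {
    bs_mb = 2;  depth = 1;  lat_ms = 4
    printf "max throughput ~ %.0f MB/s\n", bs_mb * depth * 1000 / lat_ms
}' | tee /tmp/littles_law.txt
```

Which lands suspiciously close to the observed ~450MB/s: a couple of
milliseconds of added latency somewhere in the iSCSI/QEMU path would be
enough to explain the result, without any bandwidth bottleneck at all.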
Re: [openstack-dev] [nova][cinder] Limits on volume read throughput?
First, my degree from school is in Physics. So I know something about
designing experiments. :)

The benchmark script runs "dd" 218 times, against different volumes (of
differing sizes), with differing "bs". Measures are collected both from the
physical host, and from within the instance. Linux is told to drop caches
before the start. The benchmark scripts are in:

https://github.com/pbannister/openstack-bootstrap

(Very much a work in progress! Not complete or properly documented.)

Second, I went through the exercise of collecting hints from the web as to
parameters for tuning iSCSI performance. (I did not expect changing Linux
TCP parameters to change the result for iSCSI over loopback, but measured
to be certain.) Followed all the hints, with no change in performance (as
expected).

Found that if I repeatedly scanned the same 8GB volume from the physical
host (with 1/4TB of memory), the entire volume was cached in (host) memory
(very fast scan times). Scanning the same volume from within the instance
still gets the same ~450MB/s that I saw before. The difference is that
"iostat" in the host is ~93% idle. In the host, *iscsi_ttx* is using ~58%
of a CPU (sound high?), and *qemu-kvm* is using ~30% of a CPU. (The
physical host is a fairly new box - with 40(!) CPUs.)

The "iostat" numbers from the instance show ~44 %iowait, and ~50 %idle.
(Which to my reading might explain the ~50% loss of performance.) Why so
much idle/latency?

The in-instance "dd" CPU use is ~12%. (Not very interesting.)

Not sure from where the (apparent) latency comes. The host iSCSI target?
The QEMU iSCSI initiator? Onwards...

On Tue, Mar 1, 2016 at 5:13 PM, Rick Jones <rick.jon...@hpe.com> wrote:

> On 03/01/2016 04:29 PM, Preston L. Bannister wrote:
>
>> Running "dd" in the physical host against the Cinder-allocated volumes
>> nets ~1.2GB/s (roughly in line with expectations for the striped flash
>> volume).
>> >> Running "dd" in an instance against the same volume (now attached to the >> instance) got ~300MB/s, which was pathetic. (I was expecting 80-90% of >> the raw host volume numbers, or better.) Upping read-ahead in the >> instance via "hdparm" boosted throughput to ~450MB/s. Much better, but >> still sad. >> >> In the second measure the volume data passes through iSCSI and then the >> QEMU hypervisor. I expected to lose some performance, but not more than >> half! >> >> Note that as this is an all-in-one OpenStack node, iSCSI is strictly >> local and not crossing a network. (I did not want network latency or >> throughput to be a concern with this first measure.) >> > > Well, not crossing a physical network :) You will be however likely > crossing the loopback network on the node. > Well ... yes. I suspect the latency and bandwidth numbers for loopback are rather better. :) For the purposes of this experiment, I wanted to eliminate the physical network limits as a consideration. What sort of per-CPU utilizations do you see when running the test to the > instance? Also, out of curiosity, what block size are you using in dd? I > wonder how well that "maps" to what iSCSI will be doing. > First, this measure was collected via a script that tried a moderately exhaustive number of variations. Yes, I had the same question. Kernel host read-ahead is 6MB (automatically set). Did not see notable gain past "bs=2M". (Was expecting a bigger gain for larger reads, but not what measures showed.) __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
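For reference, a minimal sketch of the kind of sweep the script performs; the device path and block-size list here are placeholders, not the actual 218-run matrix:

```shell
# Drop the page cache, then time sequential reads at several block sizes.
# VOL is a placeholder for the attached Cinder volume (or any large file).
VOL=${VOL:-/dev/vdb}
if [ -r "$VOL" ]; then
    sync
    # dropping caches needs root; ignore failure if unprivileged
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null 2>&1 || true
    for bs in 64k 512k 1M 2M 4M; do
        printf 'bs=%s: ' "$bs"
        # dd prints its throughput summary on stderr; keep the last line
        dd if="$VOL" of=/dev/null bs="$bs" 2>&1 | tail -n 1
    done
else
    echo "set VOL to the volume device to run the sweep"
fi
```

Run once inside the instance and once on the host against the same LV to reproduce the host-versus-instance gap discussed above.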
[openstack-dev] [nova][cinder] Limits on volume read throughput?
I have need to benchmark volume-read performance of an application running in an instance, assuming extremely fast storage.

To simulate fast storage, I have an AIO install of OpenStack, with local flash disks. Cinder LVM volumes are striped across three flash drives (what I have in the present setup). Since I am only interested in sequential-read performance, the "dd" utility is sufficient as a measure.

Running "dd" in the physical host against the Cinder-allocated volumes nets ~1.2GB/s (roughly in line with expectations for the striped flash volume).

Running "dd" in an instance against the same volume (now attached to the instance) got ~300MB/s, which was pathetic. (I was expecting 80-90% of the raw host volume numbers, or better.) Upping read-ahead in the instance via "hdparm" boosted throughput to ~450MB/s. Much better, but still sad.

In the second measure the volume data passes through iSCSI and then the QEMU hypervisor. I expected to lose some performance, but not more than half!

Note that as this is an all-in-one OpenStack node, iSCSI is strictly local and not crossing a network. (I did not want network latency or throughput to be a concern with this first measure.)

I do not see any prior mention of performance of this sort on the web or in the mailing list. Possible I missed something. What sort of numbers are you seeing out of high performance storage? Is the huge drop in read-rate within an instance something others have seen? Is the default iSCSI configuration used by Nova and Cinder optimal?
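For anyone reproducing the "hdparm" read-ahead bump mentioned above, a sketch via sysfs; the device name is a placeholder, and raising the value needs root:

```shell
# Read-ahead for a block device, via sysfs.  ("hdparm -a" reads/sets the
# same setting, expressed in 512-byte sectors.)  "vdb" is a placeholder
# for the attached volume's device name inside the guest.
DEV=${DEV:-vdb}
RA=/sys/block/$DEV/queue/read_ahead_kb
if [ -r "$RA" ]; then
    echo "current read-ahead: $(cat "$RA") kB"
    # raise to 4 MiB; requires root, hence the fallback message
    echo 4096 | sudo tee "$RA" >/dev/null 2>&1 || echo "need root to raise it"
else
    echo "no such device: /dev/$DEV"
fi
```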
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
On Wed, Feb 3, 2016 at 6:32 AM, Sam Yaple wrote:
> [snip]
> Full backups are costly in terms of IO, storage, bandwidth and time. A full
> backup being required in a backup plan is a big problem for backups when we
> talk about volumes that are terabytes large.

As an incidental note... You have to collect full backups, periodically. To do otherwise assumes *absolutely no failures* anywhere in the entire software/hardware stack -- ever -- and no failures in storage over time. (Which collectively is a tad optimistic, at scale.) Whether due to a rare software bug, a marginal piece of hardware, or a stray cosmic ray - an occasional bad block will slip through.

More exactly, you need some means of doing occasional full end-to-end verification of stored backups. Periodic full backups are one safeguard. How you go about performing full verification, and how often, is a subject for design and optimization. This is where things get a *bit* more complex. :) Or you just accept a higher error rate. (How high depends on the implementation.)

And "Yes", multi-terabyte volumes *are* a challenge.
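As a toy illustration of the end-to-end verification idea above (not any particular backup product's format), a per-chunk digest manifest lets a later pass detect a bad block without touching the original source:

```shell
# Build a toy "backup": split an image into 4 MiB chunks and record a
# digest per chunk.  A later full-verification pass re-hashes and compares.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/zero of=backup.img bs=1M count=12 2>/dev/null
split -b 4M backup.img chunk.         # -> chunk.aa chunk.ab chunk.ac
sha256sum chunk.* > MANIFEST

# ... time passes; simulated bit rot flips a byte in one stored chunk ...
printf 'X' | dd of=chunk.ab bs=1 seek=100 conv=notrunc 2>/dev/null

# the full-verification pass catches it
if sha256sum -c MANIFEST >/dev/null 2>&1; then
    echo "backup verified"
else
    echo "corruption detected"    # this branch fires for the flipped byte
fi
```

How often such a pass runs (and whether it is cheaper than simply taking another full backup) is exactly the design-and-optimization question raised above.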
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
On a side note, of the folk with interest in this thread, how many are going to the Austin OpenStack conference? Would you be interested in presenting as a panel?

I submitted for a presentation on "State of the Art for in-Cloud backup of high-value Applications". Notion is to give context for the folk who need this sort of backup. Something about where we have been, where we are, and what might become possible. I think it would be great to pull in folk from Freezer and Ekko. Jay Pipes seems to like to weigh in on the topic, and could represent Nova? Will gladly add interested folk as speakers! (Of course, the odds of winning a slot are pretty low, but worth a shot.)

Any folk who expect to be in Austin, and are interested?
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
To be clear, I work for EMC, and we are building a backup product for OpenStack (which at this point is very far along). The primary lack is a good means to efficiently extract changed-block information from OpenStack. About a year ago I worked through the entire Nova/Cinder/libvirt/QEMU stack, to see what was possible. The changes to QEMU (which have been in-flight since 2011) looked most promising, but when they would land was unclear. They are starting to land. This is big news. :)

That is not the end of the problem. Unless the QEMU folk are perfect, there are likely bugs to be found when the code is put into production. (With more exercise, the sooner any problems can be identified and addressed.) OpenStack uses libvirt to talk to QEMU, and libvirt is a fairly thick abstraction. Likely there will want to be adjustments to libvirt. Bypassing Nova and chatting with libvirt directly is a bit suspect (but may be needed). There might be adjustments needed in Nova.

To offer suggestions...

Ekko is an *opinionated* approach to backup. This is not the only way to solve the problem. I happen to very much like the approach, but as a *specific* approach, it probably does not belong in Cinder or Nova. (I believe it was Jay who offered a similar argument about backup more generally.)

(Keep in mind QEMU is not the only hypervisor supported by Nova, even if it is the majority of use. Would you want to attempt a design that works for all hypervisors? I would not! ...at least at this point. Also, last I checked the Cinder folk were a bit hung up on replication, as finding common abstractions across storage was not easy. This problem looks similar.)

While wary of bypassing Nova/Cinder, my suggestion would be to be rude in the beginning, with every intent of becoming civil in the end. Start by talking to libvirt directly. (There was a bypass mechanism in libvirt that looked like it might be sufficient.) Break QEMU early, and get it fixed.
:) When QEMU usage is working, talk to the libvirt folk about *proven* needs, and what is needed to become civil. When libvirt is updated (or not), talk to Nova folk about *proven* needs, and what is needed to become civil. (Perhaps simply awareness, or a small set of primitives.)

It might take quite a while for the latest QEMU and libvirt to ripple through into OpenStack distributions. Getting any fixes into QEMU early (or addressing discovered gaps in needed function) seems like a good thing.

All the above is a sufficiently ambitious project, just by itself. To my mind, that justifies Ekko as a unique, focused project.

On Mon, Feb 1, 2016 at 4:28 PM, Sam Yaple wrote:
> On Mon, Feb 1, 2016 at 10:32 PM, Fausto Marzi wrote:
>> Hi Preston,
>> Thank you. You saw Fabrizio in Vancouver, I'm Fausto, but it's allright, : P
>>
>> The challenge is interesting. If we want to build a dedicated backup API
>> service (which is always what we wanted to do), probably we need to:
>>
>> - Place out of Nova and Cinder the backup features, as it wouldn't
>> make much sense to me to have a Backup service and also have backups
>> managed independently by Nova and Cinder.
>>
>> That said, I'm not a big fan of the following:
>>
>> - Interacting with the hypervisors and the volumes directly without
>> passing through the Nova and Cinder API.
>
> Passing through the api will be a huge issue for extracting data due to
> the sheer volume of data needed (TB through the api is going to kill
> everything!)
>
>> - Adding any additional workload on the compute nodes or block
>> storage nodes.
>> - Computing incremental, compression, encryption is expensive. Having
>> many simultaneous processes doing that may lead to bad behaviours on core
>> services.
>
> These are valid concerns, but the alternative is still shipping the raw
> data elsewhere to do this work, and that has its own issue in terms of
> bandwidth.
>> My (flexible) thoughts are:
>>
>> - The feature is needed and is brilliant.
>> - We should probably implement the newest features provided by the
>> hypervisor in Nova and export them from the Nova API.
>> - Create a plugin that is integrated with Freezer to leverage those
>> new features.
>> - Same applies for Cinder.
>> - The VMs and Volumes backup feature is already available by Nova,
>> Cinder and Freezer. It needs to be improved for sure a lot, but do we need
>> to create a new project for a feature that needs to be improved, rather
>> than work with the existing Teams?
>
> I disagree with this statement strongly as I have stated before. Nova has
> snapshots. Cinder has snapshots (though they do say cinder-backup). Freezer
> wraps Nova and Cinder. Snapshots are not backups. They are certainly not
> _incremental_ backups. They can have neither compression, nor encryption.
> With this in mind, Freezer does not have this "feature" at all. It's not
> that it needs improvement, it simply
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
Oh, for the other folk reading, in QEMU you want to look at: http://wiki.qemu.org/Features/IncrementalBackup The above page looks to be current. The QEMU wiki seems to have a number of stale pages that describe proposed function that was abandoned / never implemented. Originally, I ended up reading the QEMU mailing list and source code to figure out which bits were real. :) On Tue, Feb 2, 2016 at 4:04 AM, Preston L. Bannister <pres...@bannister.us> wrote: > To be clear, I work for EMC, and we are building a backup product for > OpenStack (which at this point is very far along). The primary lack is a > good means to efficiently extract changed-block information from OpenStack. > About a year ago I worked through the entire Nova/Cinder/libvirt/QEMU > stack, to see what was possible. The changes to QEMU (which have been > in-flight since 2011) looked most promising, but when they would land was > unclear. They are starting to land. This is big news. :) > > That is not the end of the problem. Unless the QEMU folk are perfect, > there are likely bugs to be found when the code is put into production. > (With more exercise, the sooner any problems can be identified and > addressed.) OpenStack uses libvirt to talk to QEMU, and libvirt is a fairly > thick abstraction. Likely there will want to be adjustments to libvirt. > Bypassing Nova and chatting with libvirt directly is a bit suspect (but may > be needed). There might be adjustments needed in Nova. > > To offer suggestions... > > Ekko is an *opinionated* approach to backup. This is not the only way to > solve the problem. I happen very much like the approach, but as a *specific > *approach, it probably does not belong in Cinder or Nova. (I believe it > was Jay who offered a similar argument about backup more generally.) > > (Keep in mind QEMU is not the only hypervisor supported by Nova, if the > majority of use. Would you want to attempt a design that works for all > hypervisors? I would not! ...at least at this point. 
Also, last I checked > the Cinder folk were a bit hung up on replication, as finding common > abstractions across storage was not easy. This problem looks similar.) > > While wary of bypassing Nova/Cinder, my suggestion would to be rude in the > beginning, with every intent of becoming civil in the end. > > Start by talking to libvirt directly. (The was a bypass mechanism in > libvirt that looked like it might be sufficient.) Break QEMU early, and get > it fixed. :) > > When QEMU usage is working, talk to the libvirt folk about *proven* > needs, and what is needed to become civil. > > When libvirt is updated (or not), talk to Nova folk about *proven* needs, > and what is needed to become civil. (Perhaps simply awareness, or a small > set of primitives.) > > It might take quite a while for the latest QEMU and libvirt to ripple > through into OpenStack distributions. Getting any fixes into QEMU early (or > addressing discovered gaps in needed function) seems like a good thing. > > All the above is a sufficiently ambitious project, just by itself. To my > mind, that justifies Ekko as a unique, focused project. > > > > > > > On Mon, Feb 1, 2016 at 4:28 PM, Sam Yaple <sam...@yaple.net> wrote: > >> On Mon, Feb 1, 2016 at 10:32 PM, Fausto Marzi <fausto.ma...@gmail.com> >> wrote: >> >>> Hi Preston, >>> Thank you. You saw Fabrizio in Vancouver, I'm Fausto, but it's allright, >>> : P >>> >>> The challenge is interesting. If we want to build a dedicated backup API >>> service (which is always what we wanted to do), probably we need to: >>> >>> >>>- Place out of Nova and Cinder the backup features, as it wouldn't >>>make much sense to me to have a Backup service and also have backups >>>managed independently by Nova and Cinder. >>> >>> >>> That said, I'm not a big fan of the following: >>> >>>- Interacting with the hypervisors and the volumes directly without >>>passing through the Nova and Cinder API. 
>>> >>> Passing through the api will be a huge issue for extracting data due to >> the sheer volume of data needed (TB through the api is going to kill >> everything!) >> >>> >>>- Adding any additional workload on the compute nodes or block >>>storage nodes. >>>- Computing incremental, compression, encryption is expensive. Have >>>many simultaneous process doing that may lead to bad behaviours on core >>>services. >>> >>> These are valid concerns, but the alternative is still shipping the raw >> data elsewhere to do this work, and
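For readers hunting down the primitives involved: the core of the QEMU feature is a dirty bitmap plus an incremental backup job, driven over QMP. A sketch of the command sequence, with names per the IncrementalBackup wiki page cited above; the device name and target paths are placeholders, and exact arguments may differ between QEMU versions:

```
# one-time: start tracking writes to the drive
{ "execute": "block-dirty-bitmap-add",
  "arguments": { "node": "drive0", "name": "bitmap0" } }

# full backup to anchor the chain
{ "execute": "drive-backup",
  "arguments": { "device": "drive0", "sync": "full",
                 "target": "full.qcow2", "format": "qcow2" } }

# later: copy only the blocks the bitmap marks dirty
{ "execute": "drive-backup",
  "arguments": { "device": "drive0", "sync": "incremental",
                 "bitmap": "bitmap0",
                 "target": "inc0.qcow2", "format": "qcow2" } }
```

This is exactly the layer the "talk to QEMU directly, break it early" suggestion in this thread is about: none of it is yet reachable through libvirt or Nova.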
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
Hi Fausto,

To be clear, I am not in any way critical of Freezer and the folk putting in work. (Please, I want to be *entirely* clear on this point! Also, saw your presentation in Vancouver.)

That said, Freezer is a bit of a Swiss-Army-Knife set of combined backup functions. Sometimes it is better to focus on a single aspect (or few). The new features landing in QEMU present an opportunity. A project focused solely on that opportunity, to work through the initial set of issues, makes a lot of sense to me. (Something like how KVM forked QEMU for a time, to build faster x86 emulation.) I do not see these as competing projects, but more as cooperative. The Ekko work should be able to plug into Freezer, cleanly.

Aspects of the problem, as I see them:

1. Extracting efficient instance backups from OpenStack (via new QEMU function).
2. Storing backups in efficient form (general implementation, and vendor-specific supersets).
3. Offering an OpenStack backup-service API, with core and vendor-specific extensions.

Vendors (like my employer, EMC) might be somewhat opinionated about (2), and with reason. :) The huge missing piece is (1), and a focused project seems to make a lot of sense. As to (3), that looks like a good topic for further discussion. :)

My $.02.

On Sat, Jan 30, 2016 at 5:36 PM, Fausto Marzi <fausto.ma...@gmail.com> wrote:
> Hi Preston,
> No need to apologize. They are aspects of the same problem.
> However, VMs backup is one of the many aspects that we are approaching
> here, such as:
>
> - VM backups
> - Volumes backups
> - Specific applications consistent data backup (i.e.
MySQL, Mongo, file > system, etc) > - Provide capabilities to restore data even if keystone and swift are not > available > - Upload data during backup to multiple media storage in parallel > - Web Interface > - Provide capability to synchronize backups for sharded data on multiple > nodes > - Encryption > - File based incremental > - Block based incremental > - Tenant related data backup and restore > - Multi platform OS support (i.e. Linux, BSD, OSX, Windows, iOS, Android, > etc) > - Everything is upstreamed. > > This looks like a list of features... and actually it is. > > Block based and some multi platform OS aside, all the mentioned features > are provided to the date. Most of them are available since Kilo. > > I agree with the multi API, room for vendors and to provide different > approaches, but please let me say something (*not referring specifically > to you or Sam or anyone*) > > All the time people say you have to do this and that, but the facts are > that at the end of the day, always the same 6 engineers (not even full > time) are working on it since 2 years, investing professional and personal > time on it. > > We try to be open, to accept everybody (even the most arrogant), to > implement features for whoever needs it, but the facts are that the only > Companies that invested on it are HP, a bit Ericsson and Orange (apologize > if I forgot anyone). We never said no to anyone about anything, never > focused only to a single Company influence, never blocked a thing... and > never will. > > Wouldn't be better to join efforts if companies need a backup solution and > have their own requirements implemented by a common public Team, rather > then start creating many tools to solve the same set of problems? How can > ever competition benefit this? How can ever fragmenting projects help to > provide a better solution? 
> > I'm sorry, but unless the TC or many people from the Community, tell us to > do something different (in that case we'll do it straight away), we'll keep > doing what we are doing, focusing on delivering what we think is the most > advanced solution, according the resources and time we have. > > We need to understand that here the most important thing is to work in > Team, to provide great tools to the Community, rather then thinking to be > PTL or maintain independence just for the sake of it or focusing only on > what's the best for a single Company. If this vision is not shared, then, > unfortunately, good luck competing, while if the vision is shared... let's > do together unprecedented things. > > Many thanks, > Fausto > > > On Sun, Jan 31, 2016 at 1:01 AM, Preston L. Bannister < > pres...@bannister.us> wrote: > >> Seems to me there are three threads here. >> >> The Freezer folk were given a task, and did the best possible to support >> backup given what OpenStack allowed. To date, OpenStack is simply not very >> good at supporting backup as a service. (Apologies to the Freezer folk if I >> misinterpreted.) >> >> The patches (finally) landing in QEMU in support of incremental backup >> could be the basis for
Re: [openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack
Seems to me there are three threads here.

The Freezer folk were given a task, and did the best possible to support backup given what OpenStack allowed. To date, OpenStack is simply not very good at supporting backup as a service. (Apologies to the Freezer folk if I misinterpreted.)

The patches (finally) landing in QEMU in support of incremental backup could be the basis for efficient backup services in OpenStack. This is all fairly high risk, in the short term. The bits that landed in QEMU 2.4 may not be sufficient (there are more QEMU patches trying to land). When put into production, we may find faults. For use in OpenStack, we may need changes in libvirt, and/or in Nova. (Or *maybe* not, if usage for backup proves orthogonal.) The only way to work out the prior is to start. The timeline could be months or years.

There is a need for a common API for backup as a service in the cloud. Something more than imitating AWS. Allow some room for vendors with differing approaches.

I see the above as not competing, but aspects of the same problem.
[openstack-dev] Can I count on the OS-TRUST extension for a backup service?
In the implementation of an instance backup service for OpenStack, on restore I need to (re)create the restored instance in the original tenant. Restores can be fired off by an administrator (not the original user), so at instance-create time I have two main choices:

1. Create the instance as the backup service.
2. Create the instance as the original user.

Clearly (1) is workable (given the backup user has access to the tenant). Keypairs are a bit of an issue, but solvable. Also clearly (2) is better, but that requires a means to impersonate the original user. Keystone trusts seem to be that means, but raise additional questions. (Also, the fact that the current documentation for Keystone is incomplete in this area does not raise the confidence level.)

1. How far back is the Keystone OS-TRUST extension reliable? (Kilo? Juno?)
2. Do any OpenStack distributions omit the OS-TRUST extension?

A feature labelled as an "extension" poses a risk to the developer. :) Trying to get a handle on that risk.
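For concreteness, the trust that choice (2) implies, sketched against the Keystone v3 OS-TRUST API; the IDs, role name, and expiry here are placeholders, not values from any real deployment:

```
POST /v3/OS-TRUST/trusts

{ "trust": {
    "trustor_user_id": "<original-user-id>",
    "trustee_user_id": "<backup-service-user-id>",
    "project_id":      "<original-tenant-id>",
    "impersonation":   true,
    "roles":           [ { "name": "member" } ],
    "expires_at":      null
} }
```

The trustee (the backup service) later authenticates with its own credentials plus the trust ID, and receives a token scoped as the trustor - which is what lets the restored instance be created as the original user.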
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
John,

As a (new) OpenStack developer, I just discovered the CINDER_SECURE_DELETE option. As an *implicit* default, I entirely approve. Production OpenStack installations should *absolutely* ensure there is no information leakage from one instance to the next. As an *explicit* default, I am not so sure. Low-end storage requires you do this explicitly. High-end storage can ensure information never leaks. Counting on high-end storage for this can make the upper levels more efficient, which can be a good thing.

The debate about whether to wipe LVs pretty much massively depends on the intelligence of the underlying store. If the lower-level storage never returns accidental information ... explicit zeroes are not needed.

On Wed, Oct 22, 2014 at 11:15 PM, John Griffith john.griffi...@gmail.com wrote:

On Tue, Oct 21, 2014 at 9:17 AM, Duncan Thomas duncan.tho...@gmail.com wrote:

For LVM-thin I believe it is already disabled? It is only really needed on LVM-thick, where the returning-zeros behaviour is not done.

On 21 October 2014 08:29, Avishay Traeger avis...@stratoscale.com wrote:

I would say that wipe-on-delete is not necessary in most deployments. Most storage backends exhibit the following behavior:

1. Delete volume A that has data on physical sectors 1-10
2. Create new volume B
3. Read from volume B before writing, which happens to map to physical sector 5 - backend should return zeroes here, and not data from volume A

In case the backend doesn't provide this rather standard behavior, data must be wiped immediately. Otherwise, the only risk is physical security, and if that's not adequate, customers shouldn't be storing all their data there regardless. You could also run a periodic job to wipe deleted volumes to reduce the window of vulnerability, without making delete_volume take a ridiculously long time. Encryption is a good option as well, and of course it protects the data before deletion as well (as long as your keys are protected...)
Bottom line - I too think the default in devstack should be to disable this option, and think we should consider making the default False in Cinder itself. This isn't the first time someone has asked why volume deletion takes 20 minutes... As for queuing backup operations and managing bandwidth for various operations, ideally this would be done with a holistic view, so that for example Cinder operations won't interfere with Nova, or different Nova operations won't interfere with each other, but that is probably far down the road. Thanks, Avishay On Tue, Oct 21, 2014 at 9:16 AM, Chris Friesen chris.frie...@windriver.com wrote: On 10/19/2014 09:33 AM, Avishay Traeger wrote: Hi Preston, Replies to some of your cinder-related questions: 1. Creating a snapshot isn't usually an I/O intensive operation. Are you seeing I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why. 2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this. IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disabling wipe on delete, etc. In a public-cloud environment I don't think it's reasonable to disable wipe-on-delete. Arguably it would be better to use encryption instead of wipe-on-delete. When done with the backing store, just throw away the key and it'll be secure enough for most purposes. 
Chris

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- Duncan Thomas

We disable this in the Gates: CINDER_SECURE_DELETE=False

ThinLVM (which hopefully will be default upon release of Kilo) doesn't need it, because internally it returns zeros when reading unallocated blocks, so it's a non-issue. The debate over whether to wipe LVs is a long-running issue. The default behavior in Cinder is to leave it enabled, and IMHO that's how it should stay. The fact is, anything that might be construed as less secure, and has been defaulted to the more secure setting, should be left as it is. It's simple to turn this off. Also, nobody seemed to mention that in the case of Cinder operations like
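For reference (hedged against the option names of that era), the knob under discussion lives in cinder.conf, and CINDER_SECURE_DELETE is just DevStack's switch for it:

```
# DevStack local.conf / localrc:
CINDER_SECURE_DELETE=False

# which corresponds in Cinder itself to (cinder.conf):
[DEFAULT]
volume_clear = none       # 'zero' is the secure default; 'shred' also exists
volume_clear_size = 0     # MiB to wipe on delete; 0 means the whole volume
```

`volume_clear_size` is the middle ground mentioned nowhere above but worth knowing: wiping only the first N MiB clears partition tables and filesystem metadata without the 20-minute full-volume `dd`.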
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 7:51 AM, John Griffith john.griffi...@gmail.com wrote:
> On Thu, Oct 23, 2014 at 8:50 AM, John Griffith john.griffi...@gmail.com wrote:
>> On Thu, Oct 23, 2014 at 1:30 AM, Preston L. Bannister pres...@bannister.us wrote:
>>> John, As a (new) OpenStack developer, I just discovered the
>>> CINDER_SECURE_DELETE option.
>
> OHHH... Most importantly, I almost forgot. Welcome!!!

Thanks! (I think...)

> It doesn't suck as bad as you might have thought or some of the other
> respondents on this thread seem to think. There's certainly room for
> improvement and growth but it hasn't been completely ignored on the
> Cinder side.

To be clear, I am fairly impressed with what has gone into OpenStack as a whole. Given the breadth, complexity, and growth ... not everything is going to be perfect (yet?). So ... not trying to disparage past work, but noting what does not seem right. (Also know I could easily be missing something.)

>>> The debate about whether to wipe LV's pretty much massively depends on
>>> the intelligence of the underlying store. If the lower level storage
>>> never returns accidental information ... explicit zeroes are not needed.

Yes, that is pretty much the key. Does LVM let you read physical blocks that have never been written? Or zero out virgin segments on read? If not, then dd of zeroes is a way of doing the right thing (if *very* expensive).
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
On Thu, Oct 23, 2014 at 3:04 PM, John Griffith john.griffi...@gmail.com wrote:
>> The debate about whether to wipe LV's pretty much massively depends on the
>> intelligence of the underlying store. If the lower level storage never
>> returns accidental information ... explicit zeroes are not needed.
>
> On Thu, Oct 23, 2014 at 3:44 PM, Preston L. Bannister pres...@bannister.us wrote:
>> Yes, that is pretty much the key. Does LVM let you read physical blocks
>> that have never been written? Or zero out virgin segments on read? If not,
>> then dd of zeroes is a way of doing the right thing (if *very* expensive).
>
> Yeah... so that's the crux of the issue on LVM (Thick). It's quite possible
> for a new LV to be allocated from the VG and a block from a previous LV can
> be allocated. So in essence if somebody were to sit there in a cloud env and
> just create volumes and read the blocks over and over and over, they could
> gather some previous or other tenants' data (or pieces of it at any rate).
> It's def the right thing to do if you're in an env where you need some level
> of security between tenants. There are other ways to solve it of course, but
> this is what we've got.

Has anyone raised this issue with the LVM folk? Returning zeros on unwritten blocks would require a bit of extra bookkeeping, but would be a lot more efficient overall.
Re: [openstack-dev] [devstack] Enable LVM ephemeral storage for Nova
As a side-note, the new AWS flavors seem to indicate that the Amazon infrastructure is moving to all ECS volumes (and all flash, possibly), both ephemeral and not. This makes sense, as fewer code paths and less interoperability complexity is a good thing. The same balance of concerns seems likely to apply in OpenStack.

On Tue, Oct 21, 2014 at 7:59 AM, Dan Genin daniel.ge...@jhuapl.edu wrote:

Hello, I would like to add to DevStack the ability to stand up Nova with LVM ephemeral storage. Below is a draft of the blueprint describing the proposed feature. Suggestions on architecture, implementation and the blueprint in general are very welcome. Best, Dan

Enable LVM ephemeral storage for Nova

Currently DevStack supports only file-based ephemeral storage for Nova, e.g., raw and qcow2. This is an obstacle to Tempest testing of Nova with LVM ephemeral storage, which in the past has been inadvertently broken (see, for example, https://bugs.launchpad.net/nova/+bug/1373962), and to Tempest testing of new features based on LVM ephemeral storage, such as LVM ephemeral storage encryption.

To enable Nova to come up with LVM ephemeral storage it must be provided a volume group. Based on an initial discussion with Dean Troyer, this is best achieved by creating a single volume group for all services that potentially need LVM storage; at the moment these are Nova and Cinder.

Implementation of this feature will:

* move code in lib/cinder/cinder_backends/lvm to lib/lvm with appropriate modifications
* rename the Cinder volume group to something generic, e.g., devstack-vg
* modify the Cinder initialization and cleanup code appropriately to use the new volume group
* initialize the volume group in stack.sh, shortly before services are launched
* clean up the volume group in unstack.sh after the services have been shut down

The question of how large to make the common Nova-Cinder volume group in order to enable LVM ephemeral Tempest testing will have to be explored.
Although, given the tiny instance disks used in Nova Tempest tests, the current Cinder volume group size may already be adequate. No new configuration options will be necessary, assuming the volume group size will not be made configurable.
Re: [openstack-dev] [devstack] Enable LVM ephemeral storage for Nova
Yes, I meant EBS not ECS. Too many similar acronyms...

The thing about the Amazon folk is that they collect a lot of metrics, and pretty much do everything on a fairly empirical basis. This is a huge advantage. Started thinking about what I could do with good metrics, building on the performance characteristics of flash. Turns out ... I can see how this could work (and very, very well). But that requires a much longer write-up than I have time for at the moment.

On Tue, Oct 21, 2014 at 12:11 PM, Dan Genin daniel.ge...@jhuapl.edu wrote:

Did you mean EBS? I thought it was generally hard to get the same kind of performance from block storage that local ephemeral storage provides, but perhaps Amazon has found a way. Life would certainly be much simpler with a single ephemeral backend. Storage pools (https://blueprints.launchpad.net/nova/+spec/use-libvirt-storage-pools) should provide some of the same benefits.

On 10/21/2014 02:54 PM, Preston L. Bannister wrote:

As a side-note, the new AWS flavors seem to indicate that the Amazon infrastructure is moving to all ECS volumes (and all flash, possibly), both ephemeral and not. This makes sense, as fewer code paths and less interoperability complexity is a good thing. That the same balance of concerns should apply in OpenStack, seems likely.

On Tue, Oct 21, 2014 at 7:59 AM, Dan Genin daniel.ge...@jhuapl.edu wrote:

Hello, I would like to add to DevStack the ability to stand up Nova with LVM ephemeral storage. Below is a draft of the blueprint describing the proposed feature. Suggestions on architecture, implementation and the blueprint in general are very welcome. Best, Dan

Enable LVM ephemeral storage for Nova

Currently DevStack supports only file based ephemeral storage for Nova, e.g., raw and qcow2.
[openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not. Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup. Testing nova backup - first the existing implementation, then my (much changed) replacement. Simple scripts for testing. Create images. Create instances (five). Run backup on all instances. Currently found in: https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive... I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable. Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing. Fired off snapshots on five Cinder volumes (attached to five instances). Again the load shot very high. Huh. Well, in a full-scale OpenStack setup, maybe storage can handle that much I/O more gracefully ... or not. Again, should taking snapshots interfere with foreground activity? I would say, most often not. Queuing and serializing snapshots would strictly limit the interference with foreground. Also, very high end storage can perform snapshots *very* quickly, so serialized snapshots will not be slow. My take is that the default behavior should be to queue and serialize all heavy I/O operations, with non-default allowances for limited concurrency. Cleaned up (which required reboot/unstack/stack and more). Tried again. Ran two test backups (which in the current iteration create Cinder volume snapshots). Asked Cinder to delete the snapshots. 
Again, very high load factors, and in top I can see two long-running dd processes. (Given I have a single disk, more than one dd is not good.)

Running too many heavyweight operations against storage can lead to thrashing. Queuing can strictly limit that load, and ensure better and more reliable performance. I am not seeing evidence of this thought in my OpenStack testing.

So far it looks like there is no thought to managing the impact of disk-intensive management operations. Am I missing something?
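The "queue and serialize all heavy I/O operations" default argued for above can be sketched in a few lines. This is an illustration of the pattern only — not Nova or Cinder code: a single worker thread drains a FIFO queue, so at most one heavyweight operation (backup, snapshot, wipe) runs at a time, and callers get an event to wait on.

```python
# Sketch of serialized heavy-I/O job handling: one worker, FIFO order.
import queue
import threading

class SerializedRunner:
    def __init__(self):
        self._jobs = queue.Queue()
        self._active = 0
        self._max_active = 0     # track peak concurrency to show serialization
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, fn, *args):
        """Queue a heavy operation; returns an Event set on completion."""
        done = threading.Event()
        self._jobs.put((fn, args, done))
        return done

    def _worker(self):
        while True:
            fn, args, done = self._jobs.get()
            with self._lock:
                self._active += 1
                self._max_active = max(self._max_active, self._active)
            try:
                fn(*args)        # e.g. a dd wipe or snapshot upload
            finally:
                with self._lock:
                    self._active -= 1
                done.set()

runner = SerializedRunner()
results = []
events = [runner.submit(results.append, i) for i in range(5)]
for e in events:
    e.wait()
```

Five submitted jobs complete in submission order, and the peak concurrency never exceeds one — the host sees a single dd-class load at a time instead of five.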
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Avishay, Thanks for the tip on [cinder.conf] volume_clear. The corresponding option in devstack is CINDER_SECURE_DELETE=False. Also I *may* have been bitten by the related bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1023755 (All I know at this point is the devstack VM became unresponsive - have not yet identified the cause. But the symptoms fit.) Not sure if there are spikes on Cinder snapshot creation. Perhaps not. (Too many different failures and oddities. Have not sorted all, yet.) I am of the opinion CINDER_SECURE_DELETE=False should be a default for devstack. Especially as it invokes bug-like behavior. Also, unbounded concurrent dd operations is not a good idea. (Which is generally what you meant, I believe.) Onwards On Sun, Oct 19, 2014 at 8:33 AM, Avishay Traeger avis...@stratoscale.com wrote: Hi Preston, Replies to some of your cinder-related questions: 1. Creating a snapshot isn't usually an I/O intensive operation. Are you seeing I/O spike or CPU? If you're seeing CPU load, I've seen the CPU usage of cinder-api spike sometimes - not sure why. 2. The 'dd' processes that you see are Cinder wiping the volumes during deletion. You can either disable this in cinder.conf, or you can use a relatively new option to manage the bandwidth used for this. IMHO, deployments should be optimized to not do very long/intensive management operations - for example, use backends with efficient snapshots, use CoW operations wherever possible rather than copying full volumes/images, disabling wipe on delete, etc. Thanks, Avishay On Sun, Oct 19, 2014 at 1:41 PM, Preston L. Bannister pres...@bannister.us wrote: OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not. Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup. Testing nova backup - first the existing implementation, then my (much changed) replacement. Simple scripts for testing. 
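For reference, the wipe behaviour discussed earlier in this thread is driven by configuration. A sketch of the relevant settings — option names as I recall them from cinder.conf of this era, so verify against your release:

```ini
# cinder.conf -- LVM backend volume-wipe behaviour (sketch, verify names)
[DEFAULT]
# "zero" (default) dd's zeroes over deleted volumes; "shred" overwrites
# several times; "none" skips wiping entirely -- only safe when the
# backing store never returns old blocks to new volumes.
volume_clear = none
# If wiping, clear only the first N MiB of each volume (0 = whole volume).
volume_clear_size = 0
```

In DevStack, `CINDER_SECURE_DELETE=False` in localrc is the switch mentioned above that maps to disabling the wipe.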
Re: [openstack-dev] [cinder][nova] Are disk-intensive operations managed ... or not?
Jay, Thanks very much for the insight and links. In fact, I have visited *almost* all the places mentioned, prior. Added clarity is good. :) Also, to your earlier comment (to an earlier thread) about backup not really belonging in Nova - in main I agree. The backup API belongs in Nova (as this maps cleanly to the equivalent in AWS), but the bulk of the implementation can and should be distinct (in my opinion). My current work is at: https://github.com/dreadedhill-work/stack-backup I also have matching changes to Nova and the Nova client under the same Github account. Please note this is very much a work in progress (as you might guess from my prior comments). This needs a longer proper write up, and a cleaner Git history. The code is a pretty fair ways along, but should be considered more a rough draft, rather than a final version. For the next few weeks, I am enormously crunched for time, as I have promised a PoC at a site with a very large OpenStack deployment. Noted your suggestion about the Rally team. Might be a bit before I can pursue. :) Again, Thanks. On Sun, Oct 19, 2014 at 10:13 AM, Jay Pipes jaypi...@gmail.com wrote: Hi Preston, some great questions in here. Some comments inline, but tl;dr my answer is yes, we need to be doing a much better job thinking about how I/O intensive operations affect other things running on providers of compute and block storage resources On 10/19/2014 06:41 AM, Preston L. Bannister wrote: OK, I am fairly new here (to OpenStack). Maybe I am missing something. Or not. Have a DevStack, running in a VM (VirtualBox), backed by a single flash drive (on my current generation MacBook). Could be I have something off in my setup. Testing nova backup - first the existing implementation, then my (much changed) replacement. Simple scripts for testing. Create images. Create instances (five). Run backup on all instances. 
Currently found in: https://github.com/dreadedhill-work/stack-backup/tree/master/backup-scripts

First time I started backups of all (five) instances, load on the Devstack VM went insane, and all but one backup failed. Seems that all of the backups were performed immediately (or attempted), without any sort of queuing or load management. Huh. Well, maybe just the backup implementation is naive...

Yes, you are exactly correct. There is no queuing behaviour for any of the "backup" operations (I put "backup" operations in quotes because IMO it is silly to refer to them as backup operations, since all they are doing really is a snapshot action against the instance/volume -- and then attempting to be a poor man's cloud cron).

The backup is initiated from the admin_actions API extension here:

https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/admin_actions.py#L297

which calls the nova.compute.api.API.backup() method here:

https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2031

which, after creating some image metadata in Glance for the snapshot, calls the compute RPC API here:

https://github.com/openstack/nova/blob/master/nova/compute/rpcapi.py#L759

which sends an asynchronous RPC message to the compute node to execute the instance snapshot and rotate backups:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2969

That method eventually calls the blocking snapshot() operation on the virt driver:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3041

And it is the nova.virt.libvirt.Driver.snapshot() method that is quite icky, with lots of logic to determine the type of snapshot to do and how to do it:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1607

The gist of the driver's snapshot() method calls ImageBackend.snapshot(), which is responsible for doing the actual snapshot of the instance:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1685

and then once the snapshot is done, the method calls to the Glance API to upload the snapshotted disk image to Glance:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1730-L1734

All of which is I/O intensive and, AFAICT, mostly done in a blocking manner, with no queuing or traffic control measures. So, as you correctly point out, if the compute node daemon receives 5 backup requests, it will go ahead and do 5 snapshot operations and 5 uploads to Glance all as fast as it can. It will do it in 5 different eventlet greenthreads, but there are no designs in place to prioritize the snapshotting I/O lower than active VM I/O.

I will write on this at greater length, but backup should interfere as little as possible with foreground processing. Overloading a host is entirely unacceptable.

Agree with you completely.

Replaced the backup implementation so it does proper queuing (among other things). Iterating forward - implementing and testing.

Is this code up
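One minimal form of the missing traffic control described in that walkthrough is a semaphore around the I/O-heavy section, so five simultaneous requests don't become five simultaneous snapshot-and-upload operations. A sketch only — this is not the actual nova.compute.manager code, and an eventlet-based service would use eventlet's semaphore rather than threading:

```python
# Sketch: cap concurrent snapshot/upload work at 2 instead of "all at once".
import threading

SNAPSHOT_SLOTS = threading.Semaphore(2)   # at most 2 snapshots in flight

observed = []
lock = threading.Lock()
in_flight = 0
peak = 0

def snapshot_and_upload(instance_id):
    global in_flight, peak
    with SNAPSHOT_SLOTS:          # blocks while 2 snapshots already running
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... the disk snapshot and Glance upload would happen here ...
        with lock:
            in_flight -= 1
            observed.append(instance_id)

threads = [threading.Thread(target=snapshot_and_upload, args=(i,))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All five requests still complete, but the backend never sees more than two heavy operations at once — the throttle the thread argues should exist between the RPC handler and the virt driver.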
Re: [openstack-dev] 2 Minute tokens
Too-short token expiration times are one of my concerns, in my current exercise. Working on a replacement for Nova backup. Basically creating backup jobs, writing the jobs into a queue, with a background worker that reads jobs from the queue. Tokens could expire while the jobs are in the queue (not too likely). Tokens could expire during the execution of a backup (which can be very long running, in some cases). Had not run into mention of trusts before. Is the intent to cover this sort of use-case? (Pulled up what I could find on trusts. Need to chew on this a bit, as it is not immediately clear if this fits.)

On Wed, Oct 1, 2014 at 6:53 AM, Adam Young ayo...@redhat.com wrote:

On 10/01/2014 04:14 AM, Steven Hardy wrote:

On Tue, Sep 30, 2014 at 10:44:51AM -0400, Adam Young wrote:

What is keeping us from dropping the (scoped) token duration to 5 minutes? If we could keep their lifetime as short as network skew lets us, we would be able to: Get rid of revocation checking. Get rid of persisted tokens. OK, so that assumes we can move back to PKI tokens, but we're working on that. What are the uses that require long lived tokens? Can they be replaced with a better mechanism for long term delegation (OAuth or Keystone trusts) as Heat has done?

FWIW I think you're misrepresenting Heat's usage of Trusts here - 2 minute tokens will break Heat just as much as any other service:

https://bugs.launchpad.net/heat/+bug/1306294
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045585.html

Summary:
- Heat uses the request token to process requests (e.g. stack create), which may take an arbitrary amount of time (default timeout one hour).
- Some use-cases demand a timeout of more than one hour (specifically big TripleO deployments); heat breaks in these situations atm, and folks are working around it by using long (several hour) token expiry times.
- Trusts are only used for asynchronous signalling, e.g. Ceilometer signals Heat, we switch to a trust-scoped token to process the response to the alarm (e.g. launch more instances on behalf of the user for autoscaling)

My understanding, ref notes in that bug, is that using Trusts while servicing a request to effectively circumvent token expiry was not legit (or at least yukky and to be avoided). If you think otherwise then please let me know, as that would be the simplest way to fix the bug above (switch to a trust token while doing the long-running create operation).

Using trusts to circumvent timeout is OK. There are two issues in tension here:

1. A user needs to be able to maintain control of their own data.
2. We want to limit the attack surface provided by tokens.

Since tokens are currently blanket access to the user's data, there really is no lessening of control by using trusts in a wider context. I'd argue that using trusts would actually reduce the capability for abuse, if coupled with short-lived tokens. With long-lived tokens, anyone can reuse the token. With a trust, only the trustee would be able to create a new token.

Could we start by identifying the set of operations that are currently timing out due to the one-hour token duration and add an optional trust id on those operations?

Trusts is not really ideal for this use-case anyway, as it requires the service to have knowledge of the roles to delegate (or that the user provides a pre-created trust), ref bug #1366133. I suppose we could just delegate all the roles we find in the request scope and be done with it, given that bug has been wontfixed.
Steve
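The trust-plus-short-token pattern discussed in this thread — a long-running worker holds a trust id and exchanges it for a fresh short-lived token whenever the current one nears expiry — can be sketched generically. The `acquire_token` callable below is a hypothetical stand-in for the real Keystone trust-scoped authentication call; nothing here is the python-keystoneclient API.

```python
# Sketch: refresh a short-lived token from a trust before it expires.
import time

class TrustScopedSession:
    def __init__(self, trust_id, acquire_token, lifetime=120, margin=30):
        self.trust_id = trust_id
        self.acquire_token = acquire_token   # hypothetical Keystone hook
        self.lifetime = lifetime             # e.g. the proposed 2-minute tokens
        self.margin = margin                 # refresh this early, for clock skew
        self.token = None
        self.expires_at = 0.0

    def get_token(self, now=None):
        """Return a valid token, re-authenticating via the trust if needed."""
        now = time.time() if now is None else now
        if self.token is None or now >= self.expires_at - self.margin:
            self.token = self.acquire_token(self.trust_id)
            self.expires_at = now + self.lifetime
        return self.token

calls = []
def fake_acquire(trust_id):
    calls.append(trust_id)
    return "token-%d" % len(calls)

session = TrustScopedSession("trust-123", fake_acquire)
t1 = session.get_token(now=0.0)     # first use: fetches a token
t2 = session.get_token(now=10.0)    # still fresh: reused
t3 = session.get_token(now=100.0)   # inside the 30s margin: refreshed
```

Because only the trustee can mint a new token from the trust, the queued backup jobs described at the top of the thread could outlive any single token without the worker holding long-lived credentials.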
Re: [openstack-dev] [Ceilometer] MySQL performance and Mongodb backend maturity question
Sorry, I am jumping into this without enough context, but ...

On Wed, Sep 24, 2014 at 8:37 PM, Qiming Teng teng...@linux.vnet.ibm.com wrote:

mysql> select count(*) from metadata_text;
+----------+
| count(*) |
+----------+
| 25249913 |
+----------+
1 row in set (3.83 sec)

There are problems where a simple sequential log file is superior to a database table. The above looks like a log ... a very large number of events, without an immediate customer. For sequential access, a simple file is *vastly* superior to a database table. If you are thinking about indexed access to the above as a table, think about the cost of adding items to the index, for that many items. The cost of building the index is not small. Running a map/reduce on sequential files might be faster. Again, I do not have enough context, but ... 25 million rows?
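The map/reduce-on-sequential-files suggestion above amounts to a single streaming pass with no per-insert index maintenance. A toy sketch (meter names and record format are invented for illustration): count events per meter from a line-per-event log.

```python
# Sketch: one streaming pass over a sequential event log -- no index needed.
import collections
import io

def count_events(lines):
    counts = collections.Counter()
    for line in lines:                 # "map": parse each record in order
        meter = line.split(",", 1)[0]
        counts[meter] += 1             # "reduce": aggregate per meter name
    return counts

# Stand-in for a flat log file of metering samples:
log = io.StringIO(
    "cpu_util,host1,42\n"
    "disk.read.bytes,host1,1024\n"
    "cpu_util,host2,17\n"
)
totals = count_events(log)
```

Appending to such a log is O(1) per event; the cost of aggregation is paid once at read time, instead of maintaining a 25-million-entry index on every write.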
Re: [openstack-dev] [nova] 2 weeks in the bug tracker
This is great. On the point of: "If an Incomplete bug has no response after 30 days it's fair game to close (Invalid, Opinion, Won't Fix)." How about Stale ... since that is where it is. (How hard is it to add a state?)

On Fri, Sep 19, 2014 at 6:13 AM, Sean Dague s...@dague.net wrote:

I've spent the better part of the last 2 weeks in the Nova bug tracker to try to turn it into something that doesn't cause people to run away screaming. I don't remember exactly where we started at open bug count 2 weeks ago (it was north of 1400, with 200 bugs in new, but it might have been north of 1600), but as of this email we're at 1000 open bugs (I'm counting Fix Committed as closed, even though LP does not), and ~0 new bugs (depending on the time of the day).

== Philosophy in Triaging ==

I'm going to lay out the philosophy of triaging I've had, because this may also set the tone going forward. A bug tracker is a tool to help us make a better release. It does not exist for its own good, it exists to help. Which means when evaluating what stays in and what leaves we need to evaluate whether any particular artifact will help us make a better release. But also, more importantly, realize that there is a cost for carrying every artifact in the tracker. Resolving duplicates gets non-linearly harder as the number of artifacts goes up. Triaging gets non-linearly harder as the number of artifacts goes up.

With this I was being somewhat pragmatic about closing bugs. An old bug that is just a stacktrace is typically not useful. An old bug that is a vague sentence that we should refactor a particular module (with no specifics on the details) is not useful. A bug reported against a very old version of OpenStack where the code has changed a lot in the relevant area, and there aren't responses from the author, is not useful. Not-useful bugs just add debt, and we should get rid of them.
That makes the chance of pulling a random bug off the tracker something that you could actually look at fixing, instead of mostly just stalling out. So I closed a lot of stuff as Invalid / Opinion that fell into those camps.

== Keeping New Bugs at close to 0 ==

After driving the bugs in the New state down to zero last week, I found it's actually pretty easy to keep it at 0. We get 10 - 20 new bugs a day in Nova (during a weekday). Of those, ~20% aren't actually a bug, and can be closed immediately. ~30% look like a bug, but don't have anywhere near enough information in them, and flipping them to incomplete with questions quickly means we have a real chance of getting the right info. ~10% are fixable in 30 minutes worth of work. And the rest are real bugs, that seem to have enough to dive into, and can be triaged into Confirmed, set a priority, and add the appropriate tags for the area.

But, more importantly, this means we can filter bug quality on the way in. And we can also encourage bug reporters that are giving us good stuff, or even easy stuff, as we respond quickly.

Recommendation #1: we adopt a 0 new bugs policy to keep this from getting away from us in the future.

== Our worst bug reporters are often core reviewers ==

I'm going to pick on Dan Prince here, mostly because I have a recent concrete example, however in triaging the bug queue much of the core team is to blame (including myself). https://bugs.launchpad.net/nova/+bug/1368773 is a terrible bug. Also, it was set incomplete with no response. I'm almost 100% sure it's a dupe of the multiprocess bug we've been tracking down, but it's so terse that you can't get to the bottom of it.

There were a ton of 2012 nova bugs that were basically post-it notes. "Oh, we should refactor this function." Full stop. While those are fine for personal tracking, their value goes to zero probably 3 months after they are filed, especially if the reporter stops working on the issue at hand.
Nova has plenty of "wouldn't it be great if we..." ideas. I'm not convinced using bugs for those is useful unless we go and close them out aggressively if they stall. Also, if Nova core can't file a good bug, it's hard to set the example for others in our community.

Recommendation #2: hey, Nova core, let's be better about filing the kinds of bugs we want to see! mkay!

Recommendation #3: let's create a tag for personal work items or something for these class of TODOs people are leaving themselves that make them a ton easier to cull later when they stall and no one else has enough context to pick them up.

== Tags ==

The aggressive tagging that Tracy brought into the project has been awesome. It definitely helps slice out into better functional areas. Here is the top of our current official tag list (and bug count):

95 compute
83 libvirt
74 api
68 vmware
67 network
41 db
40 testing
40 volumes
36 ec2
35 icehouse-backport-potential
32 low-hanging-fruit
31 xenserver
25 ironic
23 hyper-v
16 cells
14 scheduler
12 baremetal
9 ceph
9
Re: [openstack-dev] [nova] nova backup not working in stable/icehouse?
I also believe (2) is the most workable option.

Full disclosure ... my current job is at EMC, and we just shipped a backup product for the VMware vCloud (the one used for VMware vCloud Air - http://vcloud.vmware.com/). First release of that project was wrapping up, and I was asked to look at backup for OpenStack. As I am familiar with both the open source community and with how to do backup at cloud scale ... the problem is an easy fit. Different backup vendors might approach the problem in differing ways, which calls for a pluggable backend.

What I find is that OpenStack is missing the hooks we need to do backup efficiently. There are promising proposed bits, but ... we are not there yet. Storage for backups is of a different character than existing services in OpenStack. Backup vendors need a place to plug in, and need some relevant primitive operations.

Turns out the AWS folk already have rather nice support for backup from a cloud developer's perspective. At present, there is nothing equivalent in OpenStack. From a cloud developer's perspective, that is a huge lack. While AWS is an API wrapped around a single service, OpenStack is an API wrapped around differing services. This is both harder (in terms of defining the API), and an advantage to customers with differing requirements. What lacks and is needed by backup vendors is the right set of primitives.

On Sun, Aug 31, 2014 at 8:19 AM, laserjetyang laserjety...@gmail.com wrote:

I tend to say 2) is the best option. There is a lot of open source and commercial backup software, for both VMs and volumes. If we do option 1), it reminds me of implementing something similar to the VMware method, and it will make nova really heavy.

On Sun, Aug 31, 2014 at 4:04 AM, Preston L. Bannister pres...@bannister.us wrote:

You are thinking of written-for-cloud applications. For those the state should not persist with the instance.
There are a very large number of existing applications, not written to the cloud model, but which could be deployed in a cloud. Those applications are not all going to get re-written (as the cost is often greater than the benefit). Those applications need some ready and efficient means of backup. The benefits of the cloud-application model and the cloud-deployment model are distinct. The existing nova backup (if it worked) is an inefficient snapshot. Not really useful at scale. There are two basic paths forward, here. 1) Build a complete common backup implementation that everyone can use. Or 2) define a common API for invoking backup, allow vendors to supply differing implementations, and add to OpenStack the APIs needed by backup implementations. Given past history, there does not seem to be enough focus or resources to get (1) done. That makes (2) much more likely. Reasonably sure we can find the interest and resources for this path. :) On Fri, Aug 29, 2014 at 10:55 PM, laserjetyang laserjety...@gmail.com wrote: I think the purpose of nova VM is not for persistent usage, and it should be used for stateless. However, there are use cases to use VM to replace bare metal applications, and it requires the same coverage, which I think VMware did pretty well. The nova backup is snapshot indeed, so it should be re-implemented to be fitting into the real backup solution. On Sat, Aug 30, 2014 at 1:14 PM, Preston L. Bannister pres...@bannister.us wrote: The current backup APIs in OpenStack do not really make sense (and apparently do not work ... which perhaps says something about usage and usability). So in that sense, they could be removed. Wrote out a bit as to what is needed: http://bannister.us/weblog/2014/08/21/cloud-application-backup-and-openstack/ At the same time, to do efficient backup at cloud scale, OpenStack is missing a few primitives needed for backup. 
We need to be able to quiesce instances, and collect changed-block lists, across hypervisors and filesystems. There is some relevant work in this area - for example: https://wiki.openstack.org/wiki/Nova/InstanceLevelSnapshots

Switching hats - as a cloud developer, on AWS there is an excellent existing means of backup-through-snapshots, which is very quick and is charged based on changed blocks. (The performance and cost both reflect the use of changed-block tracking underneath.) If OpenStack completely lacks any equivalent API, then the platform is less competitive.

Are you thinking about backup as performed by the cloud infrastructure folk, or as a service used by cloud developers in deployed applications? The first might do behind-the-scenes backups. The second needs an API.

On Fri, Aug 29, 2014 at 11:16 AM, Jay Pipes jaypi...@gmail.com wrote:

On 08/29/2014 02:48 AM, Preston L. Bannister wrote:

Looking to put a proper implementation of instance backup into OpenStack. Started by writing a simple set of baseline tests and running against the stable/icehouse branch. They failed
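To illustrate why a changed-block primitive matters: an incremental backup only copies the blocks that differ from the previous snapshot. A hypervisor with changed-block tracking hands the backup software that list directly; without it, the backup software has to derive the list itself, e.g. by hashing fixed-size blocks of two snapshots. A minimal, self-contained sketch of that comparison (illustrative only - real CBT granularity and mechanisms vary by hypervisor):

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real CBT granularity varies


def block_digests(volume: bytes, block_size: int = BLOCK_SIZE):
    """Hash each fixed-size block of a volume image."""
    return [
        hashlib.sha256(volume[off:off + block_size]).digest()
        for off in range(0, len(volume), block_size)
    ]


def changed_blocks(old_digests, new_digests):
    """Return indices of blocks that differ between two snapshots."""
    return [
        i for i, (a, b) in enumerate(zip(old_digests, new_digests))
        if a != b
    ]


# Example: two 4-block "volumes" differing only in block 2.
base = bytes(BLOCK_SIZE * 4)
current = bytearray(base)
current[2 * BLOCK_SIZE] = 0xFF
delta = changed_blocks(block_digests(base), block_digests(bytes(current)))
# delta lists only block 2 - the single block an incremental backup must copy
```

This is the work (plus the I/O to read every block) that a cloud-level changed-block API would let backup vendors skip, which is why its absence is a real cost at scale.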
Re: [openstack-dev] [nova] nova backup not working in stable/icehouse?
You are thinking of written-for-cloud applications. For those the state should not persist with the instance.

There are a very large number of existing applications, not written to the cloud model, but which could be deployed in a cloud. Those applications are not all going to get re-written (as the cost is often greater than the benefit). Those applications need some ready and efficient means of backup. The benefits of the cloud-application model and the cloud-deployment model are distinct.

The existing nova backup (if it worked) is an inefficient snapshot. Not really useful at scale.

There are two basic paths forward, here. 1) Build a complete common backup implementation that everyone can use. Or 2) define a common API for invoking backup, allow vendors to supply differing implementations, and add to OpenStack the APIs needed by backup implementations. Given past history, there does not seem to be enough focus or resources to get (1) done. That makes (2) much more likely. Reasonably sure we can find the interest and resources for this path. :)

On Fri, Aug 29, 2014 at 10:55 PM, laserjetyang laserjety...@gmail.com wrote:

I think the purpose of a nova VM is not persistent usage; it should be stateless. However, there are use cases that use VMs to replace bare-metal applications, and those require the same coverage - which I think VMware did pretty well. The nova backup is indeed a snapshot, so it should be re-implemented to fit into a real backup solution.

On Sat, Aug 30, 2014 at 1:14 PM, Preston L. Bannister pres...@bannister.us wrote:

The current backup APIs in OpenStack do not really make sense (and apparently do not work ... which perhaps says something about usage and usability). So in that sense, they could be removed.

Wrote out a bit as to what is needed: http://bannister.us/weblog/2014/08/21/cloud-application-backup-and-openstack/

At the same time, to do efficient backup at cloud scale, OpenStack is missing a few primitives needed for backup. We need to be able to quiesce instances, and collect changed-block lists, across hypervisors and filesystems. There is some relevant work in this area - for example: https://wiki.openstack.org/wiki/Nova/InstanceLevelSnapshots

Switching hats - as a cloud developer, on AWS there is an excellent existing means of backup-through-snapshots, which is very quick and is charged based on changed blocks. (The performance and cost both reflect the use of changed-block tracking underneath.) If OpenStack completely lacks any equivalent API, then the platform is less competitive.

Are you thinking about backup as performed by the cloud infrastructure folk, or as a service used by cloud developers in deployed applications? The first might do behind-the-scenes backups. The second needs an API.

On Fri, Aug 29, 2014 at 11:16 AM, Jay Pipes jaypi...@gmail.com wrote:

On 08/29/2014 02:48 AM, Preston L. Bannister wrote:

Looking to put a proper implementation of instance backup into OpenStack. Started by writing a simple set of baseline tests and running against the stable/icehouse branch. They failed!

https://github.com/dreadedhill-work/openstack-backup-scripts

Scripts and configuration are in the above. Simple tests. At first I assumed there was a configuration error in my Devstack ... but at this point I believe the errors are in fact in OpenStack. (Also I have rather more colorful things to say about what is and is not logged.)

Try to backup bootable Cinder volumes attached to instances ... and all fail. Try to backup instances booted from images, and all but one fail (without logged errors, so far as I can see).

Was concerned about preserving existing behaviour (as I am currently hacking the Nova backup API), but ... if the existing is badly broken, this may not be a concern. (Makes my job a bit simpler.)

If someone is using nova backup successfully (more than one backup at a time), I *would* rather like to know! Anyone with different experience?

IMO, the create_backup API extension should be removed from the Compute API. It's completely unnecessary, and backups should be the purview of external (to Nova) scripts or configuration management modules. This API extension is essentially trying to be a Cloud Cron, which is inappropriate for the Compute API, IMO.

-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] nova backup not working in stable/icehouse?
Looking to put a proper implementation of instance backup into OpenStack. Started by writing a simple set of baseline tests and running against the stable/icehouse branch. They failed!

https://github.com/dreadedhill-work/openstack-backup-scripts

Scripts and configuration are in the above. Simple tests. At first I assumed there was a configuration error in my Devstack ... but at this point I believe the errors are in fact in OpenStack. (Also I have rather more colorful things to say about what is and is not logged.)

Try to backup bootable Cinder volumes attached to instances ... and all fail. Try to backup instances booted from images, and all but one fail (without logged errors, so far as I can see).

Was concerned about preserving existing behaviour (as I am currently hacking the Nova backup API), but ... if the existing is badly broken, this may not be a concern. (Makes my job a bit simpler.)

If someone is using nova backup successfully (more than one backup at a time), I *would* rather like to know! Anyone with different experience?
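For readers unfamiliar with the API under discussion: Nova's create_backup action takes a backup name, a backup type (e.g. daily/weekly), and a rotation count, and is supposed to delete older backups beyond that count. A minimal sketch of the rotation semantics (simplified for illustration; this is not Nova's actual implementation):

```python
def rotate_backups(backups, rotation):
    """Apply nova-backup-style rotation to a list of backups
    ordered oldest-to-newest: keep only the newest `rotation`
    backups, returning (kept, deleted).

    Simplified illustration; real Nova rotates per backup type
    and operates on stored images, not plain strings.
    """
    if rotation <= 0:
        # rotation 0 means no backups are retained
        return [], list(backups)
    kept = backups[-rotation:]
    deleted = backups[:-rotation]
    return kept, deleted


# Example: three daily backups with rotation=2 drops the oldest.
kept, deleted = rotate_backups(["mon", "tue", "wed"], 2)
```

Jay's "Cloud Cron" objection below is precisely about this retention/scheduling policy living inside the Compute API rather than in an external tool.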
Re: [openstack-dev] Proposal for instance-level snapshots in Nova
Did this ever go anywhere?

http://lists.openstack.org/pipermail/openstack-dev/2014-January/024315.html

I am looking at what is needed to get backup working in OpenStack, and this seems to be the most recent reference.