Re: [Openstack-operators] Openstack Version discovery with the cli client.

2018-08-07 Thread George Mihaiescu
Hi Saverio,

I think only the API versions supported by some of the endpoints are
discoverable, as described here:
https://wiki.openstack.org/wiki/VersionDiscovery

curl https://x.x.x.x:9292/image
curl https://x.x.x.x:8774/compute
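
For example, piping the response through a JSON pretty-printer makes the
supported API versions easy to read (just a sketch; substitute your real
endpoint address and port):

curl -s https://x.x.x.x:8774/compute | python -m json.tool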


Cheers,
George

On Tue, Aug 7, 2018 at 9:30 AM, Saverio Proto  wrote:

> Hello Jimmy,
>
> thanks for your help. If I understand correctly the answer you linked,
> that helps if you operate the cloud and you have access to the
> servers. Then of course you can call nova-manage.
>
> But being a user of a public cloud without having access to the
> infrastructure servers ... how do you do that?
>
> thanks
>
> Saverio
>
>
>
> Il giorno mar 7 ago 2018 alle ore 15:09 Jimmy McArthur
>  ha scritto:
> >
> > Hey Saverio,
> >
> > This answer from ask.openstack.org should have what you're looking for:
> > https://ask.openstack.org/en/question/45513/how-to-find-out-which-version-of-openstack-is-installed/
> >
> > Once you get the release number, you have to look it up here to match
> > the release date: https://releases.openstack.org/
> >
> > I had to use this the other day when taking the COA.
> >
> > Cheers,
> > Jimmy
> >
> > Saverio Proto wrote:
> > > Hello,
> > >
> > > This is maybe a super trivial question but I have to admit I could not
> > > figure it out.
> > >
> > > Can the user with the openstack cli client discover the version of
> > > Openstack that is running ?
> > >
> > > For example in kubernetes the kubectl version command returns the
> > > version of the client and the version of the cluster.
> > >
> > > For Openstack I never managed to discover the backend version, and
> > > this could be useful when using public clouds.
> > >
> > > Does anyone know how to do that?
> > >
> > > thanks
> > >
> > > Saverio
> > >
> > > ___
> > > OpenStack-operators mailing list
> > > OpenStack-operators@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-06 Thread George Mihaiescu
Can you manually assign an IP address to a VM and, once inside, ping the
address of the DHCP server?
That would at least confirm there is connectivity.


Also, on the controller node where the dhcp server for that network is,
check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases"
and make sure there are entries corresponding to your instances.
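
For example (fa:16:3e is the default OpenStack MAC prefix, so this lists
every lease handed out on that network):

grep -i fa:16:3e /var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases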

In my experience, if neutron is broken after working fine (so excluding any
misconfiguration), then an agent is out of sync and a restart usually fixes
things.



On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer 
wrote:

> I have done tcpdumps on both the controllers and on a compute node.
> Controller:
> `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0
> -i ns-83d68c76-b8 port 67`
> `tcpdump -vnes0 -i any port 67`
> Compute:
> `tcpdump -vnes0 -i brqd85c2a00-a6 port 68`
>
> For the first command on the controller, there are no packets captured at
> all. The second command on the controller captures packets, but they don't
> appear to be relevant to openstack. The dump from the compute node shows
> constant requests are getting sent by openstack instances.
>
> In summary; DHCP requests are being sent, but are never received.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 4:50 PM
> *To*: torin.wolt...@granddial.com
> *Subject*: Re: [Openstack] Recovering from full outage
>
> The cloud-init requires network connectivity by default in order to reach
> the metadata server for the hostname, ssh-key, etc
>
> You can configure cloud-init to use the config-drive, but the lack of
> network connectivity will make the instance useless anyway, even though it
> will have your ssh-key and hostname...
>
> Did you check the things I told you?
>
> On Jul 5, 2018, at 16:06, Torin Woltjer 
> wrote:
>
> Are IP addresses set by cloud-init on boot? I noticed that cloud-init
> isn't working on my VMs. I created a new instance from an ubuntu 18.04 image
> to test with; the hostname was not set to the name of the instance and I
> could not log in as the users I had specified in the configuration.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 12:57 PM
> *To*: torin.wolt...@granddial.com
> *Cc*: "openst...@lists.openstack.org" , "
> openstack-operators@lists.openstack.org"  openstack.org>
> *Subject*: Re: [Openstack] Recovering from full outage
> You should tcpdump inside the qdhcp namespace to see if the requests make
> it there, and also check iptables rules on the compute nodes for the return
> traffic.
>
>
> On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> Yes, I've done this. The VMs hang for awhile waiting for DHCP and
>> eventually come up with no addresses. neutron-dhcp-agent has been restarted
>> on both controllers. The qdhcp netns's were all present; I stopped the
>> service, removed the qdhcp netns's, noted the dhcp agents show offline by
>> `neutron agent-list`, restarted all neutron services, noted the qdhcp
>> netns's were recreated, restarted a VM again and it still fails to pull an
>> IP address.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/5/18 10:38 AM
>> *To*: torin.wolt...@granddial.com
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Did you restart the neutron-dhcp-agent and reboot the VMs?
>>
>> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
>> torin.wolt...@granddial.com> wrote:
>>
>>> The qrouter netns appears once the lock_path is specified, the neutron
>>> router is pingable as well. However, instances are not pingable. If I log
>>> in via console, the instances have not been given IP addresses, if I
>>> manually give them an address and rout

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread George Mihaiescu
You should tcpdump inside the qdhcp namespace to see if the requests make
it there, and also check iptables rules on the compute nodes for the return
traffic.
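
Roughly something like this (a sketch; substitute the network UUID, the
ns-/tap interface name inside the namespace, and the first characters of the
instance's port ID):

ip netns exec qdhcp-<network-uuid> tcpdump -vne -i <ns-interface> port 67 or port 68
iptables-save | grep <port-id-prefix>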


On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer 
wrote:

> Yes, I've done this. The VMs hang for awhile waiting for DHCP and
> eventually come up with no addresses. neutron-dhcp-agent has been restarted
> on both controllers. The qdhcp netns's were all present; I stopped the
> service, removed the qdhcp netns's, noted the dhcp agents show offline by
> `neutron agent-list`, restarted all neutron services, noted the qdhcp
> netns's were recreated, restarted a VM again and it still fails to pull an
> IP address.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 10:38 AM
> *To*: torin.wolt...@granddial.com
> *Subject*: Re: [Openstack] Recovering from full outage
> Did you restart the neutron-dhcp-agent and reboot the VMs?
>
> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> The qrouter netns appears once the lock_path is specified, the neutron
>> router is pingable as well. However, instances are not pingable. If I log
>> in via console, the instances have not been given IP addresses, if I
>> manually give them an address and route they are pingable and seem to work.
>> So the router is working correctly but dhcp is not working.
>>
>> No errors in any of the neutron or nova logs on controllers or compute
>> nodes.
>>
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/5/18 8:53 AM
>> *To*: 
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>> There is no lock path set in my neutron configuration. Does it ultimately
>> matter what it is set to as long as it is consistent? Does it need to be
>> set on compute nodes as well as controllers?
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/3/18 7:47 PM
>> *To*: torin.wolt...@granddial.com
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>>
>> Did you set a lock_path in the neutron’s config?
>>
>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>> wrote:
>>
>> The following errors appear in the neutron-linuxbridge-agent.log on both
>> controllers: http://paste.openstack.org/show/724930/
>>
>> No such errors are on the compute nodes themselves.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/3/18 5:14 PM
>> *To*: 
>> *Cc*: "openstack-operators@lists.openstack.org" <
>> openstack-operators@lists.openstack.org>, "openst...@lists.openstack.org"
>> 
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Running `openstack server reboot` on an instance just causes the instance
>> t

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-03 Thread George Mihaiescu
Did you set a lock_path in the neutron’s config?
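
For reference, a minimal example (the actual path is just a common choice;
any directory writable by neutron works, as long as it is set consistently):

[oslo_concurrency]
lock_path = /var/lib/neutron/tmp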

> On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:
> 
> The following errors appear in the neutron-linuxbridge-agent.log on both 
> controllers: http://paste.openstack.org/show/724930/
> 
> No such errors are on the compute nodes themselves.
> 
> Torin Woltjer
>  
> Grand Dial Communications - A ZK Tech Inc. Company
>  
> 616.776.1066 ext. 2006
> www.granddial.com
> 
> From: "Torin Woltjer" 
> Sent: 7/3/18 5:14 PM
> To: 
> Cc: "openstack-operators@lists.openstack.org" 
> , "openst...@lists.openstack.org" 
> 
> Subject: Re: [Openstack] Recovering from full outage
> Running `openstack server reboot` on an instance just causes the instance to 
> be stuck in a rebooting status. Most notable of the logs is 
> neutron-server.log which shows the following:
> http://paste.openstack.org/show/724917/
> 
> I realized that rabbitmq was in a failed state, so I bootstrapped it, 
> rebooted controllers, and all of the agents show online.
> http://paste.openstack.org/show/724921/
> And all of the instances can be properly started, however I cannot ping any 
> of the instances floating IPs or the neutron router. And when logging into an 
> instance with the console, there is no IP address on any interface.
> 
> Torin Woltjer
>  
> Grand Dial Communications - A ZK Tech Inc. Company
>  
> 616.776.1066 ext. 2006
> www.granddial.com
> 
> From: George Mihaiescu 
> Sent: 7/3/18 11:50 AM
> To: torin.wolt...@granddial.com
> Subject: Re: [Openstack] Recovering from full outage
> Try restarting them using "openstack server reboot" and also check the 
> nova-compute.log and neutron agents logs on the compute nodes.
> 
>> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
>> wrote:
>> We just suffered a power outage in our data center and I'm having trouble 
>> recovering the Openstack cluster. All of the nodes are back online, every 
>> instance shows active but `virsh list --all` on the compute nodes show that 
>> all of the VMs are actually shut down. Running `ip addr` on any of the nodes 
>> shows that none of the bridges are present and `ip netns` shows that all of 
>> the network namespaces are missing as well. So despite all of the neutron 
>> service running, none of the networking appears to be active, which is 
>> concerning. How do I solve this without recreating all of the networks?
>> 
>> Torin Woltjer
>>  
>> Grand Dial Communications - A ZK Tech Inc. Company
>>  
>> 616.776.1066 ext. 2006
>> www.granddial.com
>> 
>> ___
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openst...@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> 
> 
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread George Mihaiescu
We have other scheduled tests that perform end-to-end (assign floating IP,
ssh, ping outside) and never had an issue.
I think we turned it off because the callback code was initially buggy and
nova would wait forever while things were in fact ok, but I'll change
"vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run
another large test, just to confirm.
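
In nova.conf terms that would be something like this on the compute nodes,
followed by a nova-compute restart:

[DEFAULT]
vif_plugging_is_fatal = True
vif_plugging_timeout = 300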

We usually run these large tests after a version upgrade to test the APIs
under load.



On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann <mriede...@gmail.com>
wrote:

> On 5/17/2018 9:46 AM, George Mihaiescu wrote:
>
>> and large rally tests of 500 instances complete with no issues.
>>
>
> Sure, except you can't ssh into the guests.
>
> The whole reason the vif plugging is fatal and timeout and callback code
> was because the upstream CI was unstable without it. The server would
> report as ACTIVE but the ports weren't wired up so ssh would fail. Having
> an ACTIVE guest that you can't actually do anything with is kind of
> pointless.
>
> --
>
> Thanks,
>
> Matt
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] attaching network cards to VMs taking a very long time

2018-05-17 Thread George Mihaiescu
We use "vif_plugging_is_fatal = False" and "vif_plugging_timeout = 0" as
well as "no-ping" in the dnsmasq-neutron.conf, and large rally tests of 500
instances complete with no issues.
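
For reference, the "no-ping" bit is wired in roughly like this (paths are just
an example layout):

# /etc/neutron/dhcp_agent.ini
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
no-ping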

These are some good blogposts about Neutron performance:
https://www.mirantis.com/blog/openstack-neutron-performance-and-scalability-testing-summary/
https://www.mirantis.com/blog/improving-dhcp-performance-openstack/

I would run a large rally test like this one and see where time is spent
mostly:
{
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor": {
                    "name": "c2.small"
                },
                "image": {
                    "name": "^Ubuntu 16.04 - latest$"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 500,
                "concurrency": 100
            }
        }
    ]
}
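
Then it's just a matter of kicking it off and pulling the report, e.g.
(assuming you saved the task above as boot_and_delete.json):

rally task start boot_and_delete.json
rally task report --out rally_report.html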


Cheers,
George

On Thu, May 17, 2018 at 7:49 AM, Radu Popescu | eMAG, Technology <
radu.pope...@emag.ro> wrote:

> Hi,
>
> unfortunately, didn't get the reply in my inbox, so I'm answering from the
> link here:
> http://lists.openstack.org/pipermail/openstack-operators/2018-May/015270.html
> (hopefully, my reply will go to the same thread)
>
> Anyway, I can see the neutron openvswitch agent logs processing the
> interface way after the VM is up (in this case, 30 minutes). And after the
> vif plugin timeout of 5 minutes (currently 10 minutes).
> After searching for logs, I came out with an example here: (replaced nova
> compute hostname with "nova.compute.hostname")
>
> http://paste.openstack.org/show/1VevKuimoBMs4G8X53Eu/
>
> As you can see, the request for the VM starts around 3:27AM. Ports get
> created, openvswitch has the command to do it, has DHCP, but apparently
> Neutron server sends the callback after Neutron Openvswitch agent finishes.
> Callback is at 2018-05-10 03:57:36.177 while Neutron Openvswitch agent says
> it completed the setup and configuration at 2018-05-10 03:57:35.247.
>
> So, my question is, why is Neutron Openvswitch agent processing the
> request 30 minutes after the VM is started? And where can I search for logs
> for whatever happens during those 30 minutes?
> And yes, we're using libvirt. At some point, we added some new nova
> compute nodes and the new ones came with v3.2.0, which was breaking migration
> between hosts. That's why we downgraded (and versionlocked) everything to
> v2.0.0.
>
> Thanks,
> Radu
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Guest crash and KVM unhandled rdmsr

2017-10-17 Thread George Mihaiescu
Hi Blair,

We had a few cases of compute nodes hanging with the last log in syslog
being related to "rdmsr", and requiring hard reboots:
 kvm [29216]: vcpu0 unhandled rdmsr: 0x345

The workloads are probably similar to yours (SGE workers doing genomics)
with CPU mode host-passthrough, on top of Ubuntu 16.04 and kernel
4.4.0-96-generic.
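
A quick way to check for them on a hypervisor is simply:

dmesg | grep -i rdmsr
grep -i "unhandled rdmsr" /var/log/syslog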

I'm not sure the "rdmsr" logs are relevant though, because we see them on
other compute nodes that have no issues.

Did you find anything that might indicate what the root cause is?

Cheers,
George


On Thu, Oct 12, 2017 at 5:26 PM, Blair Bethwaite 
wrote:

> Hi all,
>
> Has anyone seen guest crashes/freezes associated with KVM unhandled rdmsr
> messages in dmesg on the hypervisor?
>
> We have seen these messages before but never with a strong correlation to
> guest problems. However over the past couple of weeks this is happening
> almost daily with consistent correlation for a set of hosts dedicated to a
> particular HPC workload. So far as I know the workload has not changed, but
> we have just recently moved the hypervisors to Ubuntu Xenial (though they
> were already on the Xenial kernel previously) and done minor guest
> (CentOS7) updates. CPU mode is host-passthrough. Currently trying to figure
> out if the CPU flags in the guest have changed since the host upgrade...
>
> Cheers,
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Experience with Cinder volumes as root disks?

2017-08-02 Thread George Mihaiescu
I totally agree with Jay, this is the best, cheapest and most scalable way to 
build a cloud environment with Openstack.

We use local storage as the primary root disk source, which lets us make good 
use of the six drive slots available in each compute node and, coupled with 
RAID 10, gives good I/O performance.

We also have a multi petabyte Ceph cluster that we use to store large genomics 
files in object format, as well as backend for Cinder volumes, but the primary 
use case for the Ceph cluster is not booting up the instances.

In this way, we have small failure domains, and if a VM does a lot of IO it 
only impacts a few other neighbours. The latency for writes is low, and we 
don't spend money (and drive slots) on SSD journals, which only improve write 
latency until the Ceph journal needs to flush.

Speed of provisioning is not a concern because anyway with a small image 
library, most of the popular ones are already cached on the compute nodes, and 
the time it takes for the instance to boot is just a small percentage of the 
total instance runtime (days or weeks).
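
(A quick way to see what is already cached on a compute node is to look at the
nova image cache, e.g. "ls -lh /var/lib/nova/instances/_base".)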

The drawback is that maintenance requiring reboots needs to be scheduled in 
advance, but I would argue that booting from shared storage and having to 
orchestrate the live migration of 1000 instances from 100 compute nodes without 
performance impact for the workloads running there (some migrations could fail 
because of intense CPU or memory activity) is not very feasible either...

George 



> On Aug 1, 2017, at 11:59, Jay Pipes  wrote:
> 
>> On 08/01/2017 11:14 AM, John Petrini wrote:
>> Just my two cents here but we started out using mostly Ephemeral storage in 
>> our builds and looking back I wish we hadn't. Note we're using Ceph as a 
>> backend so my response is tailored towards Ceph's behavior.
>> The major pain point is snapshots. When you snapshot a nova volume an RBD 
>> snapshot occurs and is very quick and uses very little additional storage, 
>> however the snapshot is then copied into the images pool and in the process 
>> is converted from a snapshot to a full size image. This takes a long time 
>> because you have to copy a lot of data and it takes up a lot of space. It 
>> also causes a great deal of IO on the storage and means you end up with a 
>> bunch of "snapshot images" creating clutter. On the other hand volume 
>> snapshots are near instantaneous without the other drawbacks I've mentioned.
>> On the plus side for ephemeral storage; resizing the root disk of images 
>> works better. As long as your image is configured properly it's just a 
>> matter of initiating a resize and letting the instance reboot to grow the 
>> root disk. When using volumes as your root disk you instead have to shutdown 
>> the instance, grow the volume and boot.
>> I hope this help! If anyone on the list knows something I don't know 
>> regarding these issues please chime in. I'd love to know if there's a better 
>> way.
> 
> I'd just like to point out that the above is exactly the right way to think 
> about things.
> 
> Don't boot from volume (i.e. don't use a volume as your root disk).
> 
> Instead, separate the operating system from your application data. Put the 
> operating system on a small disk image (small == fast boot times), use a 
> config drive for injectable configuration and create Cinder volumes for your 
> application data.
> 
> Detach and attach the application data Cinder volume as needed to your server 
> instance. Make your life easier by not coupling application data and the 
> operating system together.
> 
> Best,
> -jay
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [scientific] Lightning talks on Scientific OpenStack

2017-04-28 Thread George Mihaiescu
Thanks Stig,

I added a presentation to the schedule.


Cheers,
George



On Thu, Apr 27, 2017 at 3:49 PM, Stig Telfer <stig.openst...@telfer.org>
wrote:

> Hi George -
>
> Sorry for the slow response.  The consensus was for 8 minutes maximum.
> That should be plenty for a lightning talk, and enables us to fit one more
> in.
>
> Best wishes,
> Stig
>
>
> > On 27 Apr 2017, at 20:29, George Mihaiescu <lmihaie...@gmail.com> wrote:
> >
> > Hi Stig, will it be 10-minute sessions like in Barcelona?
> >
> > Thanks,
> > George
> >
> >> On Apr 26, 2017, at 03:31, Stig Telfer <stig.openst...@telfer.org>
> wrote:
> >>
> >> Hi All -
> >>
> >> We have planned a session of lightning talks at the Boston summit to
> discuss topics specific for OpenStack and research computing applications.
> This was a great success at Barcelona and generated some stimulating
> discussion.  We are also hoping for a small prize for the best talk of the
> session!
> >>
> >> This is the event:
> >> https://www.openstack.org/summit/boston-2017/summit-schedule/events/18676
> >>
> >> If you’d like to propose a talk, please add a title and your name here:
> >> https://etherpad.openstack.org/p/Scientific-WG-boston
> >>
> >> Everyone is welcome.
> >>
> >> Cheers,
> >> Stig
> >>
> >>
> >> ___
> >> OpenStack-operators mailing list
> >> OpenStack-operators@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [scientific] Lightning talks on Scientific OpenStack

2017-04-27 Thread George Mihaiescu
Hi Stig, will it be 10-minute sessions like in Barcelona?

Thanks,
George 

> On Apr 26, 2017, at 03:31, Stig Telfer  wrote:
> 
> Hi All - 
> 
> We have planned a session of lightning talks at the Boston summit to discuss 
> topics specific for OpenStack and research computing applications.  This was 
> a great success at Barcelona and generated some stimulating discussion.  We 
> are also hoping for a small prize for the best talk of the session!
> 
> This is the event:
> https://www.openstack.org/summit/boston-2017/summit-schedule/events/18676
> 
> If you’d like to propose a talk, please add a title and your name here:
> https://etherpad.openstack.org/p/Scientific-WG-boston
> 
> Everyone is welcome.
> 
> Cheers,
> Stig
> 
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Mixed env for nova (ceph for some compute nodes, local disk for the rest): qcow2 or raw images ?

2017-04-05 Thread George Mihaiescu
Hi Massimo,

You can upload each image twice, in both qcow2 and raw format, then create
a host aggregate for your "local-disk" compute nodes and set its metadata
to match the property you'll set on your qcow2 images.

When somebody starts the qcow2 version of an image, it will be scheduled
on your local-disk compute nodes and will pull the qcow2 image from Glance.
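
Roughly, and assuming the AggregateImagePropertiesIsolation scheduler filter
is enabled in nova.conf (the aggregate, host and property names below are just
examples):

openstack aggregate create --property disk_type=local local-disk
openstack aggregate add host local-disk compute-01
openstack image set --property disk_type=local ubuntu-16.04-qcow2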

Does it make sense?

George

On Wed, Apr 5, 2017 at 10:05 AM, Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi
>
> Currently in our Cloud we are using a gluster storage for cinder and
> glance.
> For nova we are using a shared file system (implemented using gluster) for
> part of the compute nodes; the rest of the compute nodes use the local disk.
>
> We are now planning the replacement of gluster with ceph. The idea is
> therefore to use ceph for cinder, glance. Ceph would be used for nova but
> just for a set of compute nodes  (the other compute nodes would keep using
> the local disk).
>
> In such configuration I see a problem with the choice of the best format
> type
> for images.
>
> As far as I understand (please correct me if I am wrong) the ideal setup
> would be using raw images for VMs targeted to compute nodes using ceph, and
> qcow2 images for VMs targeted to compute nodes using the local disk for
> nova.
> In fact starting a VM using a qcow2 image on a compute node using ceph for
> nova works but it is quite inefficient since the qcow2 image must be first
> downloaded in /var/lib/nova/instances/_base and then converted into raw.
> This also means that some space is needed on the local disk.
>
> And if you start a VM using a raw image on a compute node using the
> local disk for nova, the raw image (usually quite big) must be downloaded
> on the compute node, and this is less efficient wrt a qcow2 image. It is
> true that the qcow2 is then converted into raw, but I think that most of
> the time is taken in downloading the image.
>
> Did I get it right ?
> Any advice ?
>
> Thanks, Massimo
>
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [BostonSummit] [Forum] seeking Boston Summit Forum group to discuss protected data cloud

2017-03-06 Thread George Mihaiescu
Hi Evan,

I believe the scientific working group will have at least a meeting focused
on this subject (
https://etherpad.openstack.org/p/BOS-UC-brainstorming-scientific-wg)

Contact me off-list if you want to chat about protected data, as I'm the
architect for a fairly large environment that deals exactly with this issue.

Cheers,
George


On Mon, Mar 6, 2017 at 2:23 PM, Evan Bollig PhD  wrote:

> Looking for a Workgroup or BoF at the Boston Summit where we can
> discuss the design and use of a Ceph-backed OpenStack cloud
> environment for research computing with protected data (e.g., NIH GDS,
> HIPAA, FedRAMP, etc.).
>
> If anyone thinks this is a fit for their group, let me know.
>
> Cheers,
> -E
> --
> Evan F. Bollig, PhD
> Scientific Computing Consultant, Application Developer | Scientific
> Computing Solutions (SCS)
> Minnesota Supercomputing Institute | msi.umn.edu
> University of Minnesota | umn.edu
> boll0...@umn.edu | 612-624-1447 | Walter Lib Rm 556
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Change 'swap' in a flavor template

2016-12-15 Thread George Mihaiescu
Can you not update the flavour in dashboard?

> On Dec 15, 2016, at 09:34, William Josefsson  
> wrote:
> 
>> On Thu, Dec 15, 2016 at 9:40 PM, Mikhail Medvedev  
>> wrote:
>> 
>> I could not figure out how to set swap on existing flavor fast enough,
>> so I initially edited nova db directly. There were no side effects in
>> doing so in Icehouse. I see no reason it would not work in Liberty.
>> 
>>> Can anyone please advise on how to go about changing the 'swap'
>>> setting for an existing flavor? Last resort is to add additional
>>> flavors with swap values, but that would be very ugly. :(
>> 
>> For a "nicer" way I ended up recreating flavor I needed to edit:
>> delete old one, create new one with the same id and swap enabled. I
>> hope there is a better way, but editing db directly, or recreating
>> flavor was sufficient for me so far.
> 
> Thanks Mikhail. Appreciate the hint. I thought of deleting the flavor
> and adding it again, but was concerned about whether that would affect
> current instances with that flavor-id in use. Maybe the easiest is to just go
> ahead and update the 'nova' table. I just was concerned that there
> would be existing relationships that would break upon e.g. deleting
> existing instances, however.. I think I should go ahead and try the
> db-update way first. thanks! will
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

2016-11-30 Thread George Mihaiescu
Try changing the following in nova.conf and restart the nova-scheduler. The
first option makes the scheduler pick randomly among the best N candidate
hosts (so near-simultaneous requests don't all land on the same node), and
the second allows more reschedule attempts when a resource claim fails:

scheduler_host_subset_size = 10
scheduler_max_attempts = 10

Cheers,
George

On Wed, Nov 30, 2016 at 9:56 AM, Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi all
>
> I have a problem with scheduling in our Mitaka Cloud.
> Basically, when there are a lot of requests for new instances, some of them
> fail because "Failed to compute_task_build_instances: Exceeded maximum
> number of retries". And the failures are because "Insufficient compute
> resources: Free memory 2879.50 MB < requested
>  8192 MB" [*]
>
> But there are compute nodes with enough memory that could serve such
> requests.
>
> In the conductor log I also see messages reporting that "Function
> 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
> interval by xxx sec" [**]
>
>
> My understanding is that:
>
> - VM a is scheduled to a certain compute node
> - the scheduler chooses the same compute node for VM b before the info for
> that compute node is updated (so the 'size' of VM a is not taken into
> account)
>
> Does this make sense or am I totally wrong ?
>
> Any hints about how to cope with such scenarios, besides increasing
>  scheduler_max_attempts ?
>
> scheduler_default_filters is set to:
>
> scheduler_default_filters = AggregateInstanceExtraSpecsFilter,
> AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,
> RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,ComputeFilter,
> ComputeCapabilitiesFilter,ImagePropertiesFilter,
> ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
>
>
> Thanks a lot, Massimo
>
> [*]
>
> 2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils
> [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7
> d27fe2becea94a3e980fb9f66e2f29
> 1a - - -] Failed to compute_task_build_instances: Exceeded maximum number
> of retries. Exceeded max scheduling attempts 5 for instance
> 314eccd0-fc73-446f-8138-7d8d3c
> 8644f7. Last exception: Insufficient compute resources: Free memory
> 2879.50 MB < requested 8192 MB.
> 2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils
> [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7
> d27fe2becea94a3e980fb9f66e2f29
> 1a - - -] [instance: 314eccd0-fc73-446f-8138-7d8d3c8644f7] Setting
> instance to ERROR state.
>
>
> [**]
>
> 2016-11-30 15:10:48.873 25128 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.08 sec
> 2016-11-30 15:10:54.372 25142 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.33 sec
> 2016-11-30 15:10:54.375 25140 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.32 sec
> 2016-11-30 15:10:54.376 25129 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.30 sec
> 2016-11-30 15:10:54.381 25138 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.24 sec
> 2016-11-30 15:10:54.381 25139 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.28 sec
> 2016-11-30 15:10:54.382 25143 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.24 sec
> 2016-11-30 15:10:54.385 25141 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 9.11 sec
> 2016-11-30 15:11:01.964 25128 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 3.09 sec
> 2016-11-30 15:11:05.503 25142 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.13 sec
> 2016-11-30 15:11:05.506 25138 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.12 sec
> 2016-11-30 15:11:05.509 25139 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.13 sec
> 2016-11-30 15:11:05.512 25141 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.13 sec
> 2016-11-30 15:11:05.525 25143 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.14 sec
> 2016-11-30 15:11:05.526 25140 WARNING oslo.service.loopingcall [-]
> Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run
> outlasted interval by 1.15 sec
> 2016-11-30 15:11:05.529 25129 WARNING oslo.service.loopingcall 

Re: [Openstack-operators] Audit Logging - Interested? What's missing?

2016-11-17 Thread George Mihaiescu
Same need here: I want to know who changed a security group and what the 
change was. Just the logged POST on the API is not enough to properly audit 
the operation.

> On Nov 16, 2016, at 19:51, Kris G. Lindgren  wrote:
> 
> I need to do a deeper dive on audit logging. 
> 
> However, we have a requirement for when someone changes a security group that 
> we log what the previous security group was and what the new security group 
> is and who changed it.  I don't know if this is specific to our crazy 
> security people or if other security people want to have this.  I am sure I 
> can think of others.
> 
> 
> ___
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy
> 
> On 11/16/16, 3:29 PM, "Tom Fifield"  wrote:
> 
>Hi Ops,
> 
>Was chatting with Department of Defense in Australia the other day, and 
>one of their pain points is Audit Logging. Some bits of OpenStack just 
>don't leave enough information for proper audit. So, thought it might be 
>a good idea to gather people who are interested to brainstorm how to get 
>it to a good level for all :)
> 
>Does your cloud need good audit logging? What do you wish was there at 
>the moment, but isn't?
> 
> 
>Regards,
> 
> 
>Tom
> 
>___
>OpenStack-operators mailing list
>OpenStack-operators@lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] ML2/OVS odd GRE brokenness

2016-11-08 Thread George Mihaiescu
Hi Jonathan,

The openvswitch-agent is out of sync on compute 4, try restarting it.
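
For example (the exact service name depends on the packaging):

service neutron-openvswitch-agent restart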



> On Nov 8, 2016, at 17:43, Jonathan Proulx  wrote:
> 
> 
> I have an odd issue that seems to just be affecting one private
> network for one tenant, though I saw a similar thing on a different
> project network recently which I 'fixed' by rebooting the hypervisor.
> Since this has now (maybe) happened twice I figure I should try to
> understand what it is.
> 
> Given the following four VMs on 4 different hypervisors
> 
> vm1 on Hypervisor1
> vm2 on Hypervisor2
> vm3 on Hypervisor3
> ---
> vm4 on Hypervisor4
> 
> 
> vm1 -> vm3 talk fine among themselves but none to 4
> 
> examining ping traffic transiting from vm1-vm4 I can see arp requests
> and responses at vm4 and GRE encapsulated ARP responses on
> Hypervisor1's physical interface.
> 
> They look the same to me (same encap id) coming in as the working VMs'
> traffic, but they never make it to the qvo device, which is before
> iptables sec_group rules are applied at the tap device.
> 
> attempting to tear down and recreate this results in the same split: first 3
> work, last one doesn't (possibly because the scheduler puts them in
> the same place? haven't checked)
> 
> ovs-vsctl -- set Bridge br-int mirrors=@m  -- --id=@snooper2 get Port 
> snooper2  -- --id=@gre-801e0347 get Port gre-801e0347 -- --id=@m create 
> Mirror name=mymirror select-dst-port=@gre-801e0347 
> select-src-port=@gre-801e0347 output-port=@snooper2
> 
> tcpdump -i snooper2 
> 
> Only sees ARP requests but no responses. What's broken if I can see GRE
> encap ARP responses on the physical interface but not on the gre-
> interface?  And why is it not broken for all tunnel endpoints?
> 
> Oddly if I boot a 5th VM on a 5th hypervisor it can talk to 4 but not 1-3 ...
> 
> hypervisors are Ubuntu 14.04 running Mitaka from cloud archive w/
> xenial-lts kernels (4.4.0)
> 
> -Jon
> 
> -- 
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Disable console for an instance

2016-10-27 Thread George Mihaiescu
You're right, it's probably the following you would want changed:

"compute:get_vnc_console": "",
"compute:get_spice_console": "",
"compute:get_rdp_console": "",
"compute:get_serial_console": "",
"compute:get_mks_console": "",
"compute:get_console_output": "",

I thought the use case was to limit console access to users in a shared
project environment, where you might have multiple users seeing each other's
instances, and you don't want them to try logging in on the console.

You could create a special role that has console access and change the
policy file to reference that role for the "compute:get_vnc_console", for
example.
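
For example (a sketch; "console_user" is just a made-up role name):

"compute:get_vnc_console": "role:console_user or role:admin",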

I don't think you can do it on per-flavor basis.

Cheers,
George

On Thu, Oct 27, 2016 at 10:24 AM, Blair Bethwaite <blair.bethwa...@gmail.com
> wrote:

> Hi George,
>
> On 27 October 2016 at 16:15, George Mihaiescu <lmihaie...@gmail.com>
> wrote:
> > Did you try playing with Nova's policy file and limit the scope for
> > "compute_extension:console_output": "" ?
>
> No, interesting idea though... I suspect it's actually the
> get_*_console policies we'd need to tweak, I think console_output
> probably refers to the console log? Anyway, not quite sure how we'd
> craft policy that would enable us to disable these on a per instance
> basis though - is it possible to reference image metadata in the
> context of the policy rule?
>
> --
> Cheers,
> ~Blairo
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Disable console for an instance

2016-10-27 Thread George Mihaiescu
Hi Blair,

Did you try playing with Nova's policy file and limit the scope for
"compute_extension:console_output": "" ?

Cheers,
George

On Thu, Oct 27, 2016 at 10:08 AM, Blair Bethwaite  wrote:

> On 27 October 2016 at 16:02, Jonathan D. Proulx  wrote:
> > don't put a getty on the TTY :)
>
> Do you know how to do that with Windows? ...you can see the desire for
> sandboxing now :-).
>
> --
> Cheers,
> ~Blairo
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators