Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On 10/27/2017 1:23 PM, Matt Riedemann wrote:
> Nova has had this long-standing known performance issue if you're filtering
> a large number of instances by IP. The instance IPs are stored in a JSON
> blob in the database so we don't do filtering in SQL. We pull the instances
> out of the database, deserialize the JSON and then apply a regex filter
> match in the nova-api python code.
>
> At the Queens PTG we talked about possible ways to fix this and came up with
> this nova spec:
>
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/improve-filter-instances-by-ip-performance.html
>
> The idea is to have nova get ports from neutron and apply the IP filter in
> neutron to whittle down the ports, then from that list of ports get the
> instances to pull out of the nova database.
>
> One issue that has come up with this is neutron does not currently support
> regex filters when listing ports. There is an RFE for adding that:
>
> https://bugs.launchpad.net/neutron/+bug/1718605
>
> The proposed neutron implementation is to just do SQL LIKE substring
> matching in the database.
>
> However, one issue that has come up is that the compute API accepts a
> python regex filter and uses re.match():
>
> https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2469
>
> At least one good thing about that is match() only matches from the
> beginning of the string unlike search().
>
> So for example I can filter on "192.16.*[1-5]$" if I wanted to, but that's
> not going to work with just a LIKE substring filter in SQL.
>
> The question is, does anyone actually do more than basic substring
> matching with the IP filter today? Because if we started using neutron,
> that behavior would be broken. We've never actually documented the match
> restrictions on the IP filter, but that's not a good reason to break it.
>
> One option is to make this configurable such that deployments which rely
> on the complicated pattern matching can just use the existing nova code
> despite performance issues. However, that's not interoperable, I hate
> config-driven API behavior, and it would mean maintaining two code paths in
> nova, which is also terrible.
>
> I was trying to think of a way to determine if the IP filter passed to
> nova is basic or a complicated pattern match and let us decide that way,
> but I'm not sure if there are good ways to detect that - maybe by simply
> looking for special characters like (, ), - and $? But then there is [] and
> we have an IPv6 filter, so that gets messy too...
>
> For now I'd just like to know if people rely on the regex match or not.
> Other ideas on how to handle this are appreciated.

To paraphrase the nova queens roadmap and checkpoint session at the summit [1] we agreed to just do LIKE in-SQL regex filtering in Neutron. The operators in the room (and from this thread in the ML) have said they are doing exact IP filter matches, not regex. The LIKE regex filtering in SQL still gives some flexibility with regex, but not the crazy python re thing nova allows today (which is potentially an attack vector).

So we'll move forward with those changes.

[1] https://etherpad.openstack.org/p/SYD-forum-nova-queens-update

--
Thanks,
Matt

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
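A minimal sketch of the matching difference described in the quoted message, using only Python's standard `re` module. The address list and the three helper names are illustrative assumptions, not nova code; the substring helper only approximates what a SQL `LIKE '%fragment%'` filter would return.

```python
import re

# Illustrative sample data, not taken from the thread.
ADDRESSES = ["192.168.0.5", "192.168.0.17", "10.192.16.3", "192.16.4.21"]


def match_filter(pattern, addresses):
    """Filter with re.match(), which anchors at the start of each address."""
    compiled = re.compile(pattern)
    return [ip for ip in addresses if compiled.match(ip)]


def search_filter(pattern, addresses):
    """Filter with re.search(), which may match anywhere in the address."""
    compiled = re.compile(pattern)
    return [ip for ip in addresses if compiled.search(ip)]


def substring_filter(fragment, addresses):
    """Rough stand-in for SQL LIKE '%fragment%': plain substring containment."""
    return [ip for ip in addresses if fragment in ip]


pattern = "192.16.*[1-5]$"
print(match_filter(pattern, ADDRESSES))       # anchored at the start
print(search_filter(pattern, ADDRESSES))      # also finds 10.192.16.3
print(substring_filter("192.16", ADDRESSES))  # cannot express the [1-5]$ part
print(substring_filter(pattern, ADDRESSES))   # the raw regex matches nothing as a literal
```

The last call shows why a deployment that passes real regex syntax today would see different results if the filter were handed to a literal substring match.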
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
FYI, Nova did use regex
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2408

2017-10-27 11:35 GMT+08:00 Matt Riedemann:
> On 10/26/2017 9:54 PM, Tony Breeds wrote:
>
>> Can you use RLIKE/REGEX? or is that too MySQL specific ?
>
> I thought about that, and my gut response is 'no' because even if it does
> work for mysql, I'm assuming regex pattern matching for postgresql is
> different. And then you have different API behavior between clouds based on
> the backend database they are using, and now we've opened that whole can of
> worms again.
>
> --
> Thanks,
> Matt
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
Matt Riedemann wrote:
> On 10/26/2017 10:56 PM, Joshua Harlow wrote:
>> Just the paranoid person in me, but is it safe to say that the filter
>> that you are showing here does not come from user text? I.e. these two
>> lines don't come from a user input directly (without going through some
>> filter) do they?
>>
>> https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2458-L2459
>>
>> From reading it seems like perhaps they do come at least partially from
>> a user, so I am hoping that it's not possible for a user to present an
>> 'ip' that is really a complicated regex that takes a long time to
>> compile (and therefore can DOS the nova-api component); but I don't know
>> the surrounding code so I might be wrong...
>>
>> Just wondering :-/
>>
>> -Josh
>
> We have schema validation on the ip filter but it's just checking that it
> can actually compile it:
>
> https://github.com/openstack/nova/blob/16.0.0/nova/api/validation/validators.py#L35
>
> So yeah, probably a potential problem like you pointed out.

Ya, would seem so, especially if large user strings can get compiled.

Just a reference/useful tidbit but in the `re.py` module there is a cache of the last 512 patterns compiled (surprise! I don't think a lot of people know about it, ha), so assuming that users can present arbitrary (and/or pretty big) input to the REST api of nova then that cache could get pretty large (depending on the allowable request max size) and/or could also be thrashed pretty quickly (also note that regex compiling jumps into C code afaik, so that probably locks up other greenthreads).

The cache layer fyi:

https://github.com/python/cpython/blob/3.6/Lib/re.py#L281-L312

Just a thought but it might just be a good idea to remove this validator and never again do user provided regex patterns/input and such in general (to avoid cache thrashing and various other ReDoS or ReDoS-like problems).

-Josh
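One way to act on the suggestion above is to validate the filter as a literal address or address fragment and never hand it to `re.compile()`. The sketch below is an assumption about what such a check could look like, not nova's validator: the function names, the length cap and the allowed character set are all hypothetical. It relies only on the standard `ipaddress` module.

```python
import ipaddress

# Assumed cap: 45 characters covers the longest textual IPv6 form.
MAX_FILTER_LENGTH = 45

# Characters that can appear in a literal IPv4 or IPv6 address.
ALLOWED_CHARS = frozenset("0123456789abcdefABCDEF.:")


def validate_ip_filter(value):
    """Return value if it is a literal address or fragment, else raise ValueError.

    Nothing here compiles a pattern, so the cost is bounded by the length cap.
    """
    if not value or len(value) > MAX_FILTER_LENGTH:
        raise ValueError("ip filter is empty or too long")
    if not set(value) <= ALLOWED_CHARS:
        raise ValueError("ip filter may only contain hex digits, '.' and ':'")
    return value


def is_exact_ip(value):
    """True when value parses as one complete IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True
```

A fragment such as `192.168.0` passes `validate_ip_filter()` but is not an exact address, so a caller could choose between an equality lookup and a substring lookup based on `is_exact_ip()`.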
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On 10/26/2017 10:56 PM, Joshua Harlow wrote:
> Just the paranoid person in me, but is it safe to say that the filter
> that you are showing here does not come from user text? I.e. these two
> lines don't come from a user input directly (without going through some
> filter) do they?
>
> https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2458-L2459
>
> From reading it seems like perhaps they do come at least partially from
> a user, so I am hoping that it's not possible for a user to present an
> 'ip' that is really a complicated regex that takes a long time to
> compile (and therefore can DOS the nova-api component); but I don't know
> the surrounding code so I might be wrong...
>
> Just wondering :-/
>
> -Josh

We have schema validation on the ip filter but it's just checking that it can actually compile it:

https://github.com/openstack/nova/blob/16.0.0/nova/api/validation/validators.py#L35

So yeah, probably a potential problem like you pointed out.

--
Thanks,
Matt
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On 2017-10-27 14:26:06 -0400 (-0400), Mohammed Naser wrote:
[...]
> in our experience, malicious VMs are not short lived but they are
> long lived. We'll generally find them running before we received
> the report which means that the abuse report came for that user
> indeed.
[...]

I guess the Infra use case where we cycle through thousands of instances an hour just winds up being a magnet for misdirected complaints in those less-frequent incidents where the abused instance was already deleted, as we have a higher chance of being the most recent tenant reassigned the offending IP address.
--
Jeremy Stanley
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On Fri, Oct 27, 2017 at 12:48 PM, Jeremy Stanley wrote:
> On 2017-10-26 22:26:59 -0400 (-0400), Mohammed Naser wrote:
> [...]
> > The use-case for us is that it helps us easily identify or find VMs which
> > we get any abuse reports for (or anything we see malicious traffic going
> > to/from). We usually search for an *exact* match of the IP address as we
> > are simply trying to perform a lookup of instance ID based on the IP
> > address. Regex matching isn't important in our case.
> [...]
>
> Does it allow you to identify which instance had that IP address
> over a specific timeframe? One problem we encounter is that we get
> abuse reports forwarded from our service providers telling us that
> our instance with some particular IP address was performing port
> scans or participating in a denial of service attack, and invariably
> when we check our logs we did not have an instance with that IP
> address at the timeframe indicated by the original abuse reporter
> (we had an instance with that IP address at some point for an hour
> or two maybe, but not until days later when the abuse team went
> checking to see who was responsible, and yet they tend to just
> assume everyone has long-lived instances and that IP addresses don't
> bounce around from tenant to tenant with great frequency).

Unfortunately, it does not, which means if the VM is gone, it is much harder to find the exact source of the abuse at the time. However, generally, in our experience, malicious VMs are not short lived but they are long lived. We'll generally find them running before we received the report which means that the abuse report came for that user indeed.

The other nice thing which I noticed is that Neutron generally doesn't re-use IPs until it cycles the entire subnet/CIDR, so if you have a large number of IPs and you don't have a big churn in VMs, it's unlikely that a VM will get the same IP in a short period of time.

> It seems like OpenStack could generally benefit from a mechanism for
> correlating abuse complaints to specific instances/tenants in a way
> that allows performing time-based lookups as well. Compute instances
> are ephemeral, so treating abuse complaints the same as you would in
> a dedicated hosting environment doesn't really work so well.
> --
> Jeremy Stanley
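To make the time-based lookup idea concrete, here is a small sketch of an allocation history that can answer "who held this address at that moment". Everything in it is hypothetical: the `Allocation` record, its field names and the sample history are assumptions for illustration, and the thread does not say that nova or neutron keep such a history.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Allocation(NamedTuple):
    """One period during which an instance held an IP (hypothetical record)."""
    ip: str
    instance_id: str
    project_id: str
    start: datetime
    end: Optional[datetime]  # None while the address is still allocated


def holders_at(history: List[Allocation], ip: str, when: datetime) -> List[Allocation]:
    """Return the allocations of ip that were active at the instant 'when'."""
    return [
        a for a in history
        if a.ip == ip and a.start <= when and (a.end is None or when < a.end)
    ]


# Illustrative history: a short-lived instance, then a later tenant on the same IP.
HISTORY = [
    Allocation("203.0.113.7", "vm-1", "proj-a",
               datetime(2017, 10, 20, 8), datetime(2017, 10, 20, 10)),
    Allocation("203.0.113.7", "vm-2", "proj-b",
               datetime(2017, 10, 25, 9), None),
]

# An abuse report about 2017-10-20 09:00 points at vm-1, not the current holder.
print(holders_at(HISTORY, "203.0.113.7", datetime(2017, 10, 20, 9)))
```

Looking up by the current state alone would return `vm-2`, which is the misdirected-complaint case described in the thread.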
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
Just the paranoid person in me, but is it safe to say that the filter that you are showing here does not come from user text? I.e. these two lines don't come from a user input directly (without going through some filter) do they?

https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2458-L2459

From reading it seems like perhaps they do come at least partially from a user, so I am hoping that it's not possible for a user to present an 'ip' that is really a complicated regex that takes a long time to compile (and therefore can DOS the nova-api component); but I don't know the surrounding code so I might be wrong...

Just wondering :-/

-Josh

Matt Riedemann wrote:
> Nova has had this long-standing known performance issue if you're filtering
> a large number of instances by IP. The instance IPs are stored in a JSON
> blob in the database so we don't do filtering in SQL. We pull the instances
> out of the database, deserialize the JSON and then apply a regex filter
> match in the nova-api python code.
>
> At the Queens PTG we talked about possible ways to fix this and came up with
> this nova spec:
>
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/improve-filter-instances-by-ip-performance.html
>
> The idea is to have nova get ports from neutron and apply the IP filter in
> neutron to whittle down the ports, then from that list of ports get the
> instances to pull out of the nova database.
>
> One issue that has come up with this is neutron does not currently support
> regex filters when listing ports. There is an RFE for adding that:
>
> https://bugs.launchpad.net/neutron/+bug/1718605
>
> The proposed neutron implementation is to just do SQL LIKE substring
> matching in the database.
>
> However, one issue that has come up is that the compute API accepts a
> python regex filter and uses re.match():
>
> https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2469
>
> At least one good thing about that is match() only matches from the
> beginning of the string unlike search().
>
> So for example I can filter on "192.16.*[1-5]$" if I wanted to, but that's
> not going to work with just a LIKE substring filter in SQL.
>
> The question is, does anyone actually do more than basic substring
> matching with the IP filter today? Because if we started using neutron,
> that behavior would be broken. We've never actually documented the match
> restrictions on the IP filter, but that's not a good reason to break it.
>
> One option is to make this configurable such that deployments which rely
> on the complicated pattern matching can just use the existing nova code
> despite performance issues. However, that's not interoperable, I hate
> config-driven API behavior, and it would mean maintaining two code paths in
> nova, which is also terrible.
>
> I was trying to think of a way to determine if the IP filter passed to
> nova is basic or a complicated pattern match and let us decide that way,
> but I'm not sure if there are good ways to detect that - maybe by simply
> looking for special characters like (, ), - and $? But then there is [] and
> we have an IPv6 filter, so that gets messy too...
>
> For now I'd just like to know if people rely on the regex match or not.
> Other ideas on how to handle this are appreciated.
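The "look for special characters" idea from the quoted message can be sketched as a simple character test. This is only an illustration of the heuristic and of why it gets messy; the metacharacter set and the function name are assumptions, and nothing here is proposed nova behavior.

```python
# Assumed set of characters that signal regex intent. '.' is deliberately left
# out because every IPv4 filter contains it, even though it is a regex wildcard.
REGEX_METACHARS = frozenset("()[]{}*+?^$|\\-")


def looks_like_regex(ip_filter):
    """Guess whether a filter uses regex syntax rather than a plain address."""
    return any(ch in REGEX_METACHARS for ch in ip_filter)


print(looks_like_regex("192.168.0.1"))     # plain address
print(looks_like_regex("192.16.*[1-5]$"))  # the example pattern from the thread
print(looks_like_regex("fe80::1"))         # IPv6 without brackets
print(looks_like_regex("[fe80::1]"))       # bracketed IPv6 literal is misclassified
```

The bracketed IPv6 case is flagged as a regex even though it may be a literal address, and `192.168.0.1` is treated as plain even though each `.` would match any character under `re.match()`, which is the ambiguity the message points at.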
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On Thu, Oct 26, 2017 at 10:35:47PM -0500, Matt Riedemann wrote:
> On 10/26/2017 9:54 PM, Tony Breeds wrote:
> > Can you use RLIKE/REGEX? or is that too MySQL specific ?
>
> I thought about that, and my gut response is 'no' because even if it does
> work for mysql, I'm assuming regex pattern matching for postgresql is
> different. And then you have different API behavior between clouds based on
> the backend database they are using, and now we've opened that whole can of
> worms again.

Yeah:

    column.op('rlike')()  # Mysql
    column.op('~')()      # Pgsql

I have no idea if the regexes themselves would be compatible, and of course there are other RDBMSs.

Yours Tony.
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On 10/26/2017 9:54 PM, Tony Breeds wrote:
> Can you use RLIKE/REGEX? or is that too MySQL specific ?

I thought about that, and my gut response is 'no' because even if it does work for mysql, I'm assuming regex pattern matching for postgresql is different. And then you have different API behavior between clouds based on the backend database they are using, and now we've opened that whole can of worms again.

--
Thanks,
Matt
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On Thu, Oct 26, 2017 at 09:23:50PM -0500, Matt Riedemann wrote:
> Nova has had this long-standing known performance issue if you're filtering
> a large number of instances by IP. The instance IPs are stored in a JSON
> blob in the database so we don't do filtering in SQL. We pull the instances
> out of the database, deserialize the JSON and then apply a regex filter
> match in the nova-api python code.
>
> At the Queens PTG we talked about possible ways to fix this and came up with
> this nova spec:
>
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/improve-filter-instances-by-ip-performance.html
>
> The idea is to have nova get ports from neutron and apply the IP filter in
> neutron to whittle down the ports, then from that list of ports get the
> instances to pull out of the nova database.
>
> One issue that has come up with this is neutron does not currently support
> regex filters when listing ports. There is an RFE for adding that:
>
> https://bugs.launchpad.net/neutron/+bug/1718605
>
> The proposed neutron implementation is to just do SQL LIKE substring
> matching in the database.

Can you use RLIKE/REGEX? or is that too MySQL specific ?

Yours Tony.
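For readers unfamiliar with what "SQL LIKE substring matching" means in practice, the sketch below runs one against an in-memory SQLite database using Python's standard `sqlite3` module. The `ports` table, its two columns and the helper names are toy assumptions and do not reflect neutron's schema or its implementation of the RFE.

```python
import sqlite3


def like_pattern(fragment, escape="\\"):
    """Escape LIKE wildcards in the fragment, then wrap it for substring matching."""
    escaped = (
        fragment.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return "%" + escaped + "%"


def find_ports(conn, fragment):
    """Return (device_id, ip_address) rows whose address contains fragment."""
    rows = conn.execute(
        "SELECT device_id, ip_address FROM ports "
        "WHERE ip_address LIKE ? ESCAPE '\\' ORDER BY ip_address",
        (like_pattern(fragment),),
    )
    return rows.fetchall()


# Toy table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (device_id TEXT, ip_address TEXT)")
conn.executemany(
    "INSERT INTO ports VALUES (?, ?)",
    [("vm-a", "192.168.0.5"), ("vm-b", "10.192.16.3"), ("vm-c", "172.16.0.9")],
)

print(find_ports(conn, "192.16"))          # substring match on two rows
print(find_ports(conn, "192.16.*[1-5]$"))  # regex syntax is treated literally
```

Because the filtering happens in the database, only the matching rows are returned to the caller, which is the performance motivation behind the spec; the cost is that regex syntax is no longer interpreted.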
Re: [openstack-dev] [nova][neutron] How do you use the instance IP filter?
On Thu, Oct 26, 2017 at 10:23 PM, Matt Riedemann wrote:
> Nova has had this long-standing known performance issue if you're
> filtering a large number of instances by IP. The instance IPs are stored in
> a JSON blob in the database so we don't do filtering in SQL. We pull the
> instances out of the database, deserialize the JSON and then apply a regex
> filter match in the nova-api python code.
>
> At the Queens PTG we talked about possible ways to fix this and came up
> with this nova spec:
>
> https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/improve-filter-instances-by-ip-performance.html
>
> The idea is to have nova get ports from neutron and apply the IP filter in
> neutron to whittle down the ports, then from that list of ports get the
> instances to pull out of the nova database.
>
> One issue that has come up with this is neutron does not currently support
> regex filters when listing ports. There is an RFE for adding that:
>
> https://bugs.launchpad.net/neutron/+bug/1718605
>
> The proposed neutron implementation is to just do SQL LIKE substring
> matching in the database.
>
> However, one issue that has come up is that the compute API accepts a
> python regex filter and uses re.match():
>
> https://github.com/openstack/nova/blob/16.0.0/nova/compute/api.py#L2469
>
> At least one good thing about that is match() only matches from the
> beginning of the string unlike search().
>
> So for example I can filter on "192.16.*[1-5]$" if I wanted to, but that's
> not going to work with just a LIKE substring filter in SQL.
>
> The question is, does anyone actually do more than basic substring
> matching with the IP filter today? Because if we started using neutron,
> that behavior would be broken. We've never actually documented the match
> restrictions on the IP filter, but that's not a good reason to break it.

The use-case for us is that it helps us easily identify or find VMs which we get any abuse reports for (or anything we see malicious traffic going to/from). We usually search for an *exact* match of the IP address as we are simply trying to perform a lookup of instance ID based on the IP address. Regex matching isn't important in our case.

> One option is to make this configurable such that deployments which rely
> on the complicated pattern matching can just use the existing nova code
> despite performance issues. However, that's not interoperable, I hate
> config-driven API behavior, and it would mean maintaining two code paths in
> nova, which is also terrible.
>
> I was trying to think of a way to determine if the IP filter passed to
> nova is basic or a complicated pattern match and let us decide that way,
> but I'm not sure if there are good ways to detect that - maybe by simply
> looking for special characters like (, ), - and $? But then there is [] and
> we have an IPv6 filter, so that gets messy too...
>
> For now I'd just like to know if people rely on the regex match or not.
> Other ideas on how to handle this are appreciated.
>
> --
> Thanks,
> Matt
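The exact-match lookup described above fits the two-step flow from the quoted spec summary: filter ports by IP first, then fetch only the instances those ports belong to. The sketch below models that flow with plain in-memory data. The port and instance dictionaries and both function names are simplified assumptions for illustration, not calls to the neutron or nova APIs.

```python
def ports_matching_ip(ports, ip):
    """Step 1: keep only ports that have a fixed IP equal to ip."""
    return [
        port for port in ports
        if any(fixed["ip_address"] == ip for fixed in port["fixed_ips"])
    ]


def instances_for_ip(ports, instances_by_id, ip):
    """Step 2: map the matching ports to their instances via device_id."""
    device_ids = {port["device_id"] for port in ports_matching_ip(ports, ip)}
    return [
        instances_by_id[device_id]
        for device_id in sorted(device_ids)
        if device_id in instances_by_id
    ]


# Toy data standing in for the two services' records.
PORTS = [
    {"device_id": "inst-1", "fixed_ips": [{"ip_address": "10.0.0.5"}]},
    {"device_id": "inst-2", "fixed_ips": [{"ip_address": "10.0.0.6"},
                                          {"ip_address": "fd00::6"}]},
    {"device_id": "router-9", "fixed_ips": [{"ip_address": "10.0.0.1"}]},
]
INSTANCES = {
    "inst-1": {"id": "inst-1", "name": "web"},
    "inst-2": {"id": "inst-2", "name": "db"},
}

print(instances_for_ip(PORTS, INSTANCES, "10.0.0.6"))
```

Only the instances whose ports matched are ever looked up, instead of loading every instance and filtering its deserialized network info in Python. A port owned by something other than an instance, like `router-9` here, is simply skipped.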