Re: [PHP-DEV] HashDos protection
On 01/12/2015 02:28, Anatol Belski wrote:
> From what I was testing, the configuration is not absolutely necessary.

My first thought, seeing this hardcoded limit, was also "we need to make it a new ini option". So I ran a few tests, especially Anatol's one (slightly optimized to produce unique keys faster), with a debug message in zend_hash.c to display the maximum number of collisions encountered during the run.

After some time, with around 9.2 million unique keys and ~1.2GB of memory used, the max collisions never exceeded 12.

So I now really think we can go without a new ini option.

Remi.

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
Hi Nikita,

few notes:

It looks like the patch mixes the attempt at catching the problem (the few lines related to the zend_hash_find_bucket changes) with optimizations to compensate for the collision-check overhead (everything else). I think it's better to separate these parts.

Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward incompatibility with 7.0.0. However, we may reserve it for internal usage by removing ZEND_API.

I don't think PHP should prevent users from constructing the arrays they like because of internal problems (like in your example).

> $s = 1 << 16; $a = [];
> for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

It creates a performance problem, but that doesn't mean we should kill it. We have "max_execution_time" to catch them all.

I think only big arrays coming from external sources should be checked.

Your solution is incomplete anyway, because instead of crafting a single list with N*1000 collisions, an attacker will be able to use N lists with 1000 collisions.

Thanks. Dmitry.

On Thu, Nov 26, 2015 at 8:24 PM, Nikita Popov wrote:
> Hi internals!
>
> This mail turned out to be rather long, so I'll start with a TL;DR:
>
> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST parsing), I propose to introduce collision counting during hashtable insertion operations. This will throw a fatal error if the number of collisions during an insertion operation exceed a certain threshold.
>
> Implementation: https://github.com/php/php-src/pull/1565
>
> [...]
Re: [PHP-DEV] HashDos protection
On Tue, Dec 1, 2015 at 10:50 AM, Dmitry Stogovwrote: > Hi Nikita, > > few notes: > > It looks like the patch messes the attempt of catching the problem (few > lines related to zend_hash_find_bucket changes) with optimization to > compensate collision check overhead (everything else). I think it's better > to separate these parts. > The addition of zend_hash_add_or_return() isn't an optimization, it is required to ensure that the check happens in all relevant cases. The code previously used a combination of zend_hash_find() and zend_hash_add_new(). However the whole purpose of the zend_hash_add_new() function is to skip the part of the insertion operation where the collision count would be done. At this point I could either a) also count collisions in zend_hash_find(), which is imho not a good option as it's enough to count once and not at every lookup, or b) avoid the combination of zend_hash_find() and zend_hash_add_new() by introducing a new zend_hash_add_or_return() function (which does the same as the previous combination but also performs the collision count). Alternatively I would also simply make zend_hash_add_new() the same as zend_hash_add(), which is probably not wanted either. > Introduction of zend_hash_add_or_return() in 7.0.1 is going to make > forward incompatibility with 7.0.0. However, we may reserve it for internal > usage removing ZEND_API. > I'm fine with not marking it ZEND_API in 7.0. > I don't think PHP should prevent user from construction of arrays he > likes, because of internal problems (like in your example). > > > $s = 1 << 16; $a = []; > > for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; } > > It makes performance problem, but it doesn't mean we should kill it. > We have "max_execution_time" to catch them all. > > I think only big arrays coming from external sources should be checked. > There is no way for PHP to know whether or not an array is constructed from an external source. Yes, of course we can special case json_decode() and unserialize(), but this would not fix the vulnerability for any array that is created in PHP code. It's not exactly unusual to create arrays which are in some way based on user input. Your solution is incomplete, anyway, because of crafting a single list with > 1 collisions, attacker will able to use N lists with 1000 collisions. > One list with N*1000 collisions and N lists with 1000 collisions each have very different asymptotic complexity. The former is O(N^2), while the latter is O(N). The DOS attack is based on having O(N^2) complexity. Nikita
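Spelling that out with the usual cost model (inserting into a chain that already holds i entries costs about i comparisons), which is all the O(N^2) vs O(N) remark above amounts to:

    W_{\text{one list}} = \sum_{i=1}^{N \cdot 1000} i \approx \frac{(N \cdot 1000)^2}{2} = O(N^2)
    W_{N \text{ lists}} = N \cdot \sum_{i=1}^{1000} i \approx N \cdot \frac{1000^2}{2} = O(N)

So splitting the same number of colliding keys across N separate chains only buys the attacker linear total work, which is why the single long chain is the case worth guarding against.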
Re: [PHP-DEV] HashDos protection
On 1 December 2015 at 09:50, Dmitry Stogovwrote: > Hi Nikita, > > few notes: > > It looks like the patch messes the attempt of catching the problem (few > lines related to zend_hash_find_bucket changes) with optimization to > compensate collision check overhead (everything else). I think it's better > to separate these parts. > > Introduction of zend_hash_add_or_return() in 7.0.1 is going to make forward > incompatibility with 7.0.0. However, we may reserve it for internal usage > removing ZEND_API. > > I don't think PHP should prevent user from construction of arrays he likes, > because of internal problems (like in your example). > > > $s = 1 << 16; $a = []; > > for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; } > > It makes performance problem, but it doesn't mean we should kill it. > We have "max_execution_time" to catch them all. > > I think only big arrays coming from external sources should be checked. > > Your solution is incomplete, anyway, because of crafting a single list with > 1 collisions, attacker will able to use N lists with 1000 collisions. > > Thanks. Dmitry. This is a good point but bear in mind to get the same effect as for 1 collisions, you'd need 100 lists not 10
Re: [PHP-DEV] HashDos protection
On Tue, Dec 1, 2015 at 3:48 PM, Nikita Popovwrote: > On Tue, Dec 1, 2015 at 10:50 AM, Dmitry Stogov wrote: > >> Hi Nikita, >> >> few notes: >> >> It looks like the patch messes the attempt of catching the problem (few >> lines related to zend_hash_find_bucket changes) with optimization to >> compensate collision check overhead (everything else). I think it's better >> to separate these parts. >> > > The addition of zend_hash_add_or_return() isn't an optimization, it is > required to ensure that the check happens in all relevant cases. The code > previously used a combination of zend_hash_find() and zend_hash_add_new(). > However the whole purpose of the zend_hash_add_new() function is to skip > the part of the insertion operation where the collision count would be done. > > At this point I could either a) also count collisions in zend_hash_find(), > which is imho not a good option as it's enough to count once and not at > every lookup, or b) avoid the combination of zend_hash_find() and > zend_hash_add_new() by introducing a new zend_hash_add_or_return() function > (which does the same as the previous combination but also performs the > collision count). Alternatively I would also simply make > zend_hash_add_new() the same as zend_hash_add(), which is probably not > wanted either. > I got it now. > > >> Introduction of zend_hash_add_or_return() in 7.0.1 is going to make >> forward incompatibility with 7.0.0. However, we may reserve it for internal >> usage removing ZEND_API. >> > > I'm fine with not marking it ZEND_API in 7.0. > > >> I don't think PHP should prevent user from construction of arrays he >> likes, because of internal problems (like in your example). >> >> > $s = 1 << 16; $a = []; >> > for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; } >> >> It makes performance problem, but it doesn't mean we should kill it. >> We have "max_execution_time" to catch them all. >> > >> I think only big arrays coming from external sources should be checked. >> > > There is no way for PHP to know whether or not an array is constructed > from an external source. Yes, of course we can special case json_decode() > and unserialize(), but this would not fix the vulnerability for any array > that is created in PHP code. It's not exactly unusual to create arrays > which are in some way based on user input. > If this is a real problem, we should better provide an API for "safe" array construction from "unsafe" data. Also, if such arrays are constructed from many elements, it may make sense to insert all elements first and then rehash them. This will make O(2*N) complexity, even if all of them have the same hash value. > > Your solution is incomplete, anyway, because of crafting a single list >> with 1 collisions, attacker will able to use N lists with 1000 >> collisions. >> > > One list with N*1000 collisions and N lists with 1000 collisions each have > very different asymptotic complexity. The former is O(N^2), while the > latter is O(N). The DOS attack is based on having O(N^2) complexity. > You are right of course about O(N) and O(N^2), but today affecting data-locality may make more harm then list length. The fact, that PHP7 is few times faster on your example, is caused by the linear placement of the collisions list. Once we start to work with randomized lists, in addition to complexity, we will also suffer from stalls because of the cache misses. Thanks. Dmitry. Nikita >
Re: [PHP-DEV] HashDos protection
On Dec 1, 2015 4:50 PM, "Dmitry Stogov" wrote:
>
> I think only big arrays coming from external sources should be checked.

I tend to agree here. We discussed it with Remi last week. I was trying to explain why having a crafted hash function for inputs may be better and safer. That includes get/post/env/serialize/json and the like. The performance impact is most likely minimal when it applies only to these, while ensuring better protection from a long-term point of view.

I may be wrong and did not think about it much beyond brainstorming, so take it with a grain of salt :)
RE: [PHP-DEV] HashDos protection
Hi Stas,

> -----Original Message-----
> From: Stanislav Malyshev [mailto:smalys...@gmail.com]
> Sent: Tuesday, December 1, 2015 12:21 AM
> To: Nikita Popov <nikita@gmail.com>; PHP internals <internals@lists.php.net>; Anatol Belski <anatol@belski.net>; Remi Collet <r...@php.net>
> Subject: Re: [PHP-DEV] HashDos protection
>
> Hi!
>
> > To fix the HashDos vulnerability for *all* cases (rather than just
> > GET/POST parsing), I propose to introduce collision counting during
> > hashtable insertion operations. This will throw a fatal error if the
> > number of collisions during an insertion operation exceed a certain
> > threshold.
> >
> > Implementation: https://github.com/php/php-src/pull/1565
>
> This looks pretty cool. I'd support making the limit configurable though, is there a reason why it's not?

From what I was testing, the configuration is not absolutely necessary. Normal usage doesn't seem to cause the situation reproducible with https://github.com/bk2204/php-hash-dos . Even with a big array - with patched PHP I was reaching something like 2.5 million string keys and gave up. On the other hand, if such a malicious situation were reached, the application would become unusable anyway - so a configuration option makes no sense for that case. If the array is big and there are too many collisions, PHP would just iterate over all the buckets over and over again looking for a suitable one. Maybe the only case where an INI setting could be useful would be to force bailing out at exactly zero collisions or at a very low collision threshold. At least that was my observation.

Regards

Anatol

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
Hi! > To fix the HashDos vulnerability for *all* cases (rather than just GET/POST > parsing), I propose to introduce collision counting during hashtable > insertion operations. This will throw a fatal error if the number of > collisions during an insertion operation exceed a certain threshold. > > Implementation: https://github.com/php/php-src/pull/1565 This looks pretty cool. I'd support making the limit configurable though, is there a reason why it's not? -- Stas Malyshev smalys...@gmail.com -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
> 1. (Self-balancing binary trees). The idea here is that a balanced binary > tree has worst-case insertion time O(log N), while the linked list we > normally use has worst-case insertion time O(N). This means that the > worst-case execution time for N insertions into a hashtable goes from > O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than > usual". > > I don't think this solution is viable for PHP. Supporting binary trees next > to linked list collision resolution will make the hashtable implementation > *a lot* more complicated -- and it isn't exactly simple as is. I have written self-balancing binary trees on multiple occasions. I know that we want to minimize complexity but if the complexity actually helps solve this issue I think it would be worth it. I assume you didn't create a proof of concept for this option – is that correct? I think fataling because of collisions is a poor solution to this problem because fataling in some cases may contribute to a successful DOS attack (I do not think Aerys is the only piece of software that would be vulnerable to this). I am sure PHP is not the only language that has experienced pain with HashDos – how have other languages handled it? -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
On Sat, Nov 28, 2015 at 12:22 AM, Thomas Hruska wrote:
> On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote:
>> Hi Thomas,
>>
>> In practice, we wouldn't have problems with max number of collisions.
>
> Is CLI going to be or can CLI be excluded from max collisions? After thinking about it for a long while, that's my only area of concern here. SAPI can (fatal) error to its heart's content and I'll find ways to live with it. But CLI needs stability guarantees. 'max_input_vars' didn't affect CLI. However, max collisions could negatively affect CLI because the change affects all arrays instead of just the superglobals.
>
> Real-world scenario: About once every 4 to 6 months I will write a script to load up a single PHP array with a few million records. Partly because I can and partly because I need to. On CLI, especially for my one-off quick-n-dirty scripts, I don't care how much CPU, RAM, and other resources are used nor how much time it takes to complete. I just care that the script finishes running successfully. If the script reads in a record and attempts to add it to the array, the proposed max collisions might trigger a fatal error if it hits the max collision limit for arrays. My experience is that CLI is silent about 50% of the time when it encounters a fatal error. So my script would drop back to the prompt after spending many minutes to hours loading the data, not having done any work, and not emit any error(s). I would think that it had completed successfully until I went to look at the results and the results I would be expecting to see wouldn't be there.
>
> I abuse PHP arrays. Especially on CLI. Sorry.

To make sure there's no confusion on this point: When I say "number of collisions" I do not mean the total sum of collisions in a hashtable. That's a meaningless value, as a large hashtable will have a large total number of collisions, even if it is not malicious. What I rather mean is the number of collisions in a *single* collision resolution chain.

Let's try to quantify the probability of reaching the collision limit C with a hashtable of size N, assuming a random hash distribution. The upper bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1) being the probability that C elements hash to one value and (N choose C) the number of C-element subsets of an N-element set. For large N and N >> C we can approximate (N choose C) by (e*N/C)^C / sqrt(2pi*C). As such our upper bound becomes N * (e/C)^C / sqrt(2pi*C). Choosing N = 2^31 (the largest possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100 probability 2.3E-149. The patch uses C = 1000.

Now my probability-foo is pretty weak, so it's not unlikely that this result is totally wrong ^^ In any case, the point is that even for very large arrays, there is no way you're going to reach the collision limit with random data.

Of course it can happen that your (obviously not perfectly random) data set happens to have keys that our hash function behaves particularly badly with. E.g. if you create a hashtable with pointers, you'll likely be getting an expected length of 8 for each collision chain due to alignment. If for whatever reason you are inserting integers that all happen to be a multiple of 2^10, you could reach the collision limit. If that's the case you're probably still better off knowing that you're trying to DoS yourself, rather than just silently slowing down to a crawl ;)

I'm not averse to making the collision limit configurable, but it would still be enabled by default for CLI.

Nikita
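Written out, the bound used above is (same formula and numbers as in the mail, nothing new added):

    P(\text{some chain reaches length } C) \le \binom{N}{C} \left(\frac{1}{N}\right)^{C-1} \approx \frac{N \, (e/C)^{C}}{\sqrt{2\pi C}}

For N = 2^{31} this gives roughly 8.9 \times 10^{-10} at C = 20 and 2.3 \times 10^{-149} at C = 100, while the patch only errors at C = 1000.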
RE: [PHP-DEV] HashDos protection
Hi Thomas,

> -----Original Message-----
> From: Thomas Hruska [mailto:thru...@cubiclesoft.com]
> Sent: Saturday, November 28, 2015 12:23 AM
> To: Yasuo Ohgaki <yohg...@ohgaki.net>
> Cc: PHP internals <internals@lists.php.net>
> Subject: Re: [PHP-DEV] HashDos protection
>
> On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote:
> > Hi Thomas,
> >
> > In practice, we wouldn't have problems with max number of collisions.
>
> Is CLI going to be or can CLI be excluded from max collisions? After thinking about it for a long while, that's my only area of concern here. SAPI can (fatal) error to its heart's content and I'll find ways to live with it. But CLI needs stability guarantees. 'max_input_vars' didn't affect CLI. However, max collisions could negatively affect CLI because the change affects all arrays instead of just the superglobals.
>
> Real-world scenario: About once every 4 to 6 months I will write a script to load up a single PHP array with a few million records. Partly because I can and partly because I need to. On CLI, especially for my one-off quick-n-dirty scripts, I don't care how much CPU, RAM, and other resources are used nor how much time it takes to complete. I just care that the script finishes running successfully. If the script reads in a record and attempts to add it to the array, the proposed max collisions might trigger a fatal error if it hits the max collision limit for arrays. My experience is that CLI is silent about 50% of the time when it encounters a fatal error. So my script would drop back to the prompt after spending many minutes to hours loading the data, not having done any work, and not emit any error(s). I would think that it had completed successfully until I went to look at the results and the results I would be expecting to see wouldn't be there.
>
> I abuse PHP arrays. Especially on CLI. Sorry.

The particular case being fixed is a vulnerability which can be triggered deliberately. If you have already been writing scripts that create arrays with millions of records, and those went through, it means there were never enough collisions produced. If you have time, I'd ask you to please test this patch with one of your previous scripts of that kind.

The reproduce case from https://github.com/bk2204/php-hash-dos contains 10^6 crafted keys, which makes the application just hang, as the hash table will be busy resolving collisions and looking for free buckets. I was running that reproduce case and broke the CLI script after an hour, as it didn't make sense to continue. Of course, a cron job running for hours or even days is ok, but it is clear that with such crafted keys the cron job will spend those days not on the actual work but on resolving collisions. So indeed, if you were to hit the malicious situation, your script wouldn't work.

Using the same approach for my test, I'm already able to produce an array with around 2*10^6 keys, each 11 bytes long, from a 62-character set (see the script linked here https://github.com/php/php-src/pull/1565#issuecomment-160231878 ). It is supposed to show that the patch doesn't cause an issue with creating huge arrays. I'd really plead for a bit less theoretical approach at this point - let's test it with any cases of concern and see. In real life the key length can vary, the quality of keys can vary, and keys can be made explicitly suitable for a huge data set just the same way they can be crafted for a malicious attack. Say, use a bigger character set for keys, use longer keys, use keys generated by random_bytes(), etc.

As scripts processing big datasets are usually only meant for cron jobs, taking special measures could affect their runtime; however, if a script runs for days, plus or minus a couple of hours won't be significant. So, after all, fixing the vulnerability could theoretically cost something for the justified but unusual use cases like big datasets. However, it won't make those unusual cases impossible; furthermore, it doesn't really seem to have an impact in practice, and it won't affect common cases at all. At least that is what my current tests show. Please, let's fetch the patch and test it.

Regards

Anatol

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
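To make the described test concrete, here is a rough reconstruction of that kind of script (this is not the script Anatol linked in the PR comment; only the key length, alphabet and count are taken from his description, the rest is illustrative):

    <?php
    // Build roughly 2*10^6 random 11-byte keys from a 62-character alphabet
    // (a-z, A-Z, 0-9) and insert them into a single array, to check that
    // ordinary random keys stay far away from the collision limit.
    $alphabet = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'));
    $a = [];
    for ($n = 0; $n < 2000000; $n++) {
        $key = '';
        for ($i = 0; $i < 11; $i++) {
            $key .= $alphabet[random_int(0, 61)];
        }
        $a[$key] = $n;
    }
    printf("inserted %d keys, memory: %.1f MB\n", count($a), memory_get_usage(true) / 1048576);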
Re: [PHP-DEV] HashDos protection
On Thu, Nov 26, 2015 at 8:35 PM, Niklas Kellerwrote: > Would this be a catchable Error (implementing Throwable) or a real fatal? > Having a real fatal could lead to a DOS in Aerys as you'd be able to crash > workers with carefully crafted input variables then. > It would be a real fatal error. Throwing an exception in this place would be problematic, as we're inserting into hashtables all over the place and not all of these places are exception-safe. Many places also make the reasonable assumption that (normal) insert operations always succeed. You are correct in saying that this approach would mean that you can now easily trigger a fatal error in an application that was previously susceptible to HashDos (and likely also in some that previously were not). In most cases this should not be a problem, however for Aerys and likely other daemons a fatal error implies losing all connections on the current worker. This may effectively be worse than a HashDos attack. The only thing I can think to offer here is to make the collision number threshold ini configurable, so you would have the ability to effectively disable this protection. In that case you'd have to manually perform size checks in the right places if you wish to protect against HashDos. I'm fine with having a solution that only works for the 99% of applications, until such a time, if ever, as we can feasibly implement one of the alternative schemes that would cover all possible scenarios. Nikita
Re: [PHP-DEV] HashDos protection
On 11/27/2015 9:12 AM, Nikita Popov wrote: On Thu, Nov 26, 2015 at 8:35 PM, Niklas Kellerwrote: Would this be a catchable Error (implementing Throwable) or a real fatal? Having a real fatal could lead to a DOS in Aerys as you'd be able to crash workers with carefully crafted input variables then. It would be a real fatal error. Throwing an exception in this place would be problematic, as we're inserting into hashtables all over the place and not all of these places are exception-safe. Many places also make the reasonable assumption that (normal) insert operations always succeed. You are correct in saying that this approach would mean that you can now easily trigger a fatal error in an application that was previously susceptible to HashDos (and likely also in some that previously were not). In most cases this should not be a problem, however for Aerys and likely other daemons a fatal error implies losing all connections on the current worker. This may effectively be worse than a HashDos attack. The only thing I can think to offer here is to make the collision number threshold ini configurable, so you would have the ability to effectively disable this protection. In that case you'd have to manually perform size checks in the right places if you wish to protect against HashDos. I'm fine with having a solution that only works for the 99% of applications, until such a time, if ever, as we can feasibly implement one of the alternative schemes that would cover all possible scenarios. Nikita I don't know if anyone has suggested this before, but why not have a function that application developers can call to switch hash modes and support multiple hash modes in the core? I've always viewed 'max_input_vars' as an emergency hack and I've run into the default 1,000 limit many times. When I hit that limit, I inevitably have to raise it to anywhere from 3,000 to 10,000 to get the target application to function, which, of course, puts the whole server at risk. If the application developer could declare what hash mode to use for new arrays, that could potentially solve the problem more cleanly. SipHash would be used for security scenarios (PHP loading of superglobals, user input, etc.) and the current DJB hash for everything else. That could, of course, lead to unfortunate situations. Alternatively, default to SipHash for everything but the application developer could turn it off and on as needed at runtime. Put a warning in the documentation about the security of hashes and also mention leaving SipHash enabled when in doubt. -- Thomas Hruska CubicleSoft President I've got great, time saving software that you will find useful. http://cubiclesoft.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
Hi,

On Thu, Nov 26, 2015 at 5:24 PM, Nikita Popov wrote:
>
> What are your thoughts on this?

First of all, thanks a lot for looking into it! That's great!

I think that it's all cool except the fact that json_decode would result in a fatal error. I don't think that json_decode should kill the script in this case. It's much better to return NULL and set a new JSON_ERROR_... constant. We could maybe add some EG var to silence the fatal error while json_decode is running and set it back when finished. In that case, when the collision limit is exceeded, it would set another EG flag that could be checked in json_parser.y as UNEXPECTED after each addition to the object or array.

It's probably not the nicest solution and it would probably slow down the JSON parser slightly, but it could work, and it's still better than a fatal error IMHO.

What do you think?

Cheers

Jakub
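For illustration, this is roughly how userland code would experience the behaviour Jakub proposes; json_last_error() and JSON_ERROR_NONE are the existing API, while a dedicated error constant for the collision case is only hypothetical at this point:

    <?php
    // Hypothetical handling if json_decode() reported "too many collisions"
    // through the normal JSON error API instead of raising a fatal error.
    $payload = file_get_contents('php://input');
    $data = json_decode($payload, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        // Covers syntax errors today, and would cover the proposed
        // collision-limit error as well: reject the request instead of dying.
        http_response_code(400);
        echo json_encode(['error' => 'invalid payload']);
        exit;
    }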
Re: [PHP-DEV] HashDos protection
Currently it's not catchable, that's my main concern. If it's catchable, it's not that much of a problem. Regards, Niklas 2015-11-27 10:05 GMT+01:00 Yasuo Ohgaki: > Hi Nikita, > > On Fri, Nov 27, 2015 at 2:24 AM, Nikita Popov > wrote: > > What are your thoughts on this? > > Great! This is exactly what I was thinking. > I prefer collision counting rather than slower hash function. > > Hardcoded collision max (1000) seems ok for me. > Catchable fatal error, at least E_RECOVERABLE_ERROR, is preferred, IMHO. > > Regards, > > -- > Yasuo Ohgaki > yohg...@ohgaki.net > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php > >
RE: [PHP-DEV] HashDos protection
Hi Nikita, > -Original Message- > From: Nikita Popov [mailto:nikita@gmail.com] > Sent: Thursday, November 26, 2015 6:25 PM > To: PHP internals <internals@lists.php.net>; Anatol Belski > <anatol@belski.net>; Remi Collet <r...@php.net> > Subject: [PHP-DEV] HashDos protection > > Hi internals! > > This mail turned out to be rather long, so I'll start with a TL;DR: > > To fix the HashDos vulnerability for *all* cases (rather than just GET/POST > parsing), I propose to introduce collision counting during hashtable insertion > operations. This will throw a fatal error if the number of collisions during > an > insertion operation exceed a certain threshold. > > Implementation: https://github.com/php/php-src/pull/1565 > > From my testing the change has no negative performance impact. The change is > small and does not break ABI. > > Tracking bug (with various implementations): > https://bugs.php.net/bug.php?id=70644 > > What are your thoughts on this? > Responding to the short version as well :) I was checking your patch and I think it is great. Currently I see no ABI breach (please correct me if I err). So IMHO after sufficient discussion, corrections and testing, given there's still no ABI incompatibility, it should be backported to 7.0 as early as possible. Regards Anatol -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote: Hi Thomas, In practice, we wouldn't have problems with max number of collisions. Is CLI going to be or can CLI be excluded from max collisions? After thinking about it for a long while, that's my only area of concern here. SAPI can (fatal) error to its heart's content and I'll find ways to live with it. But CLI needs stability guarantees. 'max_input_vars' didn't affect CLI. However, max collisions could negatively affect CLI because the change affects all arrays instead of just the superglobals. Real-world scenario: About once every 4 to 6 months I will write a script to load up a single PHP array with a few million records. Partly because I can and partly because I need to. On CLI, especially for my one-off quick-n-dirty scripts, I don't care how much CPU, RAM, and other resources are used nor how much time it takes to complete. I just care that the script finishes running successfully. If the script reads in a record and attempts to add it to the array, the proposed max collisions might trigger a fatal error if it hits the max collision limit for arrays. My experience is that CLI is silent about 50% of the time when it encounters a fatal error. So my script would drop back to the prompt after spending many minutes to hours loading the data, not having done any work, and not emit any error(s). I would think that it had completed successfully until I went to look at the results and the results I would be expecting to see wouldn't be there. I abuse PHP arrays. Especially on CLI. Sorry. -- Thomas Hruska CubicleSoft President I've got great, time saving software that you will find useful. http://cubiclesoft.com/ -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
Hi Thomas,

On Sat, Nov 28, 2015 at 2:07 AM, Thomas Hruska wrote:
> I don't know if anyone has suggested this before, but why not have a function that application developers can call to switch hash modes and support multiple hash modes in the core?
>
> I've always viewed 'max_input_vars' as an emergency hack and I've run into the default 1,000 limit many times. When I hit that limit, I inevitably have to raise it to anywhere from 3,000 to 10,000 to get the target application to function, which, of course, puts the whole server at risk.

Because any hash function has collisions. Even if we use a hash that is stronger against collisions, computers are getting faster and faster, and building a database of colliding keys becomes easier and easier. A clever person may also find an algorithmic way to generate colliding keys in the future.

In practice, we wouldn't have problems with a max number of collisions. A max number of collisions resolves the issue for good, and we can keep executing code with the faster hash. I forget the exact number, but SipHash is much slower than DJB.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net

-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
Hi Nikita, On Fri, Nov 27, 2015 at 2:24 AM, Nikita Popovwrote: > What are your thoughts on this? Great! This is exactly what I was thinking. I prefer collision counting rather than slower hash function. Hardcoded collision max (1000) seems ok for me. Catchable fatal error, at least E_RECOVERABLE_ERROR, is preferred, IMHO. Regards, -- Yasuo Ohgaki yohg...@ohgaki.net -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP-DEV] HashDos protection
On Thu, 26 Nov 2015 at 17:25 Nikita Popovwrote: > This will throw a fatal error if the number of > collisions during an insertion operation exceed a certain threshold. > To me this feels like it's just moving a DoS vector from one place to another. As Niklas already pointed out, he is directly affected by this. I was considering the scenario: 1) Open resources 2) json_decode 3) Do stuff 4) Clean up resources This makes the DoS more application specific, but there's any number of creative uses for making an application unexpectedly fail half way through. You can argue it's similar to any DoS that causes a request to run out of memory half way through, it is in some ways. I don't think an exception is right for this either. People blindly catch all exceptions because they can. Not sure what to suggest.
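A minimal sketch of that failure mode (the file names and flow are invented for illustration; the point is only that a fatal error at step 2 means step 4 never runs):

    <?php
    // 1) Open resources / leave state behind that needs explicit cleanup.
    touch('/tmp/import.running');               // marker another process checks

    // 2) json_decode attacker-controlled input. Under the proposal, a crafted
    //    payload terminates the request with a fatal error right here...
    $data = json_decode(file_get_contents('php://input'), true);

    // 3) Do stuff with $data.
    // ...

    // 4) Clean up resources -- never reached on a fatal error, so the marker
    //    file stays behind and blocks subsequent runs.
    unlink('/tmp/import.running');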
[PHP-DEV] HashDos protection
Hi internals! This mail turned out to be rather long, so I'll start with a TL;DR: To fix the HashDos vulnerability for *all* cases (rather than just GET/POST parsing), I propose to introduce collision counting during hashtable insertion operations. This will throw a fatal error if the number of collisions during an insertion operation exceed a certain threshold. Implementation: https://github.com/php/php-src/pull/1565 >From my testing the change has no negative performance impact. The change is small and does not break ABI. Tracking bug (with various implementations): https://bugs.php.net/bug.php?id=70644 What are your thoughts on this? Nikita - For those not familiar with HashDos or interested in some considerations regarding alternative implementations, the original mail: In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in the form of max_input_vars. As a recap, HashDos exploits the fact that hashtables (which PHP uses ubiquitously) perform very badly if the hashes of many elements collide. In particular the performance of inserting N elements into a hashtable can degrade from O(N) to O(N^2) using carefully chosen keys. A very simple demo script for creating a hashtable with many collisions: $s = 1 << 16; $a = []; for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; } This code creates an array with about 65k elements and runs in 30s (PHP 5.6) or 4s (PHP 7) on my machine. This behavior can be exploited by an attacker whenever a server running PHP can be made to create a decent-sized array (or object) with attacker-controlled keys. This DOS vulnerability is efficient: A 700kb payload can easily take 5 CPU seconds to process on PHP 7 and from there it goes up quadratically (i.e. 1.4MB payload takes 20 seconds etc). The most obvious attack vector are GET and POST variables, which are automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue by introducing max_input_vars, which prevents creating GET/POST arrays with more than a certain number of elements. As the HashDos attack relies on O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix. However, GET/POST are not the only ways an attacker can trigger array/object creation with specific keys. For example many PHP applications have endpoints that accept JSON encoded payloads, which are not protected by max_input_vars. The only means of protection currently available to userland code is to never decode JSON payloads that exceed some conservative size limit like 50-100KB (often not practical). Apart from JSON, there will likely be various other application-specific situations where arrays are generated which contain user-provided keys. While we could add something similar to max_input_vars to json_decode and unserialize, this would not solve the problem for any custom code. There are essentially three ways to prevent HashDos attacks in general (rather than patching specific places): 1. If there are many collisions for a hash slot, switch collision resolution from linked lists to self-balancing binary trees. 2. Switch hashtables to a keyed cryptographic hash like SipHash. 3. If there are very many collisions for a hash slot, throw a fatal error. I will comment on these possibilities as they apply to PHP. 1. (Self-balancing binary trees). The idea here is that a balanced binary tree has worst-case insertion time O(log N), while the linked list we normally use has worst-case insertion time O(N). 
This means that the worst-case execution time for N insertions into a hashtable goes from O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than usual". I don't think this solution is viable for PHP. Supporting binary trees next to linked list collision resolution will make the hashtable implementation *a lot* more complicated -- and it isn't exactly simple as is. 2. (Randomized cryptographic hash). This approach doesn't change anything about how collisions are handled, instead it prevents the attacker from constructing such collisions in the first place. This is done by using a fast cryptographic hash function (typically SipHash), which is keyed with a (cryptographically strong) random key. As the attacker does not know the key, he cannot efficiently construct arrays with many collisions. There are a number of factors to take into account when trying to apply this to PHP: a) The SipHash key would have to be the same for all processes in an FPM pool. We can't use a per-process or per-request key due to arrays living in opcache SHM. As far as I know SipHash is designed to be secure even when the used key is long living, so this does not appear to be an issue. b) SipHash is slower (especially for very small strings) than the hash function we currently use (DJBX33A). As PHP 7 caches the hashes for strings, I do not believe this to be a significant problem. c) The real issue: Currently PHP does not hash integer keys, but they are just as susceptible to
Re: [PHP-DEV] HashDos protection
> > 3. (Fatal error on many collisions). While the two previous options try to > ensure that hashtables stay efficient regardless of the used keys, the last > option aims to simply detect malicious array keys and abort the script in > such a case. > > This is done by counting the number of collisions during hashtable > insertion operations and throwing a fatal error if this collisions count > exceeds a certain threshold. > Would this be a catchable Error (implementing Throwable) or a real fatal? Having a real fatal could lead to a DOS in Aerys as you'd be able to crash workers with carefully crafted input variables then. Thanks, Niklas
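For anyone skimming the thread, a toy userland model of the counting idea may help (the real check is a few lines of C in zend_hash's insertion path; the threshold of 1000 and the fatal-error-versus-throw question are exactly what is being debated above):

    <?php
    // Toy chained hash table that tracks how long a single collision chain
    // gets and bails out past a threshold, mimicking the proposed check.
    class CountingHashTable
    {
        private $buckets = [];     // bucket index => list of [key, value] pairs
        private $size;
        private $maxCollisions;

        public function __construct($size = 8, $maxCollisions = 1000)
        {
            $this->size = $size;
            $this->maxCollisions = $maxCollisions;
        }

        public function insert($key, $value)
        {
            $idx = crc32($key) % $this->size;  // stand-in for the real hash + table mask
            $chain = count($this->buckets[$idx] ?? []);
            if ($chain >= $this->maxCollisions) {
                // The patch raises a fatal error at this point; a userland model
                // can only throw -- which is precisely the concern in this subthread.
                throw new RuntimeException('too many collisions in one chain');
            }
            $this->buckets[$idx][] = [$key, $value];   // toy: duplicate keys not handled
        }
    }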