Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Remi Collet
On 01/12/2015 02:28, Anatol Belski wrote:

> From what I was testing, the configuration is not absolutely necessary.

My first thought on seeing this hardcoded limit was also "we need to
make it a new ini option".

So I ran a few tests.

In particular I ran Anatol's test (a bit optimized to get more unique keys
faster), with a debug message in zend_hash.c to display the maximum number
of collisions encountered during the run.

After some time...

With around 9.2 million unique keys, and ~1.2 GB of memory used,
the maximum number of collisions never exceeded 12.

So I now really think that we can go without a new ini option.


Remi.





Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Dmitry Stogov
Hi Nikita,

few notes:

It looks like the patch mixes the attempt at catching the problem (the few
lines related to the zend_hash_find_bucket changes) with optimizations to
compensate for the collision check overhead (everything else). I think it's
better to separate these parts.

Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward
incompatibility with 7.0.0. However, we may reserve it for internal usage by
removing ZEND_API.

I don't think PHP should prevent users from constructing the arrays they
like because of internal problems (as in your example).

>  $s = 1 << 16; $a = [];
>  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

It creates a performance problem, but that doesn't mean we should kill it.
We have "max_execution_time" to catch them all.

I think only big arrays coming from external sources should be checked.

Your solution is incomplete anyway, because instead of crafting a single list
with 1 collisions, an attacker will be able to use N lists with 1000 collisions.

Thanks. Dmitry.


On Thu, Nov 26, 2015 at 8:24 PM, Nikita Popov  wrote:

> Hi internals!
>
> This mail turned out to be rather long, so I'll start with a TL;DR:
>
> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
> parsing), I propose to introduce collision counting during hashtable
> insertion operations. This will throw a fatal error if the number of
> collisions during an insertion operation exceeds a certain threshold.
>
> Implementation: https://github.com/php/php-src/pull/1565
>
> From my testing the change has no negative performance impact. The change
> is small and does not break ABI.
>
> Tracking bug (with various implementations):
> https://bugs.php.net/bug.php?id=70644
>
> What are your thoughts on this?
>
> Nikita
>
> -
>
> For those not familiar with HashDos or interested in some considerations
> regarding alternative implementations, the original mail:
>
> In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
> the form of max_input_vars. As a recap, HashDos exploits the fact that
> hashtables (which PHP uses ubiquitously) perform very badly if the hashes
> of many elements collide. In particular the performance of inserting N
> elements into a hashtable can degrade from O(N) to O(N^2) using carefully
> chosen keys.
>
> A very simple demo script for creating a hashtable with many collisions:
>
> $s = 1 << 16; $a = [];
> for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>
> This code creates an array with about 65k elements and runs in 30s (PHP
> 5.6) or 4s (PHP 7) on my machine.
>
> This behavior can be exploited by an attacker whenever a server running PHP
> can be made to create a decent-sized array (or object) with
> attacker-controlled keys. This DOS vulnerability is efficient: A 700kb
> payload can easily take 5 CPU seconds to process on PHP 7 and from there it
> goes up quadratically (i.e. 1.4MB payload takes 20 seconds etc).
>
> The most obvious attack vectors are GET and POST variables, which are
> automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue
> by introducing max_input_vars, which prevents creating GET/POST arrays with
> more than a certain number of elements. As the HashDos attack relies on
> O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.
>
> However, GET/POST are not the only ways an attacker can trigger
> array/object creation with specific keys. For example many PHP applications
> have endpoints that accept JSON encoded payloads, which are not protected
> by max_input_vars. The only means of protection currently available to
> userland code is to never decode JSON payloads that exceed some
> conservative size limit like 50-100KB (often not practical).
>
> Apart from JSON, there will likely be various other application-specific
> situations where arrays are generated which contain user-provided keys.
> While we could add something similar to max_input_vars to json_decode and
> unserialize, this would not solve the problem for any custom code.
>
> There are essentially three ways to prevent HashDos attacks in general
> (rather than patching specific places):
>
> 1. If there are many collisions for a hash slot, switch collision
> resolution from linked lists to self-balancing binary trees.
> 2. Switch hashtables to a keyed cryptographic hash like SipHash.
> 3. If there are very many collisions for a hash slot, throw a fatal error.
>
> I will comment on these possibilities as they apply to PHP.
>
> 1. (Self-balancing binary trees). The idea here is that a balanced binary
> tree has worst-case insertion time O(log N), while the linked list we
> normally use has worst-case insertion time O(N). This means that the
> worst-case execution time for N insertions into a hashtable goes from
> O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
> usual".
>
> I don't think this solution is viable for PHP. Supporting binary trees next
> to linked list 

Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Nikita Popov
On Tue, Dec 1, 2015 at 10:50 AM, Dmitry Stogov  wrote:

> Hi Nikita,
>
> few notes:
>
> It looks like the patch mixes the attempt at catching the problem (the few
> lines related to the zend_hash_find_bucket changes) with optimizations to
> compensate for the collision check overhead (everything else). I think it's
> better to separate these parts.
>

The addition of zend_hash_add_or_return() isn't an optimization, it is
required to ensure that the check happens in all relevant cases. The code
previously used a combination of zend_hash_find() and zend_hash_add_new().
However the whole purpose of the zend_hash_add_new() function is to skip
the part of the insertion operation where the collision count would be done.

At this point I could either a) also count collisions in zend_hash_find(),
which is imho not a good option as it's enough to count once and not at
every lookup, or b) avoid the combination of zend_hash_find() and
zend_hash_add_new() by introducing a new zend_hash_add_or_return() function
(which does the same as the previous combination but also performs the
collision count). Alternatively I would also simply make
zend_hash_add_new() the same as zend_hash_add(), which is probably not
wanted either.


> Introducing zend_hash_add_or_return() in 7.0.1 is going to create a
> forward incompatibility with 7.0.0. However, we may reserve it for internal
> usage by removing ZEND_API.
>

I'm fine with not marking it ZEND_API in 7.0.


> I don't think PHP should prevent users from constructing the arrays they
> like because of internal problems (as in your example).
>
> >  $s = 1 << 16; $a = [];
> >  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>
> It creates a performance problem, but that doesn't mean we should kill it.
> We have "max_execution_time" to catch them all.
>

> I think only big arrays coming from external sources should be checked.
>

There is no way for PHP to know whether or not an array is constructed from
an external source. Yes, of course we can special case json_decode() and
unserialize(), but this would not fix the vulnerability for any array that
is created in PHP code. It's not exactly unusual to create arrays which are
in some way based on user input.

> Your solution is incomplete anyway, because instead of crafting a single list
> with 1 collisions, an attacker will be able to use N lists with 1000 collisions.
>

One list with N*1000 collisions and N lists with 1000 collisions each have
very different asymptotic complexity. The former is O(N^2), while the
latter is O(N). The DOS attack is based on having O(N^2) complexity.
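
As a rough userland illustration of that difference (just a sketch; it reuses
the colliding-key pattern from the demo script, the helper name is mine, and
on a patched build the first, big build would be the one to hit the proposed
collision limit):

<?php
// Keys that are multiples of the (power-of-two) table size all land in the
// same bucket, so build() creates one long collision chain per array.
function build($numKeys, $step) {
    $a = [];
    for ($i = 0; count($a) < $numKeys; $i += $step) {
        $a[$i] = 0;
    }
    return $a;
}

$step  = 1 << 16; // same collision step as in the demo script
$total = 32000;   // total number of keys inserted in each experiment

$t = microtime(true);
build($total, $step);             // one array, all keys colliding: ~O(total^2)
printf("one array:   %.2fs\n", microtime(true) - $t);

$t = microtime(true);
for ($j = 0; $j < $total / 1000; $j++) {
    build(1000, $step);           // 32 arrays of 1000 colliding keys: ~O(total)
}
printf("many arrays: %.2fs\n", microtime(true) - $t);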

Nikita


Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Chris Riley
On 1 December 2015 at 09:50, Dmitry Stogov  wrote:

> Hi Nikita,
>
> few notes:
>
> It looks like the patch mixes the attempt at catching the problem (the few
> lines related to the zend_hash_find_bucket changes) with optimizations to
> compensate for the collision check overhead (everything else). I think it's
> better to separate these parts.
>
> Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward
> incompatibility with 7.0.0. However, we may reserve it for internal usage by
> removing ZEND_API.
>
> I don't think PHP should prevent users from constructing the arrays they
> like because of internal problems (as in your example).
>
> >  $s = 1 << 16; $a = [];
> >  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>
> It creates a performance problem, but that doesn't mean we should kill it.
> We have "max_execution_time" to catch them all.
>
> I think only big arrays coming from external sources should be checked.
>
> Your solution is incomplete anyway, because instead of crafting a single list
> with 1 collisions, an attacker will be able to use N lists with 1000 collisions.
>
> Thanks. Dmitry.


This is a good point, but bear in mind that to get the same effect as for 1
collisions, you'd need 100 lists, not 10.


Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Dmitry Stogov
On Tue, Dec 1, 2015 at 3:48 PM, Nikita Popov  wrote:

> On Tue, Dec 1, 2015 at 10:50 AM, Dmitry Stogov  wrote:
>
>> Hi Nikita,
>>
>> few notes:
>>
>> It looks like the patch mixes the attempt at catching the problem (the few
>> lines related to the zend_hash_find_bucket changes) with optimizations to
>> compensate for the collision check overhead (everything else). I think it's
>> better to separate these parts.
>>
>
> The addition of zend_hash_add_or_return() isn't an optimization, it is
> required to ensure that the check happens in all relevant cases. The code
> previously used a combination of zend_hash_find() and zend_hash_add_new().
> However the whole purpose of the zend_hash_add_new() function is to skip
> the part of the insertion operation where the collision count would be done.
>
> At this point I could either a) also count collisions in zend_hash_find(),
> which is imho not a good option as it's enough to count once and not at
> every lookup, or b) avoid the combination of zend_hash_find() and
> zend_hash_add_new() by introducing a new zend_hash_add_or_return() function
> (which does the same as the previous combination but also performs the
> collision count). Alternatively I would also simply make
> zend_hash_add_new() the same as zend_hash_add(), which is probably not
> wanted either.
>

I got it now.


>
>
>> Introducing zend_hash_add_or_return() in 7.0.1 is going to create a
>> forward incompatibility with 7.0.0. However, we may reserve it for internal
>> usage by removing ZEND_API.
>>
>
> I'm fine with not marking it ZEND_API in 7.0.
>
>
>> I don't think PHP should prevent users from constructing the arrays they
>> like because of internal problems (as in your example).
>>
>> >  $s = 1 << 16; $a = [];
>> >  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>>
>> It creates a performance problem, but that doesn't mean we should kill it.
>> We have "max_execution_time" to catch them all.
>>
>
>> I think only big arrays coming from external sources should be checked.
>>
>
> There is no way for PHP to know whether or not an array is constructed
> from an external source. Yes, of course we can special case json_decode()
> and unserialize(), but this would not fix the vulnerability for any array
> that is created in PHP code. It's not exactly unusual to create arrays
> which are in some way based on user input.
>

If this is a real problem, we had better provide an API for "safe" array
construction from "unsafe" data.
Also, if such arrays are constructed from many elements, it may make sense
to insert all the elements first and then rehash.
This would give O(2*N) complexity, even if all of them have the same hash
value.
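
A userland approximation of that "safe construction" idea could be as simple
as capping the number of elements accepted from an untrusted source, much
like max_input_vars does for GET/POST (only a sketch of the direction; the
helper name is made up, and an engine-level API could of course do better,
e.g. by building the hash index only once at the end):

<?php
// Hypothetical helper: build an array from untrusted key/value pairs while
// refusing to grow beyond a caller-chosen limit.
function array_from_untrusted(array $pairs, int $maxElements = 10000): array
{
    $result = [];
    foreach ($pairs as $key => $value) {
        if (count($result) >= $maxElements) {
            throw new RuntimeException('too many elements from untrusted source');
        }
        $result[$key] = $value;
    }
    return $result;
}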


>
>> Your solution is incomplete anyway, because instead of crafting a single
>> list with 1 collisions, an attacker will be able to use N lists with 1000
>> collisions.
>>
>
> One list with N*1000 collisions and N lists with 1000 collisions each have
> very different asymptotic complexity. The former is O(N^2), while the
> latter is O(N). The DOS attack is based on having O(N^2) complexity.
>

You are right, of course, about O(N) and O(N^2), but these days poor
data locality may do more harm than list length.
The fact that PHP 7 is a few times faster on your example is caused by the
linear placement of the collision list.
Once we start to work with randomized lists, in addition to the complexity,
we will also suffer from stalls because of cache misses.

Thanks. Dmitry.

Nikita
>


Re: [PHP-DEV] HashDos protection

2015-12-01 Thread Pierre Joye
On Dec 1, 2015 4:50 PM, "Dmitry Stogov"  wrote:
>

> I think only big arrays coming from external sources should be checked.

I tend to agree here.

We discussed this with Remi last week. I was trying to explain why having a
dedicated hash function for inputs may be better and safer. That includes
get/post/env/serialize/json and the likes.

The performance impact would most likely be minimal, since only these are
affected, while ensuring better protection from a long-term point of view.

I may be wrong and have not thought about it much beyond brainstorming, so
take it with a grain of salt :)


RE: [PHP-DEV] HashDos protection

2015-11-30 Thread Anatol Belski
Hi Stas,

> -Original Message-
> From: Stanislav Malyshev [mailto:smalys...@gmail.com]
> Sent: Tuesday, December 1, 2015 12:21 AM
> To: Nikita Popov <nikita@gmail.com>; PHP internals
> <internals@lists.php.net>; Anatol Belski <anatol@belski.net>; Remi Collet
> <r...@php.net>
> Subject: Re: [PHP-DEV] HashDos protection
> 
> Hi!
> 
> > To fix the HashDos vulnerability for *all* cases (rather than just
> > GET/POST parsing), I propose to introduce collision counting during
> > hashtable insertion operations. This will throw a fatal error if the
> > number of collisions during an insertion operation exceeds a certain
> > threshold.
> >
> > Implementation: https://github.com/php/php-src/pull/1565
> 
> This looks pretty cool. I'd support making the limit configurable though, is
> there a reason why it's not?
> 
From what I was testing, the configuration is not absolutely necessary. The
normal usage doesn't seem to cause the situation reproducible by
https://github.com/bk2204/php-hash-dos . Even with a big array - with patched
PHP I was reaching something like 2.5 million string keys and gave up. On the
other hand, if such a malicious situation were reached, the application would
become unusable - so the configuration is senseless for that case. If the
array is big and there are too many collisions, PHP would just iterate over
all the buckets all over again looking for a suitable one. Maybe the only case
where an INI option could be useful would be to force bailing out at exactly
zero collisions or a very low collision rate. At least that was my observation.

Regards

Anatol 






Re: [PHP-DEV] HashDos protection

2015-11-30 Thread Stanislav Malyshev
Hi!

> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
> parsing), I propose to introduce collision counting during hashtable
> insertion operations. This will throw a fatal error if the number of
> collisions during an insertion operation exceeds a certain threshold.
> 
> Implementation: https://github.com/php/php-src/pull/1565

This looks pretty cool. I'd support making the limit configurable
though, is there a reason why it's not?

-- 
Stas Malyshev
smalys...@gmail.com




Re: [PHP-DEV] HashDos protection

2015-11-29 Thread Levi Morrison
> 1. (Self-balancing binary trees). The idea here is that a balanced binary
> tree has worst-case insertion time O(log N), while the linked list we
> normally use has worst-case insertion time O(N). This means that the
> worst-case execution time for N insertions into a hashtable goes from
> O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
> usual".
>
> I don't think this solution is viable for PHP. Supporting binary trees next
> to linked list collision resolution will make the hashtable implementation
> *a lot* more complicated -- and it isn't exactly simple as is.

I have written self-balancing binary trees on multiple occasions. I
know that we want to minimize complexity but if the complexity
actually helps solve this issue I think it would be worth it. I assume
you didn't create a proof of concept for this option – is that
correct?

I think fataling because of collisions is a poor solution to this
problem because fataling in some cases may contribute to a successful
DOS attack (I do not think Aerys is the only piece of software that
would be vulnerable to this). I am sure PHP is not the only language
that has experienced pain with HashDos – how have other languages
handled it?




Re: [PHP-DEV] HashDos protection

2015-11-28 Thread Nikita Popov
On Sat, Nov 28, 2015 at 12:22 AM, Thomas Hruska 
wrote:

> On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote:
>
>> Hi Thomas,
>>
>> In practice, we wouldn't have problems with max number of collisions.
>>
>
> Is CLI going to be or can CLI be excluded from max collisions?  After
> thinking about it for a long while, that's my only area of concern here.
> SAPI can (fatal) error to its heart's content and I'll find ways to live
> with it.  But CLI needs stability guarantees.  'max_input_vars' didn't
> affect CLI.  However, max collisions could negatively affect CLI because
> the change affects all arrays instead of just the superglobals.
>
> Real-world scenario:  About once every 4 to 6 months I will write a script
> to load up a single PHP array with a few million records.  Partly because I
> can and partly because I need to.  On CLI, especially for my one-off
> quick-n-dirty scripts, I don't care how much CPU, RAM, and other resources
> are used nor how much time it takes to complete.  I just care that the
> script finishes running successfully.  If the script reads in a record and
> attempts to add it to the array, the proposed max collisions might trigger
> a fatal error if it hits the max collision limit for arrays.  My experience
> is that CLI is silent about 50% of the time when it encounters a fatal
> error.  So my script would drop back to the prompt after spending many
> minutes to hours loading the data, not having done any work, and not emit
> any error(s).  I would think that it had completed successfully until I
> went to look at the results and the results I would be expecting to see
> wouldn't be there.
>
> I abuse PHP arrays.  Especially on CLI.  Sorry.
>

To make sure there's no confusion on this point: When I say "number of
collisions" I do not mean the total sum of collisions in a hashtable.
That's a meaningless value, as a large hashtable will have a large total
number of collisions, even if it is not malicious. What I rather mean is
the number of collisions in a *single* collision resolution chain.

Let's try to quantify the probability of reaching the collision limit C with
a hashtable of size N, assuming a random hash distribution. The upper
bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1) being
the probability that C elements hash to one value and (N choose C) the number
of C element subsets of an N element set. For large N and N >> C we can
approximate (N choose C) by (e*N/C)^C / sqrt(2*pi*C). As such our upper
bound becomes N * (e/C)^C / sqrt(2*pi*C). Choosing N = 2^31 (the largest
possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100
probability 2.3E-149. The patch uses C = 1000.
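
For what it's worth, plugging numbers into that bound reproduces the figures
above (a quick sanity-check script, not part of the patch):

<?php
// Upper bound derived above:
// P(some chain reaches length C) <= N * (e/C)^C / sqrt(2*pi*C)
function collision_bound(float $n, int $c): float
{
    return $n * pow(M_E / $c, $c) / sqrt(2 * M_PI * $c);
}

$n = pow(2, 31); // largest possible hashtable
printf("C = 20:  %.1e\n", collision_bound($n, 20));  // prints roughly 8.9e-10
printf("C = 100: %.1e\n", collision_bound($n, 100)); // prints roughly 2.3e-149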

Now my probability-foo is pretty weak, so it's not unlikely that this
result is totally wrong ^^ In any case, the point is that even for very
large arrays, there is no way you're going to reach the collision limit
with random data. Of course it can happen that your (obviously not
perfectly random) data set happens to have keys that our hash function
behaves particularly badly with. E.g. if you create a hashtable with
pointers, you'll likely be getting an expected length of 8 for each
collision chain due to alignment. If for whatever reason you are inserting
integers that all happen to be multiples of 2^10, you could reach the
collision limit. If that's the case you're probably still better off
knowing that you're trying to DoS yourself, rather than just silently
slowing down to a crawl ;)

I'm not averse to making the collision limit configurable, but it would
still be enabled by default for CLI.

Nikita


RE: [PHP-DEV] HashDos protection

2015-11-28 Thread Anatol Belski
Hi Thomas,

> -Original Message-
> From: Thomas Hruska [mailto:thru...@cubiclesoft.com]
> Sent: Saturday, November 28, 2015 12:23 AM
> To: Yasuo Ohgaki <yohg...@ohgaki.net>
> Cc: PHP internals <internals@lists.php.net>
> Subject: Re: [PHP-DEV] HashDos protection
> 
> On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote:
> > Hi Thomas,
> >
> > In practice, we wouldn't have problems with max number of collisions.
> 
> Is CLI going to be or can CLI be excluded from max collisions?  After thinking
> about it for a long while, that's my only area of concern here.
>   SAPI can (fatal) error to its heart's content and I'll find ways to live 
> with it.  But
> CLI needs stability guarantees.  'max_input_vars'
> didn't affect CLI.  However, max collisions could negatively affect CLI 
> because
> the change affects all arrays instead of just the superglobals.
> 
> Real-world scenario:  About once every 4 to 6 months I will write a script to 
> load
> up a single PHP array with a few million records.  Partly because I can and 
> partly
> because I need to.  On CLI, especially for my one-off quick-n-dirty scripts, 
> I don't
> care how much CPU, RAM, and other resources are used nor how much time it
> takes to complete.  I just care that the script finishes running 
> successfully.  If the
> script reads in a record and attempts to add it to the array, the proposed max
> collisions might trigger a fatal error if it hits the max collision limit for 
> arrays.
> My experience is that CLI is silent about 50% of the time when it encounters a
> fatal error.  So my script would drop back to the prompt after spending many
> minutes to hours loading the data, not having done any work, and not emit any
> error(s).  I would think that it had completed successfully until I went to 
> look at
> the results and the results I would be expecting to see wouldn't be there.
> 
> I abuse PHP arrays.  Especially on CLI.  Sorry.
> 
The particular case being fixed is a vulnerability which can be specifically
triggered. If you have already been writing scripts creating arrays with
millions of records, and those went through, it means there were never enough
collisions produced. If you have time, I'd ask you to please test this patch
with one of your previous scripts of that kind.

The reproduce case from https://github.com/bk2204/php-hash-dos contains 10^6
crafted keys, which makes the application just hang as the hash table will be
busy resolving collisions and looking for free buckets. I was running that
reproduce case and broke the CLI script off after an hour as it doesn't make
sense. Of course, a cron job running for hours or even days is ok, but it is
clear that with such malicious keys that cron job will spend those days not on
the actual work but on resolving collisions. So indeed, if you were to meet
the malicious situation, your script wouldn't work. Using the same approach
for my test, I'm already able to produce an array with around 2*10^6 keys,
each 11 bytes long and drawn from a 62 character set (see the script linked
here https://github.com/php/php-src/pull/1565#issuecomment-160231878 ). It is
supposed to show that the patch doesn't cause an issue with creating huge
arrays.
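
For reference, a test along those lines could look roughly like this (only an
approximation of the idea; the exact script is the one linked in the PR
comment above):

<?php
// Rough sketch: fill an array with ~2 million unique 11-byte keys drawn from
// a 62-character set and report the peak memory used.
$alphabet = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'));
$target   = 2000000;
$a        = [];

while (count($a) < $target) {
    $key = '';
    for ($i = 0; $i < 11; $i++) {
        $key .= $alphabet[mt_rand(0, 61)];
    }
    $a[$key] = true;
}

printf("keys: %d, peak memory: %.1f MB\n",
    count($a), memory_get_peak_usage(true) / 1048576);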

I'd really plead for a bit less theoretical approach at this point - let's
test it with any possibly concerning cases and see. In real life, the key
length can vary, the quality of keys can vary, and the keys can be made
suitable for a huge data set in just the same way they can be crafted for a
malicious attack. Say, use a bigger character set for the keys, use longer
keys, use keys generated by random_bytes(), etc. As scripts processing big
datasets are usually only meant for cron jobs, taking special measures could
affect their runtime; however, if a script runs for days, plus or minus a
couple of hours won't be significant.

So after all, fixing the vulnerability could theoretically cost something for
the justified but unusual use cases like big datasets. However, it won't make
those unusual cases impossible; furthermore, it doesn't really seem to have an
impact in practice, and it won't affect common cases at all. At least that is
what my current tests show. Please, let's fetch the patch and test it.

Regards

Anatol








Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Nikita Popov
On Thu, Nov 26, 2015 at 8:35 PM, Niklas Keller  wrote:

> Would this be a catchable Error (implementing Throwable) or a real fatal?
> Having a real fatal could lead to a DOS in Aerys as you'd be able to crash
> workers with carefully crafted input variables then.
>

It would be a real fatal error. Throwing an exception in this place would
be problematic, as we're inserting into hashtables all over the place and
not all of these places are exception-safe. Many places also make the
reasonable assumption that (normal) insert operations always succeed.

You are correct in saying that this approach would mean that you can now
easily trigger a fatal error in an application that was previously
susceptible to HashDos (and likely also in some that previously were not).
In most cases this should not be a problem, however for Aerys and likely
other daemons a fatal error implies losing all connections on the current
worker. This may effectively be worse than a HashDos attack.

The only thing I can think to offer here is to make the collision number
threshold ini configurable, so you would have the ability to effectively
disable this protection. In that case you'd have to manually perform size
checks in the right places if you wish to protect against HashDos.

I'm fine with having a solution that only works for the 99% of
applications, until such a time, if ever, as we can feasibly implement one
of the alternative schemes that would cover all possible scenarios.

Nikita


Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Thomas Hruska

On 11/27/2015 9:12 AM, Nikita Popov wrote:

On Thu, Nov 26, 2015 at 8:35 PM, Niklas Keller  wrote:


Would this be a catchable Error (implementing Throwable) or a real fatal?
Having a real fatal could lead to a DOS in Aerys as you'd be able to crash
workers with carefully crafted input variables then.



It would be a real fatal error. Throwing an exception in this place would
be problematic, as we're inserting into hashtables all over the place and
not all of these places are exception-safe. Many places also make the
reasonable assumption that (normal) insert operations always succeed.

You are correct in saying that this approach would mean that you can now
easily trigger a fatal error in an application that was previously
susceptible to HashDos (and likely also in some that previously were not).
In most cases this should not be a problem, however for Aerys and likely
other daemons a fatal error implies losing all connections on the current
worker. This may effectively be worse than a HashDos attack.

The only thing I can think to offer here is to make the collision number
threshold ini configurable, so you would have the ability to effectively
disable this protection. In that case you'd have to manually perform size
checks in the right places if you wish to protect against HashDos.

I'm fine with having a solution that only works for the 99% of
applications, until such a time, if ever, as we can feasibly implement one
of the alternative schemes that would cover all possible scenarios.

Nikita


I don't know if anyone has suggested this before, but why not have a 
function that application developers can call to switch hash modes and 
support multiple hash modes in the core?


I've always viewed 'max_input_vars' as an emergency hack and I've run 
into the default 1,000 limit many times.  When I hit that limit, I 
inevitably have to raise it to anywhere from 3,000 to 10,000 to get the 
target application to function, which, of course, puts the whole server 
at risk.


If the application developer could declare what hash mode to use for new 
arrays, that could potentially solve the problem more cleanly.  SipHash 
would be used for security scenarios (PHP loading of superglobals, user 
input, etc.) and the current DJB hash for everything else.  That could, 
of course, lead to unfortunate situations.  Alternatively, default to 
SipHash for everything but the application developer could turn it off 
and on as needed at runtime.  Put a warning in the documentation about 
the security of hashes and also mention leaving SipHash enabled when in 
doubt.


--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/




Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Jakub Zelenka
Hi,

On Thu, Nov 26, 2015 at 5:24 PM, Nikita Popov  wrote:
>
>
> What are your thoughts on this?
>
>
First of all, thanks a lot for looking into it! That's great!

I think that it's all cool except the fact that json_decode would result in a
fatal error. I don't think that json_decode should kill the script in this
case. It's much better to return NULL and set a new JSON_ERROR_...

We could maybe add some EG variable to silence it when using json_decode and
set it back when finished. In that case, when the collision limit is exceeded,
it would set another EG flag that could be checked in json_parser.y as
UNEXPECTED after each addition to the object or array. It's probably not the
nicest solution and it would probably slow down the JSON parser slightly, but
it could work and it's still better than a fatal error IMHO.

What do you think?

Cheers

Jakub


Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Niklas Keller
Currently it's not catchable, that's my main concern. If it's catchable,
it's not that much of a problem.

Regards, Niklas

2015-11-27 10:05 GMT+01:00 Yasuo Ohgaki :

> Hi Nikita,
>
> On Fri, Nov 27, 2015 at 2:24 AM, Nikita Popov 
> wrote:
> > What are your thoughts on this?
>
> Great! This is exactly what I was thinking.
> I prefer collision counting rather than slower hash function.
>
> Hardcoded collision max (1000) seems ok for me.
> Catchable fatal error, at least E_RECOVERABLE_ERROR, is preferred, IMHO.
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohg...@ohgaki.net
>
>
>


RE: [PHP-DEV] HashDos protection

2015-11-27 Thread Anatol Belski
Hi Nikita,

> -Original Message-
> From: Nikita Popov [mailto:nikita@gmail.com]
> Sent: Thursday, November 26, 2015 6:25 PM
> To: PHP internals <internals@lists.php.net>; Anatol Belski
> <anatol@belski.net>; Remi Collet <r...@php.net>
> Subject: [PHP-DEV] HashDos protection
> 
> Hi internals!
> 
> This mail turned out to be rather long, so I'll start with a TL;DR:
> 
> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
> parsing), I propose to introduce collision counting during hashtable insertion
> operations. This will throw a fatal error if the number of collisions during
> an insertion operation exceeds a certain threshold.
> 
> Implementation: https://github.com/php/php-src/pull/1565
> 
> From my testing the change has no negative performance impact. The change is
> small and does not break ABI.
> 
> Tracking bug (with various implementations):
> https://bugs.php.net/bug.php?id=70644
> 
> What are your thoughts on this?
> 
Responding to the short version as well :)

I was checking your patch and I think it is great. Currently I see no ABI
break (please correct me if I err). So IMHO, after sufficient discussion,
corrections and testing, and given there's still no ABI incompatibility, it
should be backported to 7.0 as early as possible.

Regards

Anatol








Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Thomas Hruska

On 11/27/2015 2:21 PM, Yasuo Ohgaki wrote:

Hi Thomas,

In practice, we wouldn't have problems with max number of collisions.


Is CLI going to be or can CLI be excluded from max collisions?  After 
thinking about it for a long while, that's my only area of concern here. 
 SAPI can (fatal) error to its heart's content and I'll find ways to 
live with it.  But CLI needs stability guarantees.  'max_input_vars' 
didn't affect CLI.  However, max collisions could negatively affect CLI 
because the change affects all arrays instead of just the superglobals.


Real-world scenario:  About once every 4 to 6 months I will write a 
script to load up a single PHP array with a few million records.  Partly 
because I can and partly because I need to.  On CLI, especially for my 
one-off quick-n-dirty scripts, I don't care how much CPU, RAM, and other 
resources are used nor how much time it takes to complete.  I just care 
that the script finishes running successfully.  If the script reads in a 
record and attempts to add it to the array, the proposed max collisions 
might trigger a fatal error if it hits the max collision limit for 
arrays.  My experience is that CLI is silent about 50% of the time when 
it encounters a fatal error.  So my script would drop back to the prompt 
after spending many minutes to hours loading the data, not having done 
any work, and not emit any error(s).  I would think that it had 
completed successfully until I went to look at the results and the 
results I would be expecting to see wouldn't be there.


I abuse PHP arrays.  Especially on CLI.  Sorry.

--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/




Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Yasuo Ohgaki
Hi Thomas,

On Sat, Nov 28, 2015 at 2:07 AM, Thomas Hruska  wrote:
> I don't know if anyone has suggested this before, but why not have a
> function that application developers can call to switch hash modes and
> support multiple hash modes in the core?
>
> I've always viewed 'max_input_vars' as an emergency hack and I've run into
> the default 1,000 limit many times.  When I hit that limit, I inevitably
> have to raise it to anywhere from 3,000 to 10,000 to get the target
> application to function, which, of course, puts the whole server at risk.

Because any hash function has collisions.
Even if we use a hash that is stronger against collisions, computers are
getting faster and faster, and creating a database of colliding keys becomes
easier and easier. A clever person may also find an algorithmic way to
generate colliding keys in the future.

In practice, we wouldn't have problems with a max number of collisions.
A max number of collisions resolves the issue for good, and we can keep
executing code fast with the faster hash. I forget the exact number, but
SipHash is much slower than DJB.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net




Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Yasuo Ohgaki
Hi Nikita,

On Fri, Nov 27, 2015 at 2:24 AM, Nikita Popov  wrote:
> What are your thoughts on this?

Great! This is exactly what I was thinking.
I prefer collision counting rather than slower hash function.

Hardcoded collision max (1000) seems ok for me.
Catchable fatal error, at least E_RECOVERABLE_ERROR, is preferred, IMHO.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net




Re: [PHP-DEV] HashDos protection

2015-11-27 Thread Leigh
On Thu, 26 Nov 2015 at 17:25 Nikita Popov  wrote:

> This will throw a fatal error if the number of
> collisions during an insertion operation exceeds a certain threshold.
>

To me this feels like it's just moving a DoS vector from one place to
another.

As Niklas already pointed out, he is directly affected by this.

I was considering the scenario:
1) Open resources
2) json_decode
3) Do stuff
4) Clean up resources

This makes the DoS more application specific, but there are any number of
creative uses for making an application unexpectedly fail halfway through.

You can argue it's similar to any DoS that causes a request to run out of
memory halfway through, and it is in some ways.

I don't think an exception is right for this either. People blindly catch
all exceptions because they can.

Not sure what to suggest.


[PHP-DEV] HashDos protection

2015-11-26 Thread Nikita Popov
Hi internals!

This mail turned out to be rather long, so I'll start with a TL;DR:

To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
parsing), I propose to introduce collision counting during hashtable
insertion operations. This will throw a fatal error if the number of
collisions during an insertion operation exceeds a certain threshold.

Implementation: https://github.com/php/php-src/pull/1565

From my testing the change has no negative performance impact. The change
is small and does not break ABI.

Tracking bug (with various implementations):
https://bugs.php.net/bug.php?id=70644

What are your thoughts on this?

Nikita

-

For those not familiar with HashDos or interested in some considerations
regarding alternative implementations, the original mail:

In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
the form of max_input_vars. As a recap, HashDos exploits the fact that
hashtables (which PHP uses ubiquitously) perform very badly if the hashes
of many elements collide. In particular the performance of inserting N
elements into a hashtable can degrade from O(N) to O(N^2) using carefully
chosen keys.

A very simple demo script for creating a hashtable with many collisions:

$s = 1 << 16; $a = [];
for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

This code creates an array with about 65k elements and runs in 30s (PHP
5.6) or 4s (PHP 7) on my machine.
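
The reason these keys collide: every key is a multiple of $s, and $s is also
the size the table grows to, so after reduction by the (power-of-two) table
size all keys map to the same bucket. A commented variant of the same script
(just an annotation; the exact bucket computation differs slightly between
PHP 5 and PHP 7):

<?php
$s = 1 << 16;   // 65536: both the key step and the final table size
$a = [];

// Keys are 0, 65536, 131072, ... -- all multiples of the table size, so
// every key satisfies $key % $s == 0 and they all land in the same bucket,
// forming a single long collision chain.
for ($i = 0; count($a) < $s; $i += $s) {
    $a[$i] = 0;
}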

This behavior can be exploited by an attacker whenever a server running PHP
can be made to create a decent-sized array (or object) with
attacker-controlled keys. This DOS vulnerability is efficient: A 700kb
payload can easily take 5 CPU seconds to process on PHP 7 and from there it
goes up quadratically (i.e. 1.4MB payload takes 20 seconds etc).

The most obvious attack vectors are GET and POST variables, which are
automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue
by introducing max_input_vars, which prevents creating GET/POST arrays with
more than a certain number of elements. As the HashDos attack relies on
O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.

However, GET/POST are not the only ways an attacker can trigger
array/object creation with specific keys. For example many PHP applications
have endpoints that accept JSON encoded payloads, which are not protected
by max_input_vars. The only means of protection currently available to
userland code is to never decode JSON payloads that exceed some
conservative size limit like 50-100KB (often not practical).
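
Such a userland size check is blunt but simple (a sketch of the kind of guard
meant here; the limit and the helper name are arbitrary):

<?php
// Refuse to decode payloads above a conservative size limit before
// json_decode() ever runs, so an attacker cannot feed in enough colliding
// keys to matter.
const MAX_JSON_PAYLOAD_BYTES = 100 * 1024;

function safe_json_decode(string $json, bool $assoc = true)
{
    if (strlen($json) > MAX_JSON_PAYLOAD_BYTES) {
        throw new InvalidArgumentException('JSON payload too large');
    }
    return json_decode($json, $assoc);
}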

Apart from JSON, there will likely be various other application-specific
situations where arrays are generated which contain user-provided keys.
While we could add something similar to max_input_vars to json_decode and
unserialize, this would not solve the problem for any custom code.

There are essentially three ways to prevent HashDos attacks in general
(rather than patching specific places):

1. If there are many collisions for a hash slot, switch collision
resolution from linked lists to self-balancing binary trees.
2. Switch hashtables to a keyed cryptographic hash like SipHash.
3. If there are very many collisions for a hash slot, throw a fatal error.

I will comment on these possibilities as they apply to PHP.

1. (Self-balancing binary trees). The idea here is that a balanced binary
tree has worst-case insertion time O(log N), while the linked list we
normally use has worst-case insertion time O(N). This means that the
worst-case execution time for N insertions into a hashtable goes from
O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
usual".

I don't think this solution is viable for PHP. Supporting binary trees next
to linked list collision resolution will make the hashtable implementation
*a lot* more complicated -- and it isn't exactly simple as is.

2. (Randomized cryptographic hash). This approach doesn't change anything
about how collisions are handled, instead it prevents the attacker from
constructing such collisions in the first place.

This is done by using a fast cryptographic hash function (typically
SipHash), which is keyed with a (cryptographically strong) random key. As
the attacker does not know the key, he cannot efficiently construct arrays
with many collisions.
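
For illustration only, the same keyed-hash idea is exposed to userland by the
sodium extension (bundled with PHP since 7.2, available via PECL before that):
sodium_crypto_shorthash() is SipHash-2-4. This just shows the concept; the
option discussed here concerns the engine's internal hashtable hash, not a
userland function:

<?php
// The same short input hashes to different values under different secret
// keys, so an attacker who doesn't know the key cannot precompute a set of
// colliding keys. (Requires ext/sodium.)
$key1 = sodium_crypto_shorthash_keygen();
$key2 = sodium_crypto_shorthash_keygen();

var_dump(bin2hex(sodium_crypto_shorthash('some-array-key', $key1)));
var_dump(bin2hex(sodium_crypto_shorthash('some-array-key', $key2)));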

There are a number of factors to take into account when trying to apply
this to PHP:

a) The SipHash key would have to be the same for all processes in an FPM
pool. We can't use a per-process or per-request key due to arrays living in
opcache SHM. As far as I know SipHash is designed to be secure even when
the used key is long living, so this does not appear to be an issue.

b) SipHash is slower (especially for very small strings) than the hash
function we currently use (DJBX33A). As PHP 7 caches the hashes for
strings, I do not believe this to be a significant problem.

c) The real issue: Currently PHP does not hash integer keys, but they are
just as susceptible to 

Re: [PHP-DEV] HashDos protection

2015-11-26 Thread Niklas Keller
>
> 3. (Fatal error on many collisions). While the two previous options try to
> ensure that hashtables stay efficient regardless of the used keys, the last
> option aims to simply detect malicious array keys and abort the script in
> such a case.
>
> This is done by counting the number of collisions during hashtable
> insertion operations and throwing a fatal error if this collision count
> exceeds a certain threshold.
>

Would this be a catchable Error (implementing Throwable) or a real fatal?
Having a real fatal could lead to a DOS in Aerys as you'd be able to crash
workers with carefully crafted input variables then.

Thanks, Niklas