Re: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Lukas Kahwe Smith


On 21.07.2008, at 17:29, Michal Dziemianko wrote:


Hello,
I have looked into Zend/zend_hash.h and I guess it might be sped up  
a little. So far it uses D. Bernstein's hash which is quite fast,  
but I think it might be worth replacing it with MurmurHash. I have  
tried comparison of speed for them (both as separate C programs, and  
compiled into PHP 5_3). Results for REAL keys (collected on running  
web server) are at the bottom of this page: http://212.85.117.53/gsoc/index.php?option=com_contentview=articleid=65:hash-functions-for-hash-tablescatid=34:profilingItemid=54



speeding up array's does indeed look like a place where one could give  
a nice boost to many php applications. however it is also a place that  
could create such as many problems of course. so getting this into 5.3  
seems a bit unlikely at this point given that the feature freeze is  
planned for the 24th.


that being said, on the page i only see benchmarks with artificial  
data/code. did you also run some benchmarks again some popular php  
applications so see what the real world benefits are in terms of  
requests per second?


regards,
Lukas Kahwe Smith
[EMAIL PROTECTED]




--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Andi Gutmans
Hi Michal,

Can you please send a link to the patch so we can review? I didn't get
the attachment.

Thanks,
Andi




 -Original Message-
 From: Michal Dziemianko [mailto:[EMAIL PROTECTED]
 Sent: Monday, July 21, 2008 8:29 AM
 To: internals@lists.php.net
 Subject: [PHP-DEV] zend_inline_hash_function reimplementation
 
 Hello,
 I have looked into Zend/zend_hash.h and I guess it might be sped up a
little.
 So far it uses D. Bernstein's hash which is quite fast, but I think it
might
 be worth replacing it with MurmurHash. I have tried comparison of
speed for
 them (both as separate C programs, and compiled into PHP 5_3). Results
for
 REAL keys (collected on running web server) are at the bottom of this
page:
 http://212.85.117.53/gsoc/
 index.php?option=com_contentview=articleid=65:hash-functions-for-
 hash-tablescatid=34:profilingItemid=54
 
 Murmur is Public Domain, and its author is maintaining it all the time
so I
 guess it will be faster and better. There is comparison of
distribution at
 both the page I gave and here: http:// murmurhash.googlepages.com/
 
 The patch for 5_3 (applicable to HEAD) as attachment

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Scott MacVicar

Hi Andi,

The patch is attached for 5_3.

I've got some time allocated tomorrow to review all of Michal's patches 
that have been produced for the GSoC. I'll try to post some figures from 
real life apps.


Scott
Andi Gutmans wrote:

Hi Michal,

Can you please send a link to the patch so we can review? I didn't get
the attachment.

Thanks,
Andi





-Original Message-
From: Michal Dziemianko [mailto:[EMAIL PROTECTED]
Sent: Monday, July 21, 2008 8:29 AM
To: internals@lists.php.net
Subject: [PHP-DEV] zend_inline_hash_function reimplementation

Hello,
I have looked into Zend/zend_hash.h and I guess it might be sped up a

little.

So far it uses D. Bernstein's hash which is quite fast, but I think it

might

be worth replacing it with MurmurHash. I have tried comparison of

speed for

them (both as separate C programs, and compiled into PHP 5_3). Results

for

REAL keys (collected on running web server) are at the bottom of this

page:

http://212.85.117.53/gsoc/
index.php?option=com_contentview=articleid=65:hash-functions-for-
hash-tablescatid=34:profilingItemid=54

Murmur is Public Domain, and its author is maintaining it all the time

so I

guess it will be faster and better. There is comparison of

distribution at

both the page I gave and here: http:// murmurhash.googlepages.com/

The patch for 5_3 (applicable to HEAD) as attachment




Index: zend_hash.h
===
RCS file: /repository/ZendEngine2/zend_hash.h,v
retrieving revision 1.78.2.2.2.2.2.4
diff -u -d -r1.78.2.2.2.2.2.4 zend_hash.h
--- zend_hash.h 29 Apr 2008 08:15:17 -  1.78.2.2.2.2.2.4
+++ zend_hash.h 20 Jul 2008 12:37:41 -
@@ -222,65 +222,47 @@
 ZEND_API int zend_hash_rehash(HashTable *ht);
 
 /*
- * DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
- *
- * This is Daniel J. Bernstein's popular `times 33' hash function as
- * posted by him years ago on comp.lang.c. It basically uses a function
- * like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
- * known hash functions for strings. Because it is both computed very
- * fast and distributes very well.
- *
- * The magic of number 33, i.e. why it works better than many other
- * constants, prime or not, has never been adequately explained by
- * anyone. So I try an explanation: if one experimentally tests all
- * multipliers between 1 and 256 (as RSE did now) one detects that even
- * numbers are not useable at all. The remaining 128 odd numbers
- * (except for the number 1) work more or less all equally well. They
- * all distribute in an acceptable way and this way fill a hash table
- * with an average percent of approx. 86%. 
- *
- * If one compares the Chi^2 values of the variants, the number 33 not
- * even has the best value. But the number 33 and a few other equally
- * good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
- * advantage to the remaining numbers in the large set of possible
- * multipliers: their multiply operation can be replaced by a faster
- * operation based on just one shift plus either a single addition
- * or subtraction operation. And because a hash function has to both
- * distribute good _and_ has to be very fast to compute, those few
- * numbers should be preferred and seems to be the reason why Daniel J.
- * Bernstein also preferred it.
- *
- *
- *  -- Ralf S. Engelschall [EMAIL PROTECTED]
+ * MURMUR Hash - http://murmurhash.googlepages.com/
  */
-
-static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
+#define MURMUR_SEED 5831
+#define MURMUR_M 0x5bd1e995
+#define MURMUR_R 24
+static inline ulong zend_inline_hash_func(const char *key, uint len)
 {
-   register ulong hash = 5381;
+   unsigned int h = MURMUR_SEED ^ len;
+   unsigned char * data = (unsigned char *)key;
 
-   /* variant with the hash unrolled eight times */
-   for (; nKeyLength = 8; nKeyLength -= 8) {
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   hash = ((hash  5) + hash) + *arKey++;
-   }
-   switch (nKeyLength) {
-   case 7: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 6: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 5: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 4: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 3: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 2: hash = ((hash  5) + hash) + *arKey++; /* 
fallthrough... */
-   case 1: hash = ((hash  5) + hash) + *arKey++; break

RE: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Andi Gutmans
Thanks. Is there also a 64bit variant of this algorithm? That could
deliver a substantial benefit.

Andi

 -Original Message-
 From: Scott MacVicar [mailto:[EMAIL PROTECTED]
 Sent: Monday, July 21, 2008 11:15 AM
 To: Andi Gutmans
 Cc: Michal Dziemianko; internals@lists.php.net
 Subject: Re: [PHP-DEV] zend_inline_hash_function reimplementation
 
 Hi Andi,
 
 The patch is attached for 5_3.
 
 I've got some time allocated tomorrow to review all of Michal's
patches that
 have been produced for the GSoC. I'll try to post some figures from
real life
 apps.
 
 Scott
 Andi Gutmans wrote:
  Hi Michal,
 
  Can you please send a link to the patch so we can review? I didn't
get
  the attachment.
 
  Thanks,
  Andi
 
 
 
 
  -Original Message-
  From: Michal Dziemianko [mailto:[EMAIL PROTECTED]
  Sent: Monday, July 21, 2008 8:29 AM
  To: internals@lists.php.net
  Subject: [PHP-DEV] zend_inline_hash_function reimplementation
 
  Hello,
  I have looked into Zend/zend_hash.h and I guess it might be sped up
a
  little.
  So far it uses D. Bernstein's hash which is quite fast, but I think
  it
  might
  be worth replacing it with MurmurHash. I have tried comparison of
  speed for
  them (both as separate C programs, and compiled into PHP 5_3).
  Results
  for
  REAL keys (collected on running web server) are at the bottom of
this
  page:
  http://212.85.117.53/gsoc/
  index.php?option=com_contentview=articleid=65:hash-functions-for-
  hash-tablescatid=34:profilingItemid=54
 
  Murmur is Public Domain, and its author is maintaining it all the
  time
  so I
  guess it will be faster and better. There is comparison of
  distribution at
  both the page I gave and here: http:// murmurhash.googlepages.com/
 
  The patch for 5_3 (applicable to HEAD) as attachment
 


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Stanislav Malyshev

Hi!

I understand this patch has potential portability issues for 
architectures that won't be able to read int's unaligned.


Maybe we could keep both of them and do some configure test to see if it 
works fine, if not - use the old one?


Also I understand that this function returns uint while old hash 
returned ulong. What happens with that on 64-bit platforms?

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] zend_inline_hash_function reimplementation

2008-07-21 Thread Scott MacVicar
There is an aligned version of the algorithm available but its slower,  
there is also a 64-bit version in the works. I emailed the author  
about its progress to check.


At the moment adding both looks the way forward, i'll benchmark a PPC  
version of the algorithm shortly.


Scott
On 21 Jul 2008, at 21:58, Stanislav Malyshev wrote:


Hi!

I understand this patch has potential portability issues for  
architectures that won't be able to read int's unaligned.


Maybe we could keep both of them and do some configure test to see  
if it works fine, if not - use the old one?


Also I understand that this function returns uint while old hash  
returned ulong. What happens with that on 64-bit platforms?

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]



--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php