ID:               27291
Updated by:       [EMAIL PROTECTED]
Reported By:      php-bug-NOSPAM-2004 at ryandesign dot com
Status:           Open
Bug Type:         *General Issues
Operating System: Mac OS X; FreeBSD; RedHat Linux
PHP Version:      4.3.5RC3
New Comment:

I did some work on the get_browser() function a while back, and I believe those were the last major changes made to it. The function seemed to have been abandoned for quite some time before that. For the most part, based on the testing I did, the results were quite favourable: the function now returned more information, such as operating systems and other details that were previously missing from get_browser()'s output. Obviously there is still some room for improvement, though.

I tried the original poster's patch using Gary's most up-to-date browscap.ini file and had mixed results. I tested all of the unique user agent strings we had in our Apache logs at work (1914 strings), and the results were sometimes better, sometimes worse, but overall pretty much the same. Here are a few things I noticed:

- Netscape 7.x on Linux is better after the changes. (It was being reported as Mozilla 1.4 previously.)
- Several versions of Mozilla on Linux come up as Default Browser after the changes, as opposed to the correct information before the changes.
- Something identified as a "Spoofed IE" comes up correctly before the changes, but comes up as Default Browser after the changes.
- Epiphany 1.0 gets Default Browser after the changes, but comes up as "Mozilla 1.4" before the changes.
- Some versions of Safari are reported as Default Browser after the changes, while before the changes they seem to come up properly. (This includes the user agent string from the original poster's example, which came up as Safari 1.1 on my system.)
- Some versions of Galeon are reported better after the changes.
- Some user agents that were reported as Website Strippers before are now reported as Default Browser.

You can find the results of the tests, the UA strings I used and the script to generate them here:

http://216.94.11.234/browsers.tar.gz

That's with an up-to-date PHP_4_3 checkout and the latest browscap.ini.

To Gary: I'll take any suggestions on how to improve get_browser(). Is there any similar implementation that provides better results that I can get ahold of? I see things for IIS, Java, etc. on your site, but is one of them better than the rest, so that I should look at it first?

J

Previous Comments:
------------------------------------------------------------------------

[2004-02-19 07:12:52] php-bug-NOSPAM-2004 at ryandesign dot com

Sorry, Gary; my bad. When using the Feb 15, 2004 browscap.ini, I had just looked at the browser match long enough to see that it found Safari, and did not look closely enough to realize that the rule specifically matched the Safari v100 series. Due to one oddity in PHP's parsing code (a fix for which I provide through the commenting out of three lines, as seen in my diff), it ends up recognizing any Safari where the version number starts with 1, regardless of how many chars follow, which is why it recognized the fictitious version v1999 in my test case.

I have now found the place where the PHP CVS snapshots are kept (http://snaps.php.net), and have downloaded and compiled the stable 4.3.x snapshot from Feb 19, 2004 10:30 GMT. Its behavior in relation to this bug remains unchanged when compared with the 4.3.4 and 4.3.5RC3 releases; it's still broken.

I'd like to suggest that the PHP team reconsider evaluating and applying the diff I supplied in my original report.

------------------------------------------------------------------------

[2004-02-18 16:23:42] php_bug_27291 at garykeith dot com

Respectfully, my latest browscap.ini does not detect all arbitrary versions of Safari. I'm not sure how you arrived at that conclusion. I do know that I receive e-mails nearly every day about this issue, so there is obviously a problem somewhere.

I don't know who is working on the code for get_browser() these days, but I wish they would contact me so we could come to some sort of understanding about how to properly parse my file the way browscap.dll does.
I am growing very weary of my files and efforts taking the blame for the non-stop stream of bugs that emanate from get_browser().

Thanks,
~gary.

------------------------------------------------------------------------

[2004-02-17 16:12:58] [EMAIL PROTECTED]

Using latest stable CVS snapshot does match with "Default Browser".

------------------------------------------------------------------------

[2004-02-17 14:01:22] php-bug-NOSPAM-2004 at ryandesign dot com

Description:
------------
PHP's get_browser() function does not correctly use the patterns in the browscap.ini file, resulting in occasional incorrect matches. This occurred, for example, when Apple released Safari 1.2, and when OmniGroup released OmniWeb 5.0b1. These two browsers were then incorrectly identified as crawlers / robots, instead of being recognized as normal browsers.

Instead of matching the last rule in the file (which has the browscap pattern "*", which PHP translates into the regular expression ".*"), it matches the rule for Website Strippers (which has the browscap pattern "Mozilla/5.0", which PHP translates to the regular expression "Mozilla/5\.0").

Yes, Safari and OmniWeb have "Mozilla/5.0" as part of their user agent string, but only part. "Mozilla/5.0" is not the ENTIRE UA string, which is what the browscap pattern is intended to define. Had the rule been intended to match "Mozilla/5.0" at the start of the string, regardless of what followed, the rule would have been written "Mozilla/5.0*". But it wasn't.

PHP needs to anchor the regular expression it generates to the beginning and end of the string to ensure it is matching the portion of the string the browscap.ini author intended it to match. The regular expressions PHP should have generated are "^Mozilla/5\.0$" and "^.*$".

Here is a diff of the PHP source code file ext/standard/browscap.c (from the version in the 4.3.4 release) which seems to correct the problem.

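As a standalone illustration of the anchoring described above (separate from the diff, and not PHP's actual code), the following is a minimal sketch in C using the POSIX <regex.h> API. The function names browscap_to_regex() and ua_matches() are hypothetical, and the handling of "?" as "exactly one character" and the escaping of other regex metacharacters are assumptions that go beyond the two translations mentioned in this report.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not PHP's actual code): translate a browscap
 * wildcard pattern into an anchored POSIX extended regular expression.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *browscap_to_regex(const char *pattern)
{
    size_t len, i, j = 0;
    char *t;

    assert(pattern != NULL);
    len = strlen(pattern);

    /* Worst case: every character expands to two, plus '^', '$', NUL. */
    t = malloc(len * 2 + 3);
    if (t == NULL) {
        return NULL;
    }

    t[j++] = '^';                      /* anchor to start of string */
    for (i = 0; i < len; i++) {
        char c = pattern[i];
        if (c == '*') {                /* browscap "*": any run of chars */
            t[j++] = '.';
            t[j++] = '*';
        } else if (c == '?') {         /* assumed: exactly one character */
            t[j++] = '.';
        } else if (strchr(".^$+()[{|\\", c) != NULL) {
            t[j++] = '\\';             /* escape regex metacharacters */
            t[j++] = c;
        } else {
            t[j++] = c;                /* ordinary character, copy as-is */
        }
    }
    t[j++] = '$';                      /* anchor to end of string */
    t[j] = '\0';
    return t;
}

/* Returns 1 if ua matches the WHOLE pattern, 0 if not, -1 on error. */
int ua_matches(const char *pattern, const char *ua)
{
    regex_t re;
    int rc;
    char *expr = browscap_to_regex(pattern);

    if (expr == NULL) {
        return -1;
    }
    rc = regcomp(&re, expr, REG_EXTENDED | REG_NOSUB);
    free(expr);
    if (rc != 0) {
        return -1;
    }
    rc = regexec(&re, ua, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this sketch, the pattern "Mozilla/5.0" no longer matches a full Safari user agent string, while "Mozilla/5.0*" and "*" still do, which is the behavior the description argues for.
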
The commenting out of lines 71 to 73 in the original file (73 to 75 in my version) is not essential and is not part of the fix for this issue. I did it because those lines seem to me to be another inaccuracy in PHP's browscap.ini parsing, and removing them does not seem to adversely affect the functioning of get_browser(), although I did not test extensively against many user agent strings, and I do not know why that code was originally inserted.

50c50
<     t = (char *) malloc(Z_STRLEN_P(pattern)*2 + 1);
---
>     t = (char *) malloc(Z_STRLEN_P(pattern)*2 + 3);
52c52,54
<     for (i=0, j=0; i<Z_STRLEN_P(pattern); i++, j++) {
---
>     t[0] = '^';
>
>     for (i=0, j=1; i<Z_STRLEN_P(pattern); i++, j++) {
71,73c73,75
<     if (j && (t[j-1] == '.')) {
<         t[j++] = '*';
<     }
---
> //  if (j && (t[j-1] == '.')) {
> //      t[j++] = '*';
> //  }
74a77,78
>     t[j++] = '$';
>

Reproduce code:
---------------
Install the browscap.ini file available from www.garykeith.com and modify the php.ini to use this file. Then run this:

$ua = 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/1999 (KHTML, like Gecko) Safari/1999';
$ua_info = (array) get_browser($ua);
print $ua;
print '<pre>';
print_r($ua_info);
print '</pre>';

Expected result:
----------------
The browscap.ini does not know about Safari version 1999. There is no such version; version 1.2 (125) is the most recent as of February 2004. And, at least in the version from a week or so ago, the browscap.ini does not define a generic "Safari" directive that would allow it to recognize this string. So this user agent string should match the last rule in the file, "Default Browser", which has the pattern "*".

Actual result:
--------------
It actually matches the pattern "Mozilla/5.0", in the Website Strippers category.

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=27291&edit=1