ID:               27291
Comment by:       php_bug_27291 at garykeith dot com
Reported By:      php-bug-NOSPAM-2004 at ryandesign dot com
Status:           Open
Bug Type:         *General Issues
Operating System: *
PHP Version:      4CVS, 5CVS (2004-02-20)

New Comment:
Tell me what a .tgz file is and, if I can do it, I will. I'm working on two new bugs that I hope someone will file reports about. The first involves the exclamation point in the name of the new Yahoo! Slurp crawler. I'm not sure what PHP is doing, since I don't speak PHP, but I'm told it's throwing a parsing error. I've also had complaints from people saying user agents aren't being recognized since I switched from Gecko/???????? to Gecko/*, as we discussed earlier.

Previous Comments:
------------------------------------------------------------------------

[2004-02-23 12:58:19] php-bug-NOSPAM-2004 at ryandesign dot com

Sounds like what we need is a script that takes the user agents logged by a popular web site, runs them through both browscap.dll and PHP's get_browser(), compares the results, and informs a PHP programmer of any differences, so that get_browser() can be kept in line.

I hate to volunteer for such a task, as I'm not a member of the PHP team, but perhaps, once my patch or one like it is applied, I could try it for a while to see how many diffs it generates. I'm not sure, however, how I would use browscap.dll. What kind of server would be needed, and what programming language? I have a feeling you're going to say Microsoft IIS and Visual Basic or ASP or some such, none of which I've ever worked with. (I wouldn't know where to begin.) But if a system could be developed in which one popular web server gathers a day's UAs, passes them to both a script using browscap.dll and a second one using PHP's get_browser(), compares the results, saves only the differences, and then emails or FTPs a .tgz to me or someone else, that might be a place to start. I'd be happy to help with such a script, though, like I said, I wouldn't know how to tackle the browscap.dll part.
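Ryan's proposed pipeline can be sketched in Python. It gathers a day's UAs, runs each one through two matchers, keeps only the disagreements, and ships the result as a .tgz. This is a minimal sketch under stated assumptions:

- The two classifier functions are hypothetical stand-ins, because wrapping browscap.dll or PHP's get_browser() is outside its scope.
- The names `diff_classifiers`, `build_report_tgz` and `ua-diffs.tsv` are invented for illustration.

It also shows what a .tgz is: a tar archive compressed with gzip, which Python's standard `tarfile` module writes with mode `"w:gz"`.

```python
import io
import tarfile
from typing import Callable, Iterable, List, Tuple

# A classifier maps a user-agent string to a browser label. In the proposed
# setup one would wrap browscap.dll and the other PHP's get_browser(); here
# they are plain Python callables (hypothetical stand-ins).
Classifier = Callable[[str], str]
Diff = Tuple[str, str, str]  # (user agent, baseline label, candidate label)


def diff_classifiers(
    user_agents: Iterable[str], baseline: Classifier, candidate: Classifier
) -> List[Diff]:
    """Run every unique UA through both classifiers; keep only disagreements."""
    diffs: List[Diff] = []
    seen = set()
    for ua in user_agents:
        ua = ua.strip()
        # Skip blank lines and repeats so each UA in a day's log is checked once.
        if not ua or ua in seen:
            continue
        seen.add(ua)
        expected, actual = baseline(ua), candidate(ua)
        if expected != actual:
            diffs.append((ua, expected, actual))
    return diffs


def build_report_tgz(diffs: List[Diff], member: str = "ua-diffs.tsv") -> bytes:
    """Pack the differences as one tab-separated file inside a .tgz,
    i.e. a tar archive compressed with gzip, and return its bytes."""
    data = "".join(f"{ua}\t{exp}\t{act}\n" for ua, exp, act in diffs).encode("utf-8")
    info = tarfile.TarInfo(name=member)
    info.size = len(data)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Toy classifiers for illustration only; they do not reflect either engine.
    def baseline(ua: str) -> str:
        return "Safari" if "Safari/" in ua else "Default Browser"

    def candidate(ua: str) -> str:
        return "Safari" if "Safari/1" in ua else "Default Browser"

    log = ["Mozilla/5.0 Safari/125", "Mozilla/5.0 Safari/85", "Mozilla/5.0 Safari/85", ""]
    report = diff_classifiers(log, baseline, candidate)
    for ua, exp, act in report:
        print(f"{ua}: baseline={exp!r} candidate={act!r}")
    print(f"{len(build_report_tgz(report))} bytes of .tgz ready to email or FTP")
```

In a real run, each stand-in would be replaced by whatever queries the actual engine. The returned bytes would be written to disk with `open(path, "wb")` before being emailed or uploaded. Only the differences are archived, so a day's worth of agreeing UAs costs nothing to ship.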
I looked through all the diffs between PHP 4.3.4's get_browser() matching and the results after applying my patch, using Jay's user agent file, and I'm encouraged. I think the patched results are more accurate and should improve the situation greatly. Gary, if you'd like to send your huge 20,000-entry user agent list in a .tgz, I could do more comparisons and see how the patch holds up.

------------------------------------------------------------------------

[2004-02-23 12:01:41] php_bug_27291 at garykeith dot com

Jay, whom can I contact at PHP to let them know I no longer want to be the official source of browscap.ini for PHP? In the past two days I've had two bug reports, both of which happen only in PHP. When browscap.dll does the parsing, it works fine. I'm fed up with PHP. I waste more of my time dealing with bugs in PHP than on any other aspect of my project, and it's just not worth it to me anymore.

------------------------------------------------------------------------

[2004-02-19 15:10:07] php_bug_27291 at garykeith dot com

Hi, Jay. I use Microsoft's browscap.dll as my baseline for testing. I don't know of any way you can legally look at its source code! I can't speak to the accuracy of the PHP or Java material I offer on my site, because I know nothing about either language; I work mostly in C++, C# and Visual Basic. The best I can tell you about that material is that I've had almost no complaints beyond its inability to adapt to new properties in the browscap.ini file.

BTW, the Spoofed IE comes from the parent Spoofed User Agents. These are user agents that are almost like the real thing, and I keep a close watch on them to see whether they eventually need to be moved to the Website Strippers parent.

Anyway, whatever it takes to get these bugs fixed, let's do it. Although I'm no longer willing to make major changes in my files to accommodate PHP, perhaps my database of 22,000 user agents will let you do more thorough testing.
I'm also willing to help in whatever other ways I can. You can reach me at the e-mail address I used here if you need to contact me privately.

------------------------------------------------------------------------

[2004-02-19 13:17:01] [EMAIL PROTECTED]

I did some work on the get_browser() function a while back, and I believe those were the last major changes made to it. The function seemed to have been abandoned for quite some time before that. For the most part, based on my testing, the results were quite favourable: the function now returned more information, such as operating systems, that had previously been missing from get_browser()'s output. Obviously there is still room for improvement, though.

I tried the original poster's patch with Gary's most up-to-date browscap.ini file and got mixed results. I tested all of the unique user agent strings in our Apache logs at work (1914 strings). The results were sometimes better and sometimes worse, but overall about the same. Here are a few things I noticed:

- Netscape 7.x on Linux is better after the changes. (It was previously reported as Mozilla 1.4.)
- Several versions of Mozilla on Linux come up as Default Browser after the changes, whereas the information was correct before the changes.
- Something identified as a "Spoofed IE" comes up correctly before the changes, but as Default Browser after them.
- Epiphany 1.0 gets Default Browser after the changes, but comes up as "Mozilla 1.4" before them.
- Some versions of Safari are reported as Default Browser after the changes, while before the changes they seem to come up properly. (This includes the original poster's example, which came up as Safari 1.1 on my system.)
- Some versions of Galeon are reported better after the changes.
- Some user agents that were reported as Website Strippers before are now reported as Default Browser.

You can find the test results, the UA strings I used and the script that generated them here: http://216.94.11.234/browsers.tar.gz

That's with an up-to-date PHP_4_3 checkout and the latest browscap.ini.

To Gary: I'll take any suggestions on how to improve get_browser(). Is there a similar implementation that gives better results and that I could get hold of? I see versions for IIS, Java, etc. on your site, but is any of them better than the rest and worth looking at?

J

------------------------------------------------------------------------

[2004-02-19 07:12:52] php-bug-NOSPAM-2004 at ryandesign dot com

Sorry, Gary; my bad. When using the Feb 15, 2004 browscap.ini, I looked at the browser match only long enough to see that it found Safari. I did not look closely enough to realize that the rule specifically matched the Safari v100 series. An oddity in PHP's parsing code makes it recognize any Safari whose version number starts with 1, regardless of how many characters follow. My diff fixes this by commenting out three lines. That oddity is why it recognized the fictitious version v1999 in my test case.

I have now found where the PHP CVS snapshots are kept (http://snaps.php.net), and have downloaded and compiled the stable 4.3.x snapshot from Feb 19, 2004 10:30 GMT. Its behavior with respect to this bug is unchanged from the 4.3.4 and 4.3.5RC3 releases; it's still broken. I'd like to suggest that the PHP team reconsider evaluating and applying the diff I supplied in my original report.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/27291
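Two of the problems in this thread come down to how a wildcard pattern is matched against the whole user-agent string:

- Gary's switch from Gecko/???????? to Gecko/*.
- The fictitious Safari v1999 being accepted by a rule meant for the v100 series.

The following is a minimal sketch, not PHP's actual get_browser() code. It rests on these assumptions:

- It assumes browscap.ini-style semantics: `*` matches any run of characters, `?` matches exactly one, and everything else is literal.
- The match is anchored at both ends.
- The helper names are hypothetical.
- The sample pattern `*Safari/1??` is an invented stand-in for the real v100-series rule.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compile_browscap_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a browscap.ini-style wildcard pattern into a regex (sketch).

    Assumed semantics: '*' matches any run of characters (including none),
    '?' matches exactly one character, and everything else is literal, so
    characters such as '(', '[' and '.' are escaped.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def ua_matches(pattern: str, user_agent: str) -> bool:
    # fullmatch anchors both ends: the pattern must account for the entire UA,
    # so trailing characters cannot slip past a run of '?' wildcards.
    return compile_browscap_pattern(pattern).fullmatch(user_agent) is not None


if __name__ == "__main__":
    gecko = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113"
    firefox = gecko[:-8] + "20040206 Firefox/0.8"
    safari = "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/125 (KHTML, like Gecko) Safari/"
    print(ua_matches("*Gecko/????????", gecko))      # True: exactly 8 chars after the slash
    print(ua_matches("*Gecko/????????", firefox))    # False: trailing " Firefox/0.8"
    print(ua_matches("*Gecko/*", firefox))           # True
    print(ua_matches("*Safari/1??", safari + "125"))   # True
    print(ua_matches("*Safari/1??", safari + "1999"))  # False
```

Anchoring is the design choice that matters here. A matcher that only checks the start of the string would accept `Safari/1999` against `*Safari/1??`, because the prefix `Safari/199` fits. That is consistent with the "starts with 1, regardless of how many chars follow" behavior Ryan describes. Under the same semantics, `Gecko/????????` accepts exactly eight characters, so any Gecko UA with trailing product tokens needs `Gecko/*`.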