Hello Leif,

On Tuesday, April 20, 2004 at 20:13 GMT -0600, a stampede was started when Leif Gregory hollered:

It just occurred to me, if you're using PHP, can't you get all the info
you need without using regexps? I seem to recall that there are built-in
functions that can get you pretty much any info you want, though I can't
find it in the phpinfo() function...

> I'm pretty sure the HTTP request type will always be first so I can
> anchor with ^, but IIS likes to put the SERVER before the DATE, and
> Sambar is the reverse. So, my thinking was that if I did the HTTP part
> and then moved up to the first : and backed up to the first
> whitespace, I could grab the next chunk (either DATE or SERVER) up to
> the next : (and then back to the first whitespace), and continue that
> until I hit the EOL.

You can do that with the chunk that I wrote. What I don't know is how
PHP handles regexps. I know in VBScript, when you do a regexp, all
possible matches are stored in an array, so it is pretty easy to get out
all the parts you want. In TB, that isn't the case, so I tend to forget
about that option. If PHP will populate an array, then you're golden.
The regexp could be fairly simple.

JA>> (?i)(Date:\s*.*?)((\s\S+:\s)|\z)

JA>> But increasing accuracy has the price of decreasing tolerance for
JA>> errors in the string.

> Exactly.

The way I wrote the above regexp, you should be pretty accurate without
losing any generality.

> I didn't check an Apache server (I'll do that tomorrow) to
> see how it outputs its HTTP headers. I am looking for something
> generic, hence my hoping I could use the : as jump points to back up
> from.

If you really want to do that, you should use a look-ahead assertion.
Something like:

    (\S*:\s*.*?)\s(?=\S*:\s)

I haven't tried this in PHP, but in principle it should work.

> Right. That shouldn't be a problem. I have a list of the atoms for PHP
> and they are close to TB.

Excellent. Do you mind sending me either a link or the list (off list if
you like)? I was slowly learning some PHP stuff myself, so that could be
very useful.

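A minimal sketch of the look-ahead idea, written in Python rather than PHP only so that it can be run as-is; the look-ahead syntax is the same in both. The sample header string, the function name split_headers, and the second pattern (which also accepts end-of-string so the last header is not dropped) are assumptions made for illustration, not something from the thread. In PHP, preg_match_all is the call that fills an array with every match, which is the behaviour hoped for above.

```python
import re

# Hypothetical sample: response headers flattened onto one line, in the
# IIS-style order (Server before Date). Not taken from a real server.
SAMPLE = (
    "HTTP/1.1 200 OK Server: Microsoft-IIS/5.0 "
    "Date: Wed, 21 Apr 2004 02:13:00 GMT Content-Type: text/html"
)

# The look-ahead pattern from the message, unchanged. A chunk ends at the
# whitespace that is followed by the next "Name: " token.
LOOKAHEAD = re.compile(r"(\S*:\s*.*?)\s(?=\S*:\s)")

# Assumed variant: also accept end-of-string as a terminator, so the final
# header (which has no "Name: " after it) is captured as well.
LOOKAHEAD_OR_END = re.compile(r"(\S*:\s*.*?)(?:\s(?=\S*:\s)|$)")


def split_headers(raw, pattern=LOOKAHEAD):
    """Return every 'Name: value' chunk the pattern finds, in order."""
    return pattern.findall(raw)


if __name__ == "__main__":
    for chunk in split_headers(SAMPLE, LOOKAHEAD_OR_END):
        print(chunk)
```

With the pattern exactly as written in the message, the last header on the line is not returned, because nothing of the form "Name: " follows it. The times inside the Date value (02:13:00) do not trip the look-ahead, since their colons are not followed by whitespace.
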
> I had considered that (just doing multiple reg matches), but wondered
> if there was a better way. It is a very small script, so it wouldn't
> really kill the performance by doing multiple reg matches.

Like I mentioned above, if PHP fills an array with all the matches, you
get the best of both worlds.

> So far, this one has always been first. It'll get ugly if it pops up
> somewhere else on some strange webserver.

Well then, it doesn't have to be hard, just use:

    ^(.*?)\s+(\S*:\s)

> The order does change with exception to HTTP that I've discovered so
> far anyways.

Well, with multiple regexps, this isn't an issue. A single TB style
match is more difficult with this restriction. The only way around it
would be to use If..then statements, but the question becomes: which is
worse? Running several matches, or processing the matches through a
conditional cascade?

> This might just be the best way to do it.

It certainly is the easiest, though you will probably pay in performance
if every clock cycle counts.

> Yeah, but then I'd have to read at least ten posts telling me to
> Google it. Like I really hadn't thought of that! <grin>

<sigh> That's why we need TBPHP, TBEverything_Under_The_Sun. You'd be
willing to moderate a few more lists, right? ;-)

-- 
Thanks for writing,
Januk Aggarwal

________________________________________________________
http://www.silverstones.com/thebat/TBUDLInfo.html
