Hi Januk,

On Tue, 20 Apr 2004, at 15:16:34 [GMT -0700] (which was 4:16 PM where I live) you wrote:

JA> I'll try. What really helps is if you can determine what parts of
JA> your string are constant and which parts change. For example, the
JA> Date part is probably always going to be:

JA> Date: DAY, DD MMM YYYY HH:mm:ss GMT

I'm pretty sure the HTTP request type will always be first so I can
anchor with ^, but IIS likes to put the SERVER before the DATE, and
Sambar is the reverse. So, my thinking was that if I did the HTTP part
and then moved up to the first : and backed up to the first whitespace,
I could grab the next chunk (either DATE or SERVER) up to the next :
(and then back to the first whitespace), and continue that until I hit
the EOL. I just wasn't sure what I should do to get it started.

JA> (?i)(Date:\s*.*?)((\s\S+:\s)|\z)

JA> But increasing accuracy has the price of decreasing tolerance for
JA> errors in the string.

Exactly. I didn't check an Apache server (I'll do that tomorrow) to see
how it outputs its HTTP headers. I am looking for something generic,
hence my hoping I could use the : as jump points to back up from.

JA> Note that I'm using TB specific atoms. You may have to modify the
JA> syntax of these to work in PHP, I don't know. "\s" means any white
JA> space character. "\S" means any non-whitespace character. "\z" is
JA> end of subject (independent of multiline settings). The "(?i)"
JA> just sets the regexp to be case-insensitive. I don't know if php
JA> requires a different method for internal option setting. The "+"
JA> means, match one or more characters of the preceding type.

Right. That shouldn't be a problem. I have a list of the atoms for PHP
and they are close to TB.

JA> So in this case, your basic regexp would be:

JA> (?i)(Date:\s*.*?)((\s\S*:\s*)|\z)

JA> And you could just change the term "Date" to "Server",
JA> "Last-modified", or "Connection" as necessary. The desired
JA> information should be in subpattern 1

I had considered that (just doing multiple reg matches), but wondered
if there was a better way. It is a very small script, so it wouldn't
really kill the performance by doing multiple reg matches.

JA> Now the HTTP one is a bit trickier.
JA> If you know that the HTTP
JA> section is always first, and the next field is always the date
JA> field, then your easiest bet is:

So far, this one has always been first. It'll get ugly if it pops up
somewhere else on some strange webserver.

JA> That seems fairly reasonable. What would also work is if you
JA> definitely know the order that the tokens will be listed in. Then
JA> you could search for everything between two labels.

The order does change, with the exception of HTTP, at least on the
servers I've checked so far.

JA> No need to do that if you do a few simple searches instead of one
JA> complex one.

This might just be the best way to do it.

JA> But they would have a better shot at correct syntax... ;-)

Yeah, but then I'd have to read at least ten posts telling me to Google
it. Like I really hadn't thought of that! <grin>

Thank you very much for the help.

-- 
Cheers,
Leif Gregory

List Moderator (and fellow registered end-user)
PCWize Editor / ICQ 216395 / PGP Key ID 0x7CD4926F
Web Site <http://www.PCWize.com>
TB FAQ <http://www.silverstones.com/thebat/FAQ.html>

Using The Bat! 2.05 Beta/16 under Windows 2000 5.0 Build 2195
Service Pack 4 on a P4 1.6Ghz OC'd to 2.32Ghz with 512MB.

Tagline of the day:
A bad day: "Transfer completed (5720468 bytes, 56651 errors, 1 CPS)"
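
For reference, the two approaches discussed in this message (one simple search per field, and using each "Label: " as a jump point) can be sketched roughly as below. This is only an illustration in Python, not the PHP script from the thread. The sample string, the function names extract_field and split_headers, and the LABEL pattern are all assumptions made for the example. The per-field search uses the stricter of the two quoted patterns, the one that requires whitespace after the colon, because the looser "\S*:\s*" form appears to stop at the colons inside the time of day. Python's re module is assumed to spell end-of-subject as \Z rather than \z.

```python
import re

# Hypothetical sample: response headers flattened onto one line (IIS-style order).
SAMPLE = ("HTTP/1.1 200 OK Server: Microsoft-IIS/5.0 "
          "Date: Tue, 20 Apr 2004 22:16:34 GMT Connection: close")


def extract_field(headers, name):
    """Return 'Name: value' for one header, or None if it is absent."""
    # Same shape as the quoted pattern: lazy match up to the next
    # " Label: " or the end of the subject (\Z in Python's re module).
    pattern = r"(?i)(" + re.escape(name) + r":\s*.*?)(?:\s\S+:\s|\Z)"
    match = re.search(pattern, headers.strip())
    return match.group(1) if match else None


# A label is assumed to be a run of letters, digits, or hyphens that starts
# after whitespace (or at the start) and is followed by ':' and whitespace.
LABEL = re.compile(r"(?<!\S)([A-Za-z][A-Za-z0-9-]*):\s")


def split_headers(headers):
    """Split on every 'Label: ' so that field order does not matter."""
    text = headers.strip()
    labels = list(LABEL.finditer(text))
    # Everything before the first label is treated as the HTTP status part.
    status_end = labels[0].start() if labels else len(text)
    fields = {"status": text[:status_end].strip()}
    for i, m in enumerate(labels):
        # A value runs from the end of its label to the start of the next one.
        end = labels[i + 1].start() if i + 1 < len(labels) else len(text)
        fields[m.group(1).lower()] = text[m.end():end].strip()
    return fields
```

Calling extract_field(SAMPLE, "Date") is one of the "few simple searches", while split_headers(SAMPLE) returns every field in a single pass whether the server lists Server or Date first. The single-pass version would mis-split a value that itself contains a word followed by a colon and a space, so it trades some accuracy for being generic.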
