Re: [PHP] Regex Help for URL's [ANSWER]
On May 16, 2006, at 7:53 PM, Chrome wrote:

>> -----Original Message-----
>> From: Robert Samuel White
>> Sent: 17 May 2006 01:42
>> To: php-general@lists.php.net
>> Subject: RE: [PHP] Regex Help for URL's [ANSWER]
>>
>> That's what I was doing. I was parsing A:HREF, IMG:SRC, etc. But when I implemented a new feature on my network, where you could click on a row and have it take you to another domain, I needed a better solution.
>>
>> Go to http://www.enetwizard.ws and it might make more sense. All the links on the left have an ONCLICK="location.href = ''" attribute in the TR tag.
>>
>> This solution allowed me to make sure those links included the session information, just like the A:HREF links do. It also had the advantage of updating the links in my CSS.
>
> Oh, that breaks accessibility standards! Complement the 'onclick's with onkeydown at least :)
>
> But still, you get a solid onclick=... scenario. If these are visible in the source then they are fairly easy to pick out, though you may need more than 1 regex ;)
>
> My complaint here is, don't break accessibility :)

And don't forget the folks who have JavaScript turned off or are using text-based browsers too.

Ed

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP] Regex Help for URL's [ANSWER]
In case anyone is looking for a solution to a problem similar to mine, here is the answer. I used the code from my original post as my guiding light, and with some experimentation, I figured it out.

To get any URL, regardless of where it is located, use this:

preg_match_all("#\'http://(.*)\'#U", $content, $matches);

This matches anything similar to:

'http://www.domain.com/dir/dir/file.txt?query=blah'

This is useful if, for example, you have a tag like this one:

<A HREF="javascript:void(0);" ONCLICK="javascript:window.open = 'http://www.domain.com/dir/dir/file.txt?query=blah';">

Now, for URLs which are in double quotes rather than single quotes, just use:

preg_match_all("#\"http://(.*)\"#U", $content, $matches);

This is really only the first step. In order to be useful, you need a way to process these URLs according to your own specific needs:

preg_match_all("#\'http://(.*)\'#U", $content, $matches);
$content = preg_replace("#\'http://(.*)\'#U", '###URL###', $content);

This will modify the $content variable to change all URLs to ###URL###.

You can then go through them one at a time to process them:

for ($count = 0; $count < count($matches[1]); $count++)
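A minimal sketch of the first step described above, assuming a small made-up $content string (the post does not show its input). It collects the single-quoted URLs and rebuilds each full URL inside the loop; what to do with each URL afterwards is left open, as in the post.

```php
<?php
// Hypothetical sample input; the original post does not show its $content.
$content = "<tr onclick=\"location.href = 'http://www.domain.com/dir/file.txt?query=blah'\">"
         . "<a href='http://example.org/'>x</a>";

// Ungreedy (U) match of everything between 'http:// and the next single quote.
preg_match_all("#'http://(.*)'#U", $content, $matches);

// $matches[1] holds each URL without the scheme and without the quotes,
// so the scheme is put back while walking the list.
$urls = array();
for ($count = 0; $count < count($matches[1]); $count++) {
    $urls[] = 'http://' . $matches[1][$count];
}

print_r($urls);
```

Because the capture group sits after http://, anything that needs the complete URL has to prepend the scheme again, as the loop does here.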
RE: [PHP] Regex Help for URL's [ANSWER]
On Tue, 2006-05-16 at 18:49, Robert Samuel White wrote:

> In case anyone is looking for a solution to a problem similar to mine, here is the answer. I used the code from my original post as my guiding light, and with some experimentation, I figured it out.
>
> To get any URL, regardless of where it is located, use this:
>
> preg_match_all("#\'http://(.*)\'#U", $content, $matches);
>
> This matches anything similar to:
>
> 'http://www.domain.com/dir/dir/file.txt?query=blah'
>
> This is useful if, for example, you have a tag like this one:
>
> <A HREF="javascript:void(0);" ONCLICK="javascript:window.open = 'http://www.domain.com/dir/dir/file.txt?query=blah';">
>
> Now, for URLs which are in double quotes rather than single quotes, just use:
>
> preg_match_all("#\"http://(.*)\"#U", $content, $matches);

I'd roll those two into one expression:

preg_match_all("#(\"|')http://(.*)(\"|')#U", $content, $matches);

Cheers,
Rob.
--
InterJinn Application Framework - http://www.interjinn.com
An application and templating framework for PHP. Boasting a powerful, scalable system for accessing system services such as forms, properties, sessions, and caches. InterJinn also provides an extremely flexible architecture for creating re-usable components quickly and easily.
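Two things are worth checking with a combined pattern like this: the URL moves to the second capture group, and an alternation on both ends would also accept mismatched quotes. The sketch below assumes hypothetical sample markup and uses a backreference (\1) for the closing quote; the backreference is a variation for illustration, not what was posted.

```php
<?php
// Hypothetical sample input mixing both quote styles.
$content = '<a href="http://a.example/one">1</a> '
         . "<a href='http://b.example/two'>2</a>";

// Group 1 captures the opening quote; \1 requires the same character to close.
// The URL body therefore lands in group 2, not group 1.
preg_match_all('#("|\')http://(.*)\1#U', $content, $matches);

print_r($matches[2]);
```

Any loop written against $matches[1] for the single-pattern version would need to read $matches[2] here.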
RE: [PHP] Regex Help for URL's [ANSWER]
On Tue, May 16, 2006 6:21 pm, Robert Cummings wrote:
> On Tue, 2006-05-16 at 18:49, Robert Samuel White wrote:
>> In case any one is looking for a solution to a similar problem as me, here
>
> preg_match_all("#(\"|')http://(.*)(\"|')#U", $content, $matches);

And it's missing the original requirement of matching https URLs, so maybe make it be ...https?://...

Plus, http could be IN CAPS, so change the U to iU

And, actually, SOME old-school HTML pages will have neither ' nor " around the URL, and are (or were) valid:
href=page2.html
was considered valid for HTML for a long long long time

So toss in (\"|')?

And then you may be finding URLs that are not actually linked but are part of the visible content, so maybe you only want the ones that have <a[^>]*href= in front of them.

If I can toss off 3 problems without even trying...

So I still think Google or searching the archives (as I suggested off-list) will be the quickest route to a CORRECT answer, but here we are again in this same thread we've been in every month or so for the better part of a decade...

PS
The (\"|') bit may move the URLs into $matches[2] instead of $matches[1] or whatever.

--
Like Music?
http://l-i-e.com/artists.htm
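The suggestions above (https, case-insensitivity, optional quotes, and anchoring on an <a ... href=) could be combined roughly as follows. This is only a sketch under assumptions: the sample markup is made up, and the ungreedy (.*) is swapped for a negated character class, since a lazy (.*) followed by an optional quote would match nothing useful.

```php
<?php
// Hypothetical sample input: upper-case scheme, https, an unquoted href,
// and a URL in visible text that should NOT be picked up.
$content = '<a href="HTTP://a.example/x">a</a>'
         . "<A class='k' HREF='https://b.example/y'>b</A>"
         . '<a href=http://c.example/z>c</a>'
         . ' plain text http://not-linked.example/';

// <a\b[^>]*\bhref=   only URLs inside an anchor's href attribute
// ("|\')?            optional opening quote (group 1)
// (https?://...)     the URL itself (group 2); stops at whitespace, a quote, or >
// i flag             case-insensitive, so HTTP:// and HREF also match
$pattern = '#<a\b[^>]*\bhref=("|\')?(https?://[^\s"\'>]+)#i';
preg_match_all($pattern, $content, $matches);

print_r($matches[2]);
```

As noted in the PS, the optional quote group pushes the URLs into $matches[2].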
RE: [PHP] Regex Help for URL's [ANSWER]
All pages used by my content management system must be in a valid format. Old-school style pages are never created so the solution I have come up with is perfect for my needs. Thank you.
RE: [PHP] Regex Help for URL's [ANSWER]
-----Original Message-----
From: Robert Samuel White
Sent: 17 May 2006 01:16
To: php-general@lists.php.net
Subject: RE: [PHP] Regex Help for URL's [ANSWER]

> All pages used by my content management system must be in a valid format. Old-school style pages are never created so the solution I have come up with is perfect for my needs. Thank you.

Doesn't that make it a proprietary solution? IMHO offering the regex may create a false situation for people... So the answer may not be for everyone.

Might be wrong :)

Dan
RE: [PHP] Regex Help for URL's [ANSWER]
In my opinion, it is the most reasonable solution. I have looked all over the web for something else, but this works perfectly for me. It's impossible to tell where a URL starts and ends if you don't have it in double quotes or single quotes. If someone really needs to find all the URLs in a page, then they'll code their pages to work within this limitation.

-----Original Message-----
From: Chrome
Sent: Tuesday, May 16, 2006 8:24 PM
To: 'Robert Samuel White'; php-general@lists.php.net
Subject: RE: [PHP] Regex Help for URL's [ANSWER]

> Doesn't that make it a proprietary solution? IMHO offering the regex may create a false situation for people... So the answer may not be for everyone.
>
> Might be wrong :)
>
> Dan
RE: [PHP] Regex Help for URL's [ANSWER]
-----Original Message-----
From: Robert Samuel White
Sent: 17 May 2006 01:28
To: php-general@lists.php.net
Subject: RE: [PHP] Regex Help for URL's [ANSWER]

> In my opinion, it is the most reasonable solution. I have looked all over the web for something else, but this works perfectly for me. It's impossible to tell where a URL starts and ends if you don't have it in double quotes or single quotes. If someone really needs to find all the URLs in a page, then they'll code their pages to work within this limitation.

If we are talking clickable links, why not focus on the <a> construct itself? Otherwise URLs are just part of the page's textual content... Very difficult to parse that.

Dissecting an <a> tag isn't brain-meltingly difficult with a regex if you put your mind to it... With or without quotes, be they single, double or non-existent.

If I've misunderstood please chastise me :)

HTH

Dan
RE: [PHP] Regex Help for URL's [ANSWER]
> If we are talking clickable links, why not focus on the <a> construct itself? Otherwise URLs are just part of the page's textual content... Very difficult to parse that.
>
> Dissecting an <a> tag isn't brain-meltingly difficult with a regex if you put your mind to it... With or without quotes, be they single, double or non-existent.
>
> If I've misunderstood please chastise me :)
>
> HTH
>
> Dan

Dan,

That's what I was doing. I was parsing A:HREF, IMG:SRC, etc. But when I implemented a new feature on my network, where you could click on a row and have it take you to another domain, I needed a better solution.

Go to http://www.enetwizard.ws and it might make more sense. All the links on the left have an ONCLICK="location.href = ''" attribute in the TR tag.

This solution allowed me to make sure those links included the session information, just like the A:HREF links do. It also had the advantage of updating the links in my CSS.
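As a rough illustration of the use case described here, the sketch below appends session information to every single-quoted URL, including one inside an ONCLICK handler. The add_session() helper, the "sid" parameter name, and the sample markup are all hypothetical; the thread does not show how the session data is actually attached. It uses an anonymous function, so it assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: append a session parameter to one URL.
// The parameter name "sid" is an assumption, not taken from the thread.
function add_session($url, $sid)
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'sid=' . urlencode($sid);
}

// Hypothetical sample: a TR with an onclick handler, as described above.
$content = "<tr onclick=\"location.href = 'http://www.domain.com/page.php'\">";
$sid = 'abc123';

// Rewrite every single-quoted http:// URL in place, wherever it appears.
$content = preg_replace_callback(
    "#'(http://.*)'#U",
    function ($m) use ($sid) {
        return "'" . add_session($m[1], $sid) . "'";
    },
    $content
);

echo $content, "\n";
```

Rewriting in a callback avoids the placeholder step, at the cost of doing the per-URL processing inside the regex call.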
RE: [PHP] Regex Help for URL's [ANSWER]
-----Original Message-----
From: Robert Samuel White
Sent: 17 May 2006 01:42
To: php-general@lists.php.net
Subject: RE: [PHP] Regex Help for URL's [ANSWER]

> That's what I was doing. I was parsing A:HREF, IMG:SRC, etc. But when I implemented a new feature on my network, where you could click on a row and have it take you to another domain, I needed a better solution.
>
> Go to http://www.enetwizard.ws and it might make more sense. All the links on the left have an ONCLICK="location.href = ''" attribute in the TR tag.
>
> This solution allowed me to make sure those links included the session information, just like the A:HREF links do. It also had the advantage of updating the links in my CSS.

Oh, that breaks accessibility standards! Complement the 'onclick's with onkeydown at least :)

But still, you get a solid onclick=... scenario. If these are visible in the source then they are fairly easy to pick out, though you may need more than 1 regex ;)

My complaint here is, don't break accessibility :)

Dan

--
http://chrome.me.uk