Re: Strip away all HTML, leaving just the URLs

2013-03-07 Thread John Delacour
On 06/03/2013 15:58, Nick wrote: Thanks, that did exactly what I was looking for. But, I realized I also need to do this for anchor tags with relative links, such as: <a href="/xxx//zzz.shtml">ordinateur de bureau</a> A text filter something like this should do everything you want
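The filter itself does not appear in the snippet; a minimal sketch of such a text filter, written as a shell script that reads the document on stdin and assumes href values are double-quoted (the pattern and approach are illustrative, not John's actual script):

  #!/bin/sh
  # Sketch: print the value of every href attribute, absolute or relative,
  # one per line. Assumes attribute values are wrapped in double quotes.
  perl -nle 'print for /href="([^"]+)"/g'

Run as a BBEdit text filter, this would turn the front document into one absolute or relative URL per line.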

Re: Strip away all HTML, leaving just the URLs

2013-03-07 Thread Doug McNutt
At 10:14 + 3/7/13, John Delacour wrote: On 06/03/2013 15:58, Nick wrote: Thanks, that did exactly what I was looking for. But, I realized I also need to do this for anchor tags with relative links, such as: <a href="/xxx//zzz.shtml">ordinateur de bureau</a> A text filter something

Re: Strip away all HTML, leaving just the URLs

2013-03-07 Thread TJ Luoma
On Wed, Mar 6, 2013 at 10:58 AM, Nick grizfa...@gmail.com wrote: Thanks, that did exactly what I was looking for. But, I realized I also need to do this for anchor tags with relative links, such as: <a href="/xxx//zzz.shtml">ordinateur de bureau</a> I'm going to re-post my previous

Re: Strip away all HTML, leaving just the URLs

2013-03-06 Thread Nick
message: From: Dmitry Markman dmar...@me.com Subject: Re: Strip away all HTML, leaving just the URLs Date: March 1, 2013 10:21:06 PM EST To: bbe...@googlegroups.com On Sat, Mar 2, 2013 at 12:38 PM, Nick griz...@gmail.com wrote: I need to extract

Re: Strip away all HTML, leaving just the URLs

2013-03-03 Thread Ron Catterall
this will pick up anything like: <img height="239" alt="trix_5" width="357" src="Thumbnails/4.jpg"> giving you 239 for example. Where do I check "Copy to new document"? On 02/03/2013 11:00, LuKreme wrote: In our previous episode (Friday, 01-Mar-2013), Nick said: <ul> <li><a

Re: Strip away all HTML, leaving just the URLs

2013-03-03 Thread Dave
This might be better done on the command line. $ grep -Po '(?<=href=")[^"]+' [file name] This will give you the content of every href attribute in the file, and nothing else. Just a list of URLs. If there are any URLs you want to exclude, such as mailto:, javascript: or anchors (e.g.
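The exclusion step is cut off here; one way to handle it is to pipe the output through a second grep that drops mailto:, javascript: and fragment-only links (a sketch, still assuming a grep with -P support):

  $ grep -Po '(?<=href=")[^"]+' [file name] | grep -Ev '^(mailto:|javascript:|#)'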

Re: Strip away all HTML, leaving just the URLs

2013-03-03 Thread LuKreme
In our previous episode (Saturday, 02-Mar-2013), Ron Catterall said: On 02/03/2013 11:00, LuKreme wrote: In our previous episode (Friday, 01-Mar-2013), Nick said: <ul> <li><a href="http://www.youtube.com" class="youtube">YouTube</a></li> <li><a href="http://www.facebook.com"

Re: Strip away all HTML, leaving just the URLs

2013-03-03 Thread Christopher Stone
On Mar 02, 2013, at 10:13, Dave dave.live...@gmail.com wrote: This might be better done on the command line. $ grep -Po '(?<=href=")[^"]+' [file name] That's not going to work on a stock Mountain Lion install (where the -P
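His suggested workaround is truncated; one sketch that sticks to the stock Mountain Lion tools replaces the Perl-compatible lookbehind with plain grep -o plus sed (again assuming double-quoted href values):

  $ grep -o 'href="[^"]*"' [file name] | sed 's/href="//; s/"$//'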

Re: Strip away all HTML, leaving just the URLs

2013-03-02 Thread LuKreme
In our previous episode (Friday, 01-Mar-2013), Nick said: <ul> <li><a href="http://www.youtube.com" class="youtube">YouTube</a></li> <li><a href="http://www.facebook.com" class="facebook">Facebook</a></li> <li><a href="http://www.twitter.com" class="twitter">Twitter</a></li>

Strip away all HTML, leaving just the URLs

2013-03-01 Thread Nick
Hi, I need to extract the URLs from a large number of HTML files. Basically, take something like this: <ul> <li><a href="http://www.youtube.com" class="youtube">YouTube</a></li> <li><a href="http://www.facebook.com" class="facebook">Facebook</a></li> <li><a href="http://www.twitter.com" class="twitter">Twitter</a></li> </ul> And
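The rest of the request is cut off; presumably the desired output is nothing but the bare URLs, one per line:

  http://www.youtube.com
  http://www.facebook.com
  http://www.twitter.com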

Re: Strip away all HTML, leaving just the URLs

2013-03-01 Thread Miraz Jordan
On Sat, Mar 2, 2013 at 12:38 PM, Nick grizfa...@gmail.com wrote: I need to extract the URLs from a large number of HTML files. Basically, take something like this: <ul> <li><a href="http://www.youtube.com" class="youtube">YouTube</a></li> <li><a href="http://www.facebook.com" class="facebook">Facebook</a></li>

Fwd: Strip away all HTML, leaving just the URLs

2013-03-01 Thread Dmitry Markman
Begin forwarded message: From: Dmitry Markman dmark...@me.com Subject: Re: Strip away all HTML, leaving just the URLs Date: March 1, 2013 10:21:06 PM EST To: bbedit@googlegroups.com On Sat, Mar 2, 2013 at 12:38 PM, Nick grizfa...@gmail.com wrote: I need to extract the URLs from a large

Re: Strip away all HTML, leaving just the URLs

2013-03-01 Thread Robert A. Rosenberg
At 15:56 +1300 on 03/02/2013, Miraz Jordan wrote about Re: Strip away all HTML, leaving just the URLs: The inefficient way I'd do it is: 1] replace all " with \r (puts each URL on its own line) 2] Text menu - Process lines containing http:// (put in a new document) Done. In lieu of step 2
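A rough command-line equivalent of that two-step recipe, assuming step 1 splits the file on the double-quote character:

  $ tr '"' '\n' < [file name] | grep 'http://'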

Re: Strip away all HTML, leaving just the URLs

2013-03-01 Thread TJ Luoma
Here's a text filter which will do just that, using `lynx`, which unfortunately is not installed in OS X by default, but you can acquire it either from Homebrew (my preference) or, if you want a precompiled binary in a nice installer, from http://rudix.googlecode.com/files/lynx-2.8.7-3.pkg TjL
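The filter itself does not survive in this snippet; a sketch of what a lynx-based text filter could look like, written as a shell script that reads the HTML document on stdin (assuming the installed lynx build supports the -stdin, -force_html, -dump, -listonly and -nonumbers options):

  #!/bin/sh
  # Hypothetical sketch of a lynx-based BBEdit text filter:
  # render the HTML arriving on stdin and print only the link URLs.
  lynx -stdin -force_html -dump -listonly -nonumbers

Saved as an executable script in BBEdit's Text Filters folder, this would replace the front document's HTML with a bare list of its URLs.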