[SLUG] Extracting URL's from a web page

2007-07-22 Thread Sean Murphy

All.

I wish to extract specific links from a web page.  A part of the requirement 
is to be able to drill down about three to four levels to extract the 
information.


The first page has 26 to 30 links I want to extract.  The levels 
beneath the first page have a lot more.


I only want the text, not the underlying HTML code, but if I get the URL 
path I can deal with that.


Wget grabs too much information for my use.  I only know the higher levels of 
Perl and I am starting to learn Ruby.  So my coding under Linux is not very 
advanced at all.



Sean Murphy
Skype: smurf20005

Life is a challenge, treat it that way. 


--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] Extracting URL's from a web page

2007-07-22 Thread Zhasper

Once more, this time to the list.

http://aspn.activestate.com/ASPN/docs/ActivePython/2.5/diveintopython/html/html_processing/extracting_data.html
has some sample recipes that should give you a good starting point.
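
For a rough idea of the shape such a script could take, here is a minimal
sketch. It assumes Python 3 and only its standard library, and it is not
taken from the linked recipes. The names LinkExtractor, extract_links, fetch,
crawl and the "wanted" filter are made up for illustration. It pulls the URL
and the visible text of each link, then follows the wanted links a limited
number of levels down.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect (absolute URL, link text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop any "#fragment" part.
                self._href = urldefrag(urljoin(self.base_url, href)).url
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, base_url):
    """Return the (url, text) pairs found in one page of HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page and decode it leniently (no error handling here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=3, wanted=lambda url, text: True, fetch=fetch):
    """Follow wanted http(s) links level by level, max_depth levels deep."""
    seen = {start_url}
    found = []               # (depth, url, text) in the order discovered
    level = [start_url]
    for depth in range(1, max_depth + 1):
        next_level = []
        for page_url in level:
            for url, text in extract_links(fetch(page_url), page_url):
                if not url.startswith(("http://", "https://")):
                    continue
                if url not in seen and wanted(url, text):
                    seen.add(url)
                    found.append((depth, url, text))
                    next_level.append(url)
        level = next_level
    return found
```

Passing something like wanted=lambda url, text: "/archive/" in url (a made-up
pattern) keeps the crawl to the specific links of interest, which is what
stops it from grabbing as much as wget does. A real script would also want
error handling around fetch and a pause between requests.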

--
There is nothing more worthy of contempt than a man who quotes himself
- Zhasper, 2004