Hi

I'm building a spider using a regular expression extractor and a for-each
controller, and it works pretty well, but...

I'm using <a href="[.]*/([^"]+)" as the extractor expression, and it works well
for extracting links like:
<a href="../rel/c/items" >
<a href="/professions.html"

but I cannot find a single expression that also works for links found on
some sites, like:

<a href="http://www.mysite.es/index.php?main_page=page&amp;id=20"

These include the full domain at the beginning, which has to be removed.
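To make the mismatch concrete, here is a minimal sketch that runs the expression above against the three sample links. It uses Python's `re` module purely as a stand-in, on the assumption that it treats this particular pattern the same way as the Perl-style engine in the extractor; the names `CURRENT`, `SAMPLES` and `extract` are made up for the illustration.

```python
import re

# The expression from this post, unchanged.
CURRENT = re.compile(r'<a href="[.]*/([^"]+)"')

# The three sample links from this post.
SAMPLES = [
    '<a href="../rel/c/items" >',
    '<a href="/professions.html"',
    '<a href="http://www.mysite.es/index.php?main_page=page&amp;id=20"',
]

def extract(html):
    """Return group 1 of the first match, or None if nothing matches."""
    match = CURRENT.search(html)
    return match.group(1) if match else None

for sample in SAMPLES:
    # [.]* only consumes literal dots, so a "/" must follow the opening
    # quote (or the dots) directly; "http:" breaks that for the third link.
    print(sample, "->", extract(sample))
```

The first two samples yield `rel/c/items` and `professions.html`, while the absolute URL yields no match at all, because `[.]*` cannot skip over `http://www.mysite.es`.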

It's probably just a matter of getting the Perl-style expression right, but after
several days I have not managed to make it work, so any help would be appreciated.

Thanks
