Hi,

I'm building a spider using a regular expression extractor and a for-each controller, and it works pretty well, but I have one problem.

I'm using <a href="[.]*/([^"]+)" as the extractor expression. It works well for links like:

<a href="../rel/c/items" >
<a href="/professions.html"

but I cannot find an expression that also works for links found on some sites, like:

<a href="http://www.mysite.es/index.php?main_page=page&id=20"

which include the full domain at the beginning (and it has to be removed). It's a matter of getting the Perl expression right, but after some days I have not managed to make it work, so any help will be appreciated.

Thanks
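
To make the problem concrete, here is a minimal sketch that checks the expression above against the three sample links, next to one possible variant that optionally skips a leading scheme and host. The variant pattern, the names ORIGINAL, CANDIDATE and extract, and the use of Python's re module are all assumptions made for illustration; the extractor's Perl-style engine may behave differently, so treat this as a starting point rather than a confirmed fix.

```python
import re

# Expression from the post: the href must start with optional dots and a slash.
ORIGINAL = re.compile(r'<a href="[.]*/([^"]+)"')

# Hypothetical variant (not from the post): an optional non-capturing group
# consumes "http://host" or "https://host" before the path is captured.
CANDIDATE = re.compile(r'<a href="(?:https?://[^/"]+)?[.]*/([^"]+)"')

# The three sample links quoted in the post.
samples = [
    '<a href="../rel/c/items" >',
    '<a href="/professions.html"',
    '<a href="http://www.mysite.es/index.php?main_page=page&id=20"',
]


def extract(pattern, text):
    """Return the first captured group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


for link in samples:
    # Print what each pattern captures for the same input.
    print(link)
    print("  original :", extract(ORIGINAL, link))
    print("  candidate:", extract(CANDIDATE, link))
```

The original expression fails on the third link because the text right after href=" is "http:", which is neither a dot nor a slash. The variant makes the scheme and host optional, so relative links are still captured the same way.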
