Note that the for-each method is restricted to using a single thread, which isn't ideal for a spider. See http://theworkaholic.blogspot.com/2009/10/spidering-site-with-jmeter.html

On Wed, Sep 4, 2013 at 9:19 AM, Jordi Carretero <[email protected]> wrote:

> Thanks Sebb, that was very illustrative for me and helped me find the
> solution:
>
> <a href="(?:http://www\.mysite\.com)*[.]*/([^"]+)
>
> This expression, used in the Regular Expression Extractor, extracts
> the links in the pages, and can be used to populate the path field in the
> recursive (ForEach Controller) HTTP request using a variable.
>
> To make PHP links work well, though, I had to change "Response Field to
> check" to Body (unescaped) instead of Body (I do not really know why :(
>
> Thanks again
> Jordi
>
> On Tue, Sep 3, 2013 at 8:36 PM, sebb <[email protected]> wrote:
>
> > On 3 September 2013 19:08, Jordi Carretero <[email protected]>
> > wrote:
> > > Hi
> > >
> > > I'm building a spider using a regular expression extractor and a
> > > for-each controller, and it works pretty well, but...
> > >
> > > I'm using <a href="[.]*/([^"]+)" as the extractor expression, and it
> > > works well to extract links like:
> > > <a href="../rel/c/items"
> > > <a href="/professions.html"
> > >
> > > but I cannot find any expression that will work at the same time for
> > > links found in some sites like:
> > >
> > > <a href="http://www.mysite.es/index.php?main_page=page&id=20"
> > >
> > > that include the full domain at the beginning (which has to be removed).
> > >
> > > It's a matter of working with the Perl expression, but after some days I
> > > could not manage to make it work, so any help will be appreciated.
> >
> > If you want to ignore an optional string, use something like:
> >
> > (?:http://www\.mysite\.es)?
> >
> > The form (abc)? means abc or nothing; the (?:) form means don't save
> > the contents.
> >
> > In your case, if you want to ignore all of ".", ".." and
> > "http://www.mysite.es" you could use:
> >
> > (?:http://www\.mysite\.es|\.\.?)?
> >
> > BTW, rather than use "[.]" to escape the meta-character ".", the usual
> > method is "\.".
> >
> > > Thanks
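
For anyone who wants to check these patterns outside JMeter, here is a minimal Python sketch of the optional-prefix idea from sebb's reply. It is an illustration only: Python's re module stands in for JMeter's Perl5-style regex engine, the sample anchors are the three quoted in the thread (closed with a quote and ">" so they form complete tags), and the names HTML, LINK_RE and extract_paths are made up for this example, not anything from JMeter.

```python
import re

# Sample anchors copied from the thread (the mysite.es host is the one
# used in the original question).
HTML = '''
<a href="../rel/c/items">
<a href="/professions.html">
<a href="http://www.mysite.es/index.php?main_page=page&id=20">
'''

# (?: ... )? is an optional, non-capturing group: it skips either the
# site root or a leading "." / ".." without saving it.
# ([^"]+) then captures everything after the first "/" up to the closing quote.
LINK_RE = re.compile(r'<a href="(?:http://www\.mysite\.es|\.\.?)?/([^"]+)"')


def extract_paths(html):
    """Return the captured path of every anchor that matches LINK_RE."""
    return LINK_RE.findall(html)


print(extract_paths(HTML))
# ['rel/c/items', 'professions.html', 'index.php?main_page=page&id=20']
```

The single capture group plays the role of template $1$ in the Regular Expression Extractor, so each captured path could be fed to the HTTP request's path field in the same way the thread describes.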
