Correct, but I guess any other approach would be too complex for my limited knowledge so far. The good thing is that I will never stress the sites being spidered :)
One update for the regular expression, in case it helps anyone:

<a href="(?:http://${__V(${URL})})*[.]*/([^"]+)

(not my idea, extracted from
http://stackoverflow.com/questions/5341908/regex-extractor-equipped-with-dynamic-regular-expression-in-jmeter )
where URL is the name of the variable I use for the sites to be spidered.
I fill the variable using a CSV file and a CSV Data Set Config.

About the multithreading, I was thinking of launching several JMeter
instances, each one loaded with its own CSV full of sites. Not ideal, but
I'll get some performance improvement.

Jordi

On Wed, Sep 4, 2013 at 6:24 PM, Deepak Shetty <[email protected]> wrote:
> Note that the for-each method is restricted to using a single thread,
> which isn't ideal for a spider.
> http://theworkaholic.blogspot.com/2009/10/spidering-site-with-jmeter.html
>
>
> On Wed, Sep 4, 2013 at 9:19 AM, Jordi Carretero <[email protected]> wrote:
>
> > Thanks Sebb, that was very illustrative for me and helped me find the
> > solution:
> >
> > <a href="(?:http://www\.mysite\.com)*[.]*/([^"]+)
> >
> > This expression, used in the Regular Expression Extractor, extracts the
> > links in the pages and can be used to populate the path field of the
> > recursive (ForEach Controller) HTTP request via a variable.
> >
> > To make PHP links work properly, though, I had to change the "Response
> > field to check" to Body (unescaped) instead of Body (I don't really
> > know why :( )
> >
> > Thanks again
> > Jordi
> >
> >
> > On Tue, Sep 3, 2013 at 8:36 PM, sebb <[email protected]> wrote:
> >
> > > On 3 September 2013 19:08, Jordi Carretero <[email protected]> wrote:
> > > > Hi
> > > >
> > > > I'm building a spider using a Regular Expression Extractor and a
> > > > ForEach Controller, and it works pretty well, but...
> > > >
> > > > I'm using <a href="[.]*/([^"]+)" as the extractor expression, and
> > > > it works well to extract links like:
> > > >
> > > > <a href="../rel/c/items"
> > > > <a href="/professions.html"
> > > >
> > > > but I cannot find any expression that will also work for the links
> > > > found on some sites, like:
> > > >
> > > > <a href="http://www.mysite.es/index.php?main_page=page&id=20"
> > > >
> > > > which include the full domain at the beginning (and it has to be
> > > > removed).
> > > >
> > > > It's a matter of working with the Perl expression, but after some
> > > > days I could not manage to make it work, so any help will be
> > > > appreciated.
> > >
> > > If you want to ignore an optional string, use something like:
> > >
> > > (?:http://www\.mysite\.es)?
> > >
> > > The form (abc)? means abc or nothing; the (?:) form means don't save
> > > the contents.
> > >
> > > In your case, if you want to ignore ".", ".." and
> > > "http://www.mysite.es", you could use:
> > >
> > > (?:http://www\.mysite\.es|\.\.?)?
> > >
> > > BTW, rather than use "[.]" to escape the meta-character ".", the
> > > usual method is "\.".
> > >
> > > > Thanks
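
For anyone who wants to sanity-check the pattern outside JMeter first, below is a minimal standalone Java sketch (the class name and the sample HTML are invented for illustration) that runs sebb's optional-prefix idea against the three href styles from the thread. JMeter's Regular Expression Extractor actually uses Perl5-style (Jakarta ORO) regexes rather than java.util.regex, but this particular pattern behaves the same way in both.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorDemo {

    public static void main(String[] args) {
        // The three href styles discussed in the thread.
        String body =
              "<a href=\"../rel/c/items\">relative</a>\n"
            + "<a href=\"/professions.html\">root-relative</a>\n"
            + "<a href=\"http://www.mysite.es/index.php?main_page=page&id=20\">absolute</a>\n";

        // sebb's suggestion: make the domain (or "." / "..") an optional,
        // non-capturing prefix so that only the path lands in group 1.
        Pattern linkPattern = Pattern.compile(
                "<a href=\"(?:http://www\\.mysite\\.es|\\.\\.?)?/([^\"]+)");

        Matcher m = linkPattern.matcher(body);
        while (m.find()) {
            // Group 1 is what the extractor would store in its reference
            // variable, e.g. to feed a ForEach Controller.
            System.out.println(m.group(1));
        }
        // Prints:
        // rel/c/items
        // professions.html
        // index.php?main_page=page&id=20
    }
}

With Match No. set to -1 in the extractor, these group-1 values are roughly what would end up in the numbered variables (e.g. LINK_1, LINK_2, ... if the reference name were LINK) that a ForEach Controller then iterates over.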
