On 4 September 2013 17:48, Jordi Carretero <[email protected]> wrote:
> Correct, but I guess any other approach would be too complex for my limited
> knowledge so far. Good thing is that I will never stress the sites being
> spidered :)
>
> One update for the regular expression, in case it helps anyone:
>
> <a href="(?:http://${__V(${URL})})*[.]*/([^"]+)

[.] should really be \.

> (not my idea, extracted from
> http://stackoverflow.com/questions/5341908/regex-extractor-equipped-with-dynamic-regular-expression-in-jmeter
> )
>
> where URL is the name of the variable I use for the sites to be spidered. I
> fill the variable using a CSV file and CSV Data Set Config.
>
> About the multithreading, I was thinking of launching several JMeter instances,
> each one loaded with its own CSV full of sites. Not ideal, but I'll get some
> perf. improvement.

Why not just use multiple JMeter threads?
Each one will get a different CSV entry (unless you specify otherwise).

> Jordi
>
> On Wed, Sep 4, 2013 at 6:24 PM, Deepak Shetty <[email protected]> wrote:
>
>> note that the for-each method is restricted to using a single thread, which
>> isn't ideal for a spider.
>> http://theworkaholic.blogspot.com/2009/10/spidering-site-with-jmeter.html
>>
>> On Wed, Sep 4, 2013 at 9:19 AM, Jordi Carretero <[email protected]> wrote:
>>
>> > Thanks Sebb, that was very illustrative for me and helped me find the
>> > solution:
>> >
>> > <a href="(?:http://www\.mysite\.com)*[.]*/([^"]+)
>> >
>> > This expression, used in the Regular Expression Extractor, extracts
>> > the links in the pages, and can be used to populate the path field in the
>> > recursive (For Each Controller) HTTP request using a variable.
>> >
>> > To make PHP links work well, though, I had to change "Response Field to
>> > check" to Body (unescaped) instead of Body (do not really know why :(
>> >
>> > Thanks again
>> > Jordi
>> >
>> > On Tue, Sep 3, 2013 at 8:36 PM, sebb <[email protected]> wrote:
>> >
>> > > On 3 September 2013 19:08, Jordi Carretero <[email protected]>
>> > > wrote:
>> > > > Hi
>> > > >
>> > > > I'm building a spider using a regular expression extractor and a
>> > > > for-each controller and it works pretty well but..
>> > > >
>> > > > I'm using <a href="[.]*/([^"]+)" as the extractor expression, and it
>> > > > works well to extract links like:
>> > > > <a href="../rel/c/items" >
>> > > > <a href="/professions.html"
>> > > >
>> > > > but I cannot find any expression that will work at the same time for
>> > > > links found in some sites like:
>> > > >
>> > > > <a href="http://www.mysite.es/index.php?main_page=page&id=20"
>> > > >
>> > > > that include the full domain at the beginning (which has to be removed).
>> > > >
>> > > > It's a matter of working with the Perl expression, but after some days I
>> > > > could not manage to make it work, so any help will be appreciated.
>> > >
>> > > If you want to ignore an optional string, use something like:
>> > >
>> > > (?:http://www\.mysite\.es)?
>> > >
>> > > The form (abc)? means abc or nothing; the (?:) form means don't save
>> > > the contents.
>> > >
>> > > In your case, if you want to ignore ".", ".." and
>> > > "http://www.mysite.es" you could use:
>> > >
>> > > (?:http://www\.mysite\.es|\.\.?)?
>> > >
>> > > BTW, rather than use "[.]" to escape the meta-character ".", the usual
>> > > method is "\.".
>> > >
>> > > > Thanks
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: [email protected]
>> > > For additional commands, e-mail: [email protected]
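
For anyone who wants to check the expression outside JMeter, here is a minimal Python sketch of the optional-prefix pattern sebb suggested, combined with the original href capture. It is only an illustration: JMeter applies Perl5-style regular expressions, and the sketch assumes Python's re module treats these particular constructs the same way. The function names (link_pattern, extract_paths), the sample HTML, and the use of re.escape as a stand-in for the ${__V(${URL})} substitution are all assumptions, not part of the test plan discussed in this thread.

```python
import re


def link_pattern(host):
    """Build the href pattern for one site.

    `host` is a hypothetical example value; re.escape turns its dots
    into literal "\\." so they are not treated as meta-characters.
    """
    # Optional, non-capturing prefix: the full domain, or "." / ".."
    prefix = r'(?:http://' + re.escape(host) + r'|\.\.?)?'
    # After the optional prefix, require "/" and capture up to the closing quote
    return re.compile(r'<a href="' + prefix + r'/([^"]+)"')


def extract_paths(html, host):
    """Return the captured path of every matching link, in document order."""
    return link_pattern(host).findall(html)


# Sample links taken from the examples quoted in this thread
SAMPLE = '''
<a href="../rel/c/items" >
<a href="/professions.html" >
<a href="http://www.mysite.es/index.php?main_page=page&id=20" >
'''

paths = extract_paths(SAMPLE, "www.mysite.es")
print(paths)
# ['rel/c/items', 'professions.html', 'index.php?main_page=page&id=20']
```

Links that point at a different host are skipped, because after the optional prefix the next character must be "/". Whether that is the desired behaviour for a spider restricted to one site is a design choice; this sketch simply follows the pattern as posted.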
