Hi Roman... umm, no. Assume you have a web page, and the page has a form on it. Within the form there might be multiple elements (lists, select elements, etc.). Each item would have a variable name, which would in turn be used as part of the form action to build the full query string.
Sort of like this (pseudo-markup):

  form action=test.php?
    option name=foo
      foo=1
      foo=2
      foo=3
      foo=4
    /option
    option name=cat
      cat=1
      cat=2
      cat=3
    /option
  /form

So in this pseudo example you'd get the following URLs:

  test.php?foo=1&cat=1
  test.php?foo=1&cat=2
  test.php?foo=1&cat=3
  test.php?foo=2&cat=1
  test.php?foo=2&cat=2
  test.php?foo=2&cat=3
  test.php?foo=3&cat=1
  test.php?foo=3&cat=2
  test.php?foo=3&cat=3
  test.php?foo=4&cat=1
  test.php?foo=4&cat=2
  test.php?foo=4&cat=3

With these, the app can then continue to crawl the resulting pages. So I'm looking for some sort of crawler that already does this kind of analysis within the page. I know I can write a Python/Perl script for a single site/page, but since I'm looking at hundreds of sites, that's why I'm asking about Nutch/Lucene/Solr.

thanks

-----Original Message-----
From: brainstorm [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 14, 2008 3:12 PM
To: [email protected]
Subject: Re: lucene/nutch question...

If I understand correctly, you are looking for a way to test/fill in forms. If that's the case, I recommend the following tools:

http://wtr.rubyforge.org/
http://search.cpan.org/~petdance/WWW-Mechanize-1.34/lib/WWW/Mechanize.pm

But I guess that, with some coding effort, Nutch can also achieve what you want.

Regards,
Roman

On Thu, Aug 14, 2008 at 11:51 PM, bruce <[EMAIL PROTECTED]> wrote:
> Hi.
>
> Got a very basic lucene/nutch question.
>
> Assume I have a page that has a form. Within the form are a number of
> select/drop-down boxes/etc... In this case, each object would comprise a
> variable which would form part of the query string as defined in the form
> action. Is there a way for lucene/nutch to go through the process of
> building up the actions based on the querystring vars, so that lucene/nutch
> can actually search through each possible combination of urls....
>
> Also, is nutch/lucene the right/correct app to use in this scenario? Is
> there a better app to handle this kind of potential application/process?
>
> Thanks
>
> -bruce
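For reference, the URL enumeration in bruce's pseudo example is a Cartesian product over the values of each select element. Below is a minimal Python sketch of that idea, assuming a GET form built from plain `<select>` elements whose options carry explicit `value` attributes, and an action with no existing query parameters. The names `FormSelectParser` and `enumerate_form_urls` are hypothetical helpers, not part of Nutch, Lucene, Solr, or WWW::Mechanize. It uses only the standard library: `html.parser` to collect field names and option values, `itertools.product` for the combinations, and `urllib.parse.urlencode` to build each query string.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class FormSelectParser(HTMLParser):
    """Hypothetical helper: collects <select> names and their <option> values
    from the first <form> on a page (assumes forms are not nested)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}         # select name -> option values, in page order
        self._in_form = False
        self._form_seen = False
        self._current = None     # name of the currently open <select>, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._form_seen:
            self._in_form = True
            self._form_seen = True
            self.action = attrs.get("action") or ""
        elif self._in_form and tag == "select":
            self._current = attrs.get("name")
            if self._current is not None:
                self.fields.setdefault(self._current, [])
        elif self._in_form and tag == "option" and self._current is not None:
            # Assumption: each option has an explicit value attribute;
            # options without one (which browsers submit as their text) are skipped.
            value = attrs.get("value")
            if value is not None:
                self.fields[self._current].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None
        elif tag == "form":
            self._in_form = False


def enumerate_form_urls(html):
    """Return one GET URL per combination of select values (Cartesian product)."""
    parser = FormSelectParser()
    parser.feed(html)
    parser.close()
    # Assumption: the action has no query parameters of its own; a trailing "?" is dropped.
    base = parser.action.rstrip("?")
    names = list(parser.fields)
    urls = []
    for combo in product(*(parser.fields[name] for name in names)):
        urls.append(base + "?" + urlencode(list(zip(names, combo))))
    return urls


# The pseudo form from the message above, written as real HTML.
page = """
<form action="test.php?" method="get">
  <select name="foo">
    <option value="1">1</option><option value="2">2</option>
    <option value="3">3</option><option value="4">4</option>
  </select>
  <select name="cat">
    <option value="1">1</option><option value="2">2</option><option value="3">3</option>
  </select>
</form>
"""

urls = enumerate_form_urls(page)
print(len(urls))  # 12
print(urls[0])    # test.php?foo=1&cat=1
```

The order of the generated URLs matches the list in the message, because `itertools.product` varies the last field fastest. A real crawler would also need to resolve the action against the page URL (for example with `urllib.parse.urljoin`), handle POST forms and free-text inputs, and cap the number of combinations, since the URL count grows multiplicatively with every select on the form.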
