Hi, I'm a Nutch newbie developing a search-based website.

How can I use Nutch to search for parameterized URLs? For example, I want to search for an item called "xyz". The information on this item is available at http://www.somesite.com/somepage.jsp?id=someId where someId is the database ID (generated by the host application) for item "xyz".

I know that item "xyz" shows up with the above URL when I search using Google, but it doesn't appear when I search for it using the sample web application provided with Nutch.

*Configuration:*

I have configured crawl-urlfilter.txt as follows:

    # accept hosts in MY.DOMAIN.NAME
    +^http://([a-z0-9]*\.)*somesite.com/

My *urls* folder contains a text file with this line:

    http://www.somesite.com

and I executed the command:

    bin/nutch crawl urls -dir crawldir -depth 3

How can I get http://www.somesite.com/somepage.jsp?id=someId returned when I search for "xyz", the same way it shows up in a Google search?

Your help would be much appreciated,
Rohit
