Hi,

I'm new to Nutch and am developing a search-based website.

How can I get Nutch to index parameterized URLs so that they show up in search results?

For example, I want to search for an item called "xyz". The information on
this item is available at http://www.somesite.com/somepage.jsp?id=someId
where someId is the database ID (generated by the host application) for item
"xyz".

I know that item "xyz" shows up with the above URL when I search using
Google, but it doesn't appear when I search for it using the sample web
application provided with Nutch.

*Configuration:*

I have set the host rule in crawl-urlfilter.txt to:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*somesite.com/

My *urls* folder contains a text file containing:
http://www.somesite.com

and I executed the command: bin/nutch crawl urls -dir crawldir -depth 3
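
To make the filter setup concrete, here is a small Python sketch (not Nutch code) of how I understand the rules in crawl-urlfilter.txt are applied: the first pattern found in the URL decides, and a URL that matches nothing is rejected. It assumes the rest of my file is still the stock one, which ships with a "-[?*!@=]" rule (skip probable query URLs) ahead of the host rule. The function name and the two rule lists are made up for illustration.

```python
import re


def first_match_decision(rules, url):
    """Return True (accept) or False (reject) based on the first rule
    whose pattern is found anywhere in the URL; reject if none match."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


# Assumption: stock rule order, with only the host rule edited.
RULES_WITH_QUERY_SKIP = [
    r"-[?*!@=]",                              # stock rule: skip probable queries
    r"+^http://([a-z0-9]*\.)*somesite.com/",  # my host rule
    r"-.",                                    # skip everything else
]

# Hypothetical variant with the query-skip rule removed.
RULES_WITHOUT_QUERY_SKIP = [r for r in RULES_WITH_QUERY_SKIP if r != r"-[?*!@=]"]

URL = "http://www.somesite.com/somepage.jsp?id=someId"

print(first_match_decision(RULES_WITH_QUERY_SKIP, URL))     # False
print(first_match_decision(RULES_WITHOUT_QUERY_SKIP, URL))  # True
```

If the stock query-skip rule is indeed still present, the sketch suggests the parameterized URL is filtered out before it is ever fetched, so removing or narrowing that rule and re-crawling may be the thing to try.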

How can I get http://www.somesite.com/somepage.jsp?id=someId to appear when I
search for "xyz", the same way it shows up in a Google search?

Your help would be much appreciated,
Rohit
