On 4/30/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

JSP pages typically render HTML, so you don't need a JSP plugin, but a parse-html plugin in your nutch-site.xml

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Jasper Kamperman <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, April 30, 2008 1:32:29 PM
Subject: Re: Searching parameterized URLs

I think the first question is to figure out whether the page with URL http://www.somesite.com/somepage.jsp?id=someId even made it into your index. There are several ways to do this; personally I tend to use luke to have a look at the index. Tell luke to open your nutch-0.9/crawl/index directory (which is where it ends up if you follow the default instructions for running the crawl).

If the page is in your index you can use luke to see what fields were extracted; hopefully there is some field named "foo" which would have "xyz" somewhere. The Nutch demo app should then find the page if you specify foo:xyz in the searchbar. If "foo" is one of "content", "title", "anchor" or "url" then the demo app should find it if you plainly search for xyz, no need to specify any of the default fields.

Since it is a jsp page, it is entirely possible that you either don't have the correct (jsp) plugin configured or that the plugin you have isn't smart enough to get the content out of a jsp page.

Jasper

On Apr 30, 2008, at 10:13 AM, Rohit Potnis wrote:

Hi,

I'm a nutch-newbie and am developing a search-based website. How can I use Nutch to search for parameterized URLs? e.g. I want to search on an item called "xyz". The information on this item is available on http://www.somesite.com/somepage.jsp?id=someId where someId is the databaseId (generated by the host application) for item "xyz".

I know that item "xyz" shows up with the above URL when I search using Google but it doesn't appear when I search for it using the sample web application provided with nutch.
*Configuration:*

I have configured crawl-urlfilter.txt to:

    # accept hosts in MY.DOMAIN.NAME
    +^http://([a-z0-9]*\.)*somesite.com/

My urls folder contains a text file containing:

    http://www.somesite.com

and I executed the command:

    bin/nutch crawl urls -dir crawldir -depth 3

How can I get http://www.somesite.com/somepage.jsp?id=someId when I search for "xyz", the same way it shows up during a Google search?

Your help would be much appreciated,
Rohit
Sorry about that, I was confusing JSP with JavaScript, which requires
the parse-js plugin.
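
Before looking at parse plugins, it may be worth checking whether the parameterized URL can get past crawl-urlfilter.txt at all. The Python sketch below mimics how such a rule list is usually evaluated: rules are tried in order, the first matching pattern decides accept or reject, and a URL that matches nothing is rejected. The accept rule is the one quoted in this thread. The reject rule `-[?*!@=]` is an assumption about what the stock file contains, and the function name `url_passes` is made up for illustration; check your own crawl-urlfilter.txt before relying on either.

```python
import re

# Rules in file order as (accept?, pattern).
# ASSUMPTION: the first rule mirrors a "-[?*!@=]" line believed to be in the
# stock crawl-urlfilter.txt; it is not quoted anywhere in this thread.
# The second rule is the "+^http://..." line from the configuration above.
RULES = [
    (False, r"[?*!@=]"),
    (True, r"^http://([a-z0-9]*\.)*somesite.com/"),
]


def url_passes(url, rules=RULES):
    """Return True if the first rule that matches the URL is an accept rule.

    A URL that matches no rule at all is treated as rejected.
    """
    for accept, pattern in rules:
        if re.search(pattern, url):
            return accept
    return False


if __name__ == "__main__":
    for candidate in (
        "http://www.somesite.com/index.html",
        "http://www.somesite.com/somepage.jsp?id=someId",
    ):
        print(candidate, url_passes(candidate))
```

If the reject rule really is present in your file, somepage.jsp?id=someId would be dropped before it is ever fetched, which would also explain why the page is missing from the index when you open it in luke. Calling `url_passes(url, RULES[1:])` shows what happens with that rule removed.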
