nutch and page parameters

Antonios Katsikadamos Tue, 12 Oct 2010 06:42:45 -0700

Hi, I have the following code:

     <%
       // Build the search bean and run the query
       org.apache.hadoop.conf.Configuration conf = NutchConfiguration.create();
       NutchBean bean = new NutchBean(conf);
       Query query = Query.parse(search_term, conf);
       Hits hits = bean.search(query, NUM_HITS);

       // Render one title link and summary per hit
       for (int i = 0; i < hits.getLength(); i++) {
         Hit hit = hits.getHit(i);
         HitDetails details = bean.getDetails(hit);
     %>
               <h1>
                 <a href="<%=details.getValue("url")%>"><%=details.getValue("title")%></a>
               </h1>

               <%=bean.getSummary(details, query).toString()%>
     <%
       }
     %>

With this setup, the crawler fetches both of the following pages:

    /page.jsp?language_id='en'&item_id='1111'

          and

   /page.jsp?item_id='1111'&language_id='en'

which are the same page. How can I avoid this?
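
To make clear what I mean by "the same", here is a rough sketch in plain Java that sorts the query parameters so both URLs collapse to one canonical form. The class name QueryOrderNormalizer and the method normalize are made up for illustration. This is not Nutch API and I have not wired it into Nutch, it only shows the kind of normalization I am after.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Nutch: canonicalizes a URL by sorting
// its query parameters so that parameter order no longer matters.
public class QueryOrderNormalizer {

    public static String normalize(String url) {
        // Split off a fragment first so it is not mistaken for a parameter.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }

        int q = url.indexOf('?');
        if (q < 0 || q == url.length() - 1) {
            return url + fragment; // no query string to reorder
        }

        String base = url.substring(0, q);
        String[] params = url.substring(q + 1).split("&");
        Arrays.sort(params); // plain lexicographic order of "name=value" pairs

        return base + "?" + String.join("&", params) + fragment;
    }

    public static void main(String[] args) {
        String a = normalize("/page.jsp?language_id='en'&item_id='1111'");
        String b = normalize("/page.jsp?item_id='1111'&language_id='en'");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a.equals(b));
    }
}
```

Both inputs come out as /page.jsp?item_id='1111'&language_id='en', so after this step the two URLs compare as equal.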

Thanks a lot.
