Hi, I have the following code:
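<%@ page import="org.apache.nutch.searcher.*, org.apache.nutch.util.NutchConfiguration" %>
<%
  // Load the Nutch configuration and open the index for searching
  org.apache.hadoop.conf.Configuration conf = NutchConfiguration.create();
  NutchBean bean = new NutchBean(conf);
  // search_term and NUM_HITS are defined elsewhere on the page
  Query query = Query.parse(search_term, conf);
  Hits hits = bean.search(query, NUM_HITS);
  for (int i = 0; i < hits.getLength(); i++) {
    Hit hit = hits.getHit(i);
    HitDetails details = bean.getDetails(hit);
%>
<h1>
  <a href="<%=details.getValue("url")%>"><%=details.getValue("title")%></a>
</h1>
<%=bean.getSummary(details, query).toString()%>
<%
  } // end of hits loop
%>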
With this setup, the crawler fetches both
/page.jsp?language_id='en'&item_id='1111'
and
/page.jsp?item_id='1111'&language_id='en'
even though they are the same page (only the order of the parameters differs). How can I avoid this?
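One possible approach (a sketch, not something Nutch is confirmed to do out of the box) is to normalize each URL before it is stored, putting the query parameters in a fixed, sorted order so both variants become the same string. QueryParamSorter and normalize below are hypothetical names, not Nutch API; in Nutch, logic like this would presumably be wired into a URL normalizer plugin, which is not shown here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;

// Hypothetical helper (not part of Nutch): puts query parameters in sorted
// order so that URLs differing only in parameter order compare equal.
public class QueryParamSorter {

    public static String normalize(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // not a parseable URI: leave it unchanged
        }
        String query = uri.getRawQuery();
        if (query == null || query.isEmpty()) {
            return url; // no query string, nothing to reorder
        }
        // "b=2&a=1" -> ["b=2", "a=1"] -> ["a=1", "b=2"]
        String[] params = query.split("&");
        Arrays.sort(params);

        // Rebuild the URL from its raw (still-escaped) components
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        sb.append('?').append(String.join("&", params));
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = normalize("/page.jsp?language_id='en'&item_id='1111'");
        String b = normalize("/page.jsp?item_id='1111'&language_id='en'");
        System.out.println(a);           // /page.jsp?item_id='1111'&language_id='en'
        System.out.println(a.equals(b)); // true
    }
}
```

Sorting whole key=value pairs assumes the site never cares about parameter order; if a key can repeat and the order of its values matters, this rewrite would change what the URL means.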
Thanks a lot.