Hi,
We run a network of search websites here in Brazil called
www.sitedebusca.com (the complete list is at
http://www.servicodebusca.com/sitesdebusca.php), and we added some
patches to search.jsp so that it returns the results in a simple XML
format, which is read by our own application, currently written in PHP.
We are already using Nutch in a "beta" environment, and we plan to use
only Nutch on a network of more than 50 regional search websites.
The code of search.jsp follows. I hope you can understand my email, and
I hope this code is useful to you.
Regards,
AtlasVision - Team
<%@ page
contentType="text/xml; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"
import="javax.servlet.*"
import="javax.servlet.http.*"
import="java.io.*"
import="java.util.*"
import="java.net.*"
import="net.nutch.html.Entities"
import="net.nutch.searcher.*"
%><%
NutchBean bean = NutchBean.get(application);
// set the character encoding to use when interpreting request values
request.setCharacterEncoding("ISO-8859-1");
bean.LOG.info("query request from " + request.getRemoteAddr());
// get query from request
String queryString = request.getParameter("query");
if (queryString == null) queryString = "";
// first hit to display
int start = 0;
String startString = request.getParameter("start");
if (startString != null) start = Integer.parseInt(startString);
// number of hits to display
int hitsPerPage = 10;
String hitsString = request.getParameter("hitsPerPage");
if (hitsString != null) hitsPerPage = Integer.parseInt(hitsString);
// max hits per site
int hitsPerSite = 2;
String hitsPerSiteString = request.getParameter("hitsPerSite");
if (hitsPerSiteString != null) hitsPerSite = Integer.parseInt(hitsPerSiteString);
Query query = Query.parse(queryString);
bean.LOG.info("query: " + queryString);
// perform query
// Hits hits = bean.search(query, start + 1000, hitsPerSite);
// FIXME: the line above was causing errors for the query: linux
Hits hits = bean.search(query, start + hitsPerPage, hitsPerSite);
// Last hit in the page
int end = start + hitsPerPage - 1;
if (end > hits.getLength() - 1) end = hits.getLength() - 1;
// Number of hits shown on this page
int length = 0;
// use <= so that a page holding a single hit (start == end) is not reported as empty
if (start <= end)
length = end - start + 1;
bean.LOG.info("total hits: " + hits.getTotal());
%><?xml version="1.0" encoding="ISO-8859-1"?>
<%
// To prevent the character encoding declared with the 'contentType' page
// directive from being overridden by JSTL (apache i18n), we freeze it
// by flushing the output buffer.
// see http://java.sun.com/developer/technicalArticles/Intl/MultilingualJSP/
out.flush();
%>
<nutchSearch>
<querystring><%=Entities.encode(queryString)%></querystring>
<hitsInfo>
<hitsPerPage><%=hitsPerPage%></hitsPerPage>
<hitsPerSite><%=hitsPerSite%></hitsPerSite>
<start><%=new Long(start)%></start>
<end><%=new Long(end)%></end>
<total><%=new Long(hits.getTotal())%></total>
<totalIsExact><%=new Boolean(hits.totalIsExact())%></totalIsExact>
<length><%=new Integer(hits.getLength())%></length>
<lengthInPage><%=length%></lengthInPage>
</hitsInfo>
<%
if (length > 0) {
%>
<hitsData>
<%
Hit[] show = hits.getHits(start, length);
HitDetails[] details = bean.getDetails(show);
String[] summaries = bean.getSummary(details, query);
// display the hits
for (int i = 0; i < length; i++) {
Hit hit = show[i];
HitDetails detail = details[i];
String title = detail.getValue("title");
String url = detail.getValue("url");
String summary = summaries[i].replaceAll("([ \t\n\r]| ){2,}", " ");
String id = "idx=" + hit.getIndexNo() + "&id=" + hit.getIndexDocNo();
// use url for docs w/o title
if (title == null || title.equals("")) title = url;
%>
<hit>
<title><![CDATA[<%=title%>]]></title>
<summary><![CDATA[<%=summary%>]]></summary>
<url><![CDATA[<%=url%>]]></url>
<indexNo><%=hit.getIndexNo()%></indexNo>
<docNo><%=hit.getIndexDocNo()%></docNo>
<moreFromSite><%=(hit.moreFromSiteExcluded())%></moreFromSite>
<site><![CDATA[<%=hit.getSite()%>]]></site>
</hit>
<%
}
%>
</hitsData>
<%
}
%>
</nutchSearch>
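
For anyone who wants to read this output from Java rather than PHP, here is a minimal sketch of a client, using only the JDK's built-in DOM parser. The element names (nutchSearch, hitsData, hit, title, url, summary) come from the JSP above; the class names NutchXmlClient and SearchHit and the sample document in main are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client for the <nutchSearch> XML emitted by the patched search.jsp.
public class NutchXmlClient {

    // Plain holder for the fields of one <hit> element (illustrative name).
    public static final class SearchHit {
        public final String title;
        public final String url;
        public final String summary;

        SearchHit(String title, String url, String summary) {
            this.title = title;
            this.url = url;
            this.summary = summary;
        }
    }

    // Text of the first descendant element named 'tag', or "" if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    // Parses a complete <nutchSearch> document and returns its hits in order.
    public static List<SearchHit> parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList hitNodes = doc.getElementsByTagName("hit");
        List<SearchHit> hits = new ArrayList<>();
        for (int i = 0; i < hitNodes.getLength(); i++) {
            Element hit = (Element) hitNodes.item(i);
            // CDATA sections are returned as ordinary text by getTextContent().
            hits.add(new SearchHit(
                childText(hit, "title"),
                childText(hit, "url"),
                childText(hit, "summary")));
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample in the shape produced by the JSP; not real search output.
        String sample =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<nutchSearch><querystring>linux</querystring>"
            + "<hitsData><hit>"
            + "<title><![CDATA[Example title]]></title>"
            + "<summary><![CDATA[An <b>example</b> summary]]></summary>"
            + "<url><![CDATA[http://example.com/]]></url>"
            + "</hit></hitsData></nutchSearch>";

        for (SearchHit hit : parse(sample)) {
            System.out.println(hit.title + " -> " + hit.url);
        }
    }
}
```

In a real deployment the XML would be fetched from wherever search.jsp is installed, passing the query, start, hitsPerPage and hitsPerSite request parameters that the JSP reads. Note that the summary is kept as raw text here, so any markup inside the CDATA section is left for the caller to handle.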
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Sunday, April 10, 2005 12:06 PM
Subject: XML OUTPUT
Hi!
Does anybody know how to output search results in XML format?
I would like to provide my data like Google/Yahoo do with their APIs.
Thanks!
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers