[Nutch-dev] Re: XML OUTPUT

Orlando Tempobono - AtlasVision Mon, 11 Apr 2005 10:17:15 -0700

Hi,

    We are working on a network of search websites here in Brazil called
www.sitedebusca.com (the complete list is at
http://www.servicodebusca.com/sitesdebusca.php), and we added some patches to
search.jsp so that it returns the results in a simple XML format, which our
own application (currently written in PHP) reads.
    We are already using Nutch in a "beta" environment, and we plan to use
only Nutch across a network of more than 50 regional search websites.
    The code of our search.jsp follows. I hope my email is clear and that
this code is useful to you.


Regards,
AtlasVision - Team

<%@ page
contentType="text/xml; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"

import="javax.servlet.*"
import="javax.servlet.http.*"
import="java.io.*"
import="java.util.*"
import="java.net.*"

import="net.nutch.html.Entities"
import="net.nutch.searcher.*"
%><%

NutchBean bean = NutchBean.get(application);

// set the character encoding to use when interpreting request values
request.setCharacterEncoding("ISO-8859-1");

bean.LOG.info("query request from " + request.getRemoteAddr());

// get query from request
String queryString = request.getParameter("query");
if (queryString == null) queryString = "";

// first hit to display
int start = 0;
String startString = request.getParameter("start");
if (startString != null) start = Integer.parseInt(startString);

// number of hits to display
int hitsPerPage = 10;
String hitsString = request.getParameter("hitsPerPage");
if (hitsString != null) hitsPerPage = Integer.parseInt(hitsString);

// max hits per site
int hitsPerSite = 2;
String hitsPerSiteString = request.getParameter("hitsPerSite");
if (hitsPerSiteString != null) hitsPerSite = Integer.parseInt(hitsPerSiteString);

Query query = Query.parse(queryString);
bean.LOG.info("query: " + queryString);

// perform query
// Hits hits = bean.search(query, start + 1000, hitsPerSite);
// FIXME: the line above was causing errors on the query "linux"
Hits hits = bean.search(query, start + hitsPerPage, hitsPerSite);

// Last hit in the page
int end = start + hitsPerPage - 1;
if (end > hits.getLength() - 1) end = hits.getLength() - 1;

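// Number of hits shown on this page. The range [start, end] is inclusive,
// so a page holding a single hit has start == end and length 1.
int length = 0;

if (start <= end)
    length = end - start + 1;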

bean.LOG.info("total hits: " + hits.getTotal());

%><?xml version="1.0" encoding="ISO-8859-1"?>
<%
  // To prevent the character encoding declared with the 'contentType' page
  // directive from being overridden by JSTL (Apache i18n), we freeze it
  // by flushing the output buffer.
  // See http://java.sun.com/developer/technicalArticles/Intl/MultilingualJSP/
  out.flush();
%>
<nutchSearch>
    <querystring><%=Entities.encode(queryString)%></querystring>

    <hitsInfo>
        <hitsPerPage><%=hitsPerPage%></hitsPerPage>
        <hitsPerSite><%=hitsPerSite%></hitsPerSite>
        <start><%=new Long(start)%></start>
        <end><%=new Long(end)%></end>
        <total><%=new Long(hits.getTotal())%></total>
        <totalIsExact><%=new Boolean(hits.totalIsExact())%></totalIsExact>
        <length><%=new Integer(hits.getLength())%></length>
        <lengthInPage><%=length%></lengthInPage>
    </hitsInfo>

<%
if (length > 0) {
%>
    <hitsData>
<%
    Hit[] show = hits.getHits(start, length);
    HitDetails[] details = bean.getDetails(show);
    String[] summaries = bean.getSummary(details, query);

    // display the hits
    for (int i = 0; i < length; i++) {

        Hit hit = show[i];
        HitDetails detail = details[i];
        String title = detail.getValue("title");
        String url = detail.getValue("url");
        // Collapse runs of whitespace and &nbsp; into a single space
        String summary = summaries[i].replaceAll("([ \t\n\r]|&nbsp;){2,}", " ");
        String id = "idx=" + hit.getIndexNo() + "&id=" + hit.getIndexDocNo();

        // use url for docs w/o title
        if (title == null || title.equals("")) title = url;
%>
        <hit>
            <title><![CDATA[<%=title%>]]></title>
            <summary><![CDATA[<%=summary%>]]></summary>
            <url><![CDATA[<%=url%>]]></url>
            <indexNo><%=hit.getIndexNo()%></indexNo>
            <docNo><%=hit.getIndexDocNo()%></docNo>
            <moreFromSite><%=(hit.moreFromSiteExcluded())%></moreFromSite>
            <site><![CDATA[<%=hit.getSite()%>]]></site>
        </hit>
<%
    }
%>
    </hitsData>
<%
}
%>

</nutchSearch>
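A minimal sketch of the page-window arithmetic used in the JSP above, pulled out into plain Java so it can be checked in isolation. The class and method names (PageWindow, lengthInPage) are hypothetical, and `fetched` stands for hits.getLength(). The comparison has to be inclusive (start <= end): with a strict start < end, a page holding exactly one hit would report a length of 0 and print no <hit> elements at all.

```java
// Hypothetical helper mirroring the JSP's page-window arithmetic.
class PageWindow {

    /** Number of hits shown on the page that begins at `start`. */
    static int lengthInPage(int start, int hitsPerPage, int fetched) {
        // Last hit index on this page, capped at the last hit actually fetched.
        int end = Math.min(start + hitsPerPage - 1, fetched - 1);
        // [start, end] is inclusive; the page is empty when start is past end.
        return (start <= end) ? end - start + 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(lengthInPage(0, 10, 1));   // single hit: 1
        System.out.println(lengthInPage(0, 10, 25));  // full page: 10
        System.out.println(lengthInPage(20, 10, 25)); // last partial page: 5
        System.out.println(lengthInPage(30, 10, 25)); // past the end: 0
    }
}
```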

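Because title, summary and url are written inside CDATA sections, a page title or summary that happens to contain the sequence "]]>" would close the section early and make the whole response invalid XML. Below is a sketch of a hypothetical helper that splits that terminator across two adjacent CDATA sections. This is a standard XML technique, not something Nutch itself provides.

```java
// Hypothetical helper: wraps text in CDATA so the result stays well-formed
// even when the text itself contains the terminator "]]>".
class CdataUtil {

    static String cdata(String text) {
        if (text == null) text = "";
        // Each "]]>" becomes "]]" + end-of-section, then a new section starting with ">".
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(cdata("Nutch & Lucene")); // <![CDATA[Nutch & Lucene]]>
        System.out.println(cdata("a]]>b"));          // <![CDATA[a]]]]><![CDATA[>b]]>
    }
}
```

In the JSP, the method could be placed in a <%! ... %> declaration and used as <title><%=cdata(title)%></title> instead of the hand-written CDATA markers; we have not tested that change on our servers.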

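The XML is meant to be read by another application (ours happens to be written in PHP). As an illustration in Java, here is a minimal sketch of a client that parses a response with the standard JAXP DOM parser and lists the hits. The class name and the sample response are made up for the example; a real client would fetch the XML over HTTP from search.jsp instead of using a hard-coded string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NutchXmlClient {

    // Made-up sample shaped like the search.jsp output (not real search results).
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
        + "<nutchSearch>"
        + "<querystring>linux</querystring>"
        + "<hitsInfo><total>2</total><lengthInPage>2</lengthInPage></hitsInfo>"
        + "<hitsData>"
        + "<hit><title><![CDATA[Page A]]></title><url><![CDATA[http://example.com/a]]></url></hit>"
        + "<hit><title><![CDATA[Page B]]></title><url><![CDATA[http://example.com/b]]></url></hit>"
        + "</hitsData>"
        + "</nutchSearch>";

    /** Parses the XML text into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Text of the first descendant element named `tag`, or "" if there is none. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    /** Returns "title -> url" for every <hit> element in the response. */
    static List<String> hits(Document doc) {
        List<String> result = new ArrayList<String>();
        NodeList hitNodes = doc.getElementsByTagName("hit");
        for (int i = 0; i < hitNodes.getLength(); i++) {
            Element hit = (Element) hitNodes.item(i);
            result.add(text(hit, "title") + " -> " + text(hit, "url"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("total: " + text(doc.getDocumentElement(), "total"));
        for (String line : hits(doc)) {
            System.out.println(line);
        }
    }
}
```

DOM is the simplest choice for a page of 10 or so hits; for very large responses a streaming (SAX) parser would use less memory.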




----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Sunday, April 10, 2005 12:06 PM
Subject: XML OUTPUT


Hi!

 Does anybody know how to output search results in XML format?
 I would like to provide my data the way Google/Yahoo do with their APIs.

Thanks!



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
