Thank you for the response.

I ran a simple test: I constructed a QueryParams object with a field / value
pair of "url" and "http://blahblah.com/", added it to a Query object, and
passed that to my beloved NutchBean to search, like this:
      String urlVal = "http://domain.com/webapp/content.do";
      QueryParams qp = new QueryParams();
      qp.put("url", urlVal);
      Configuration conf = NutchConfiguration.create();
      NutchBean bean = new NutchBean(conf);
      Query query = new Query(conf);
      query.setParams(qp);
      Hits hits = bean.search(query);

I didn't get any hits back.


Is there someone who can give me a quick example of how this could be done?



________________________________________
From: CatOs Mandros [[email protected]]
Sent: Tuesday, August 24, 2010 4:10 AM
To: [email protected]
Subject: Re: find segment for an url

Hi Henry,

If I'm not mistaken, the correct way to handle this is to query your
index. It should record which segment each URL is located in. Then you
only have to run your code against the segment returned to get the content.


On Tue, Aug 24, 2010 at 12:24 AM, Henry Noerdlinger
<[email protected]> wrote:
> I want to loop through URLs which have been crawled / indexed.
>
> I have a (known) subset of URLs that I want to get the (raw) content for.
>
> If I know the segment, I can do something like this:
>      String segName = "20100817162607";
>      String url = "http://adomain.com/awebappOfInterest/someContent.do";
>
>      HitDetails detail = new HitDetails(segName, url);
>      Configuration conf = NutchConfiguration.create();
>
>      NutchBean bean = new NutchBean(conf);
>
>      byte[] contentBytes = bean.getContent(detail);
>      for (byte b : contentBytes)
>      {
>         System.out.print((char)b);
>      }
>
> My question is: given a known URL, how can I find what segment it is in? Is
> there something in the API for giving a URL and getting back the name of the
> segment it is found in?
>
> regards,
> -henry
> [email protected]
>
> InfoNow Corporation  |  This communication, including attachments, is for the 
> exclusive use of addressee and may contain proprietary, confidential or 
> privileged information.
>
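
A side note on the quoted snippet that prints the content with (char)b for
each byte: that cast only round-trips plain ASCII, because a negative byte
sign-extends into an unrelated character. Below is a minimal sketch of decoding
the whole array at once instead, using only the JDK. The class name
ContentDecoder and the choice of UTF-8 are assumptions for illustration; the
actual charset of a fetched page would need to come from its metadata, and the
sample array merely stands in for whatever bean.getContent(detail) returns.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch.
public class ContentDecoder {

    // Decode the full byte array in one step so that multi-byte
    // sequences are interpreted together rather than byte by byte.
    static String decode(byte[] contentBytes, Charset charset) {
        return new String(contentBytes, charset);
    }

    public static void main(String[] args) {
        // Stand-in for the raw content bytes; assumed to be UTF-8 here.
        byte[] contentBytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(contentBytes, StandardCharsets.UTF_8));
    }
}
```

If the page is known to use another encoding, the corresponding Charset can be
passed to decode in place of UTF-8.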


