Hi Henry,

If I'm not mistaken, the correct way to handle this is to query your
index. Each hit should carry the name of the segment in which the URL
is stored. Then you should only have to run your code on the segment
returned to get the content.
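
Here is a rough sketch of the lookup step, not tested against a live
index. It assumes the index stores a "url" and a "segment" field for
each hit, and that a url: query can match more than one page, so the
hits are filtered for an exact match. SegmentLookup, hit() and
segmentFor() are hypothetical names of mine, the second segment name is
made-up sample data, and the Nutch calls in the comments are what I
would expect from the 1.x searcher API rather than something I have
verified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentLookup {

    // Holds the two stored fields this sketch cares about for one hit.
    // With Nutch these would presumably come from HitDetails, e.g.
    //   HitDetails d = bean.getDetails(hits.getHit(i));
    //   d.getValue("url"), d.getValue("segment")
    static Map<String, String> hit(String url, String segment) {
        Map<String, String> fields = new HashMap<String, String>();
        fields.put("url", url);
        fields.put("segment", segment);
        return fields;
    }

    // Returns the segment of the first hit whose stored url equals the
    // target exactly, or null when no hit matches.
    static String segmentFor(String url, List<Map<String, String>> hits) {
        for (Map<String, String> fields : hits) {
            if (url.equals(fields.get("url"))) {
                return fields.get("segment");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://adomain.com/awebappOfInterest/someContent.do";

        // Stand-in for the result of something like
        //   bean.search(Query.parse("url:" + url, conf), 10)
        List<Map<String, String>> hits = new ArrayList<Map<String, String>>();
        hits.add(hit("http://adomain.com/awebappOfInterest/other.do", "20100816101500"));
        hits.add(hit(url, "20100817162607"));

        System.out.println(segmentFor(url, hits));
    }
}
```

Once you have the segment name, you can pass it with the URL to the
HitDetails constructor exactly as in your own snippet.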


On Tue, Aug 24, 2010 at 12:24 AM, Henry Noerdlinger
<[email protected]> wrote:
> I want to loop through URLs which have been crawled / indexed.
>
> I have a (known) subset of URLs that I want to get the (raw) content for
>
> if I know the segment, I can do something like this:
>      String segName = "20100817162607";
>      String url = "http://adomain.com/awebappOfInterest/someContent.do";
>
>      HitDetails detail = new HitDetails(segName, url);
>      Configuration conf = NutchConfiguration.create();
>
>      NutchBean bean = new NutchBean(conf);
>
>      byte[] contentBytes = bean.getContent(detail);
>      for (byte b : contentBytes)
>      {
>         System.out.print((char)b);
>      }
>
> My question is, given a known URL, how can I find what segment it is in? Is
> there something in the API for giving a URL and getting back the name of the
> segment it is found in?
>
> regards,
> -henry
> [email protected]
>
> InfoNow Corporation  |  This communication, including attachments, is for the 
> exclusive use of addressee and may contain proprietary, confidential or 
> privileged information.
>
