I want to loop through URLs that have been crawled and indexed.
I have a known subset of URLs whose raw content I want to retrieve.
If I know the segment, I can do something like this:
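import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.HitDetails;
import org.apache.nutch.searcher.NutchBean;
import org.apache.nutch.util.NutchConfiguration;

// Segment and URL of a page already known to be stored in that segment
String segName = "20100817162607";
String url = "http://adomain.com/awebappOfInterest/someContent.do";
HitDetails detail = new HitDetails(segName, url);
Configuration conf = NutchConfiguration.create();
NutchBean bean = new NutchBean(conf);          // may throw IOException
byte[] contentBytes = bean.getContent(detail); // raw fetched bytes
// Mask with 0xFF so bytes >= 0x80 map to U+0080..U+00FF instead of being sign-extended
for (byte b : contentBytes) {
    System.out.print((char) (b & 0xFF));
}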
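Printing one byte at a time treats every byte as a separate character, which only yields readable text for single-byte encodings. A minimal alternative sketch decodes the whole array at once. `ContentDecoder` and `decode` are hypothetical names, and the charset name is an assumption the caller must supply (for a fetched page it would normally come from the HTTP `Content-Type` header or the page itself). Unknown or missing charset names fall back to ISO-8859-1, which maps every byte value to exactly one character and so never fails.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentDecoder {
    // Decode raw content bytes with the given charset name (an assumption supplied by the caller).
    // Charset.forName throws IllegalArgumentException (or a subclass) for null, illegal
    // or unsupported names; in that case fall back to ISO-8859-1 (one char per byte).
    public static String decode(byte[] raw, String charsetName) {
        Charset cs;
        try {
            cs = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            cs = StandardCharsets.ISO_8859_1;
        }
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // Hypothetical sample bytes standing in for bean.getContent(detail)
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw, "UTF-8"));
    }
}
```

Decoding the array in one call also lets multi-byte sequences (as in UTF-8) come out as single characters, which a per-byte cast cannot do.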
My question is: given a known URL, how can I find which segment it is in?
Is there something in the API that takes a URL and returns the name of the
segment it is stored in?
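One possible workaround, sketched under assumptions rather than offered as a Nutch API answer: segment names are `yyyyMMddHHmmss` timestamps (like `20100817162607`), so sorting them as strings also sorts them by crawl time, and it is assumed here that a URL fetched more than once is wanted from the newest segment holding it. The sketch models each segment as a hypothetical in-memory `Set` of URLs; `SegmentLocator`, `addSegment`, `findSegment` and `buildIndex` are illustrative names, not Nutch classes or methods. Against a real crawl the membership test would instead be a key lookup in each segment's `content` data, which is what `bin/nutch readseg -get <segment_dir> <url>` does for a single segment in Nutch 1.x.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

public class SegmentLocator {
    // Hypothetical stand-in for a crawl: segment name -> URLs stored in that segment.
    // TreeMap keeps names sorted; yyyyMMddHHmmss names sort chronologically.
    private final TreeMap<String, Set<String>> segments = new TreeMap<>();

    public void addSegment(String segmentName, Set<String> urls) {
        segments.put(segmentName, urls);
    }

    // Probe segments newest-first and return the first one that holds the URL.
    public Optional<String> findSegment(String url) {
        for (Map.Entry<String, Set<String>> e : segments.descendingMap().entrySet()) {
            if (e.getValue().contains(url)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // One pass over all segments, oldest to newest, so a newer segment
    // overwrites an older one for any URL that appears in both.
    public Map<String, String> buildIndex() {
        Map<String, String> urlToSegment = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : segments.entrySet()) {
            for (String url : e.getValue()) {
                urlToSegment.put(url, e.getKey());
            }
        }
        return urlToSegment;
    }

    public static void main(String[] args) {
        SegmentLocator locator = new SegmentLocator();
        // Hypothetical sample segments, for illustration only
        locator.addSegment("20100810120000",
                Set.of("http://adomain.com/a.do", "http://adomain.com/b.do"));
        locator.addSegment("20100817162607",
                Set.of("http://adomain.com/a.do",
                       "http://adomain.com/awebappOfInterest/someContent.do"));
        String url = "http://adomain.com/awebappOfInterest/someContent.do";
        System.out.println(url + " -> " + locator.findSegment(url).orElse("(not found)"));
    }
}
```

For a known subset of many URLs, `buildIndex` walks the segments once instead of probing every segment for every URL; the segment name it returns is what `new HitDetails(segName, url)` needs.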
regards,
-henry
InfoNow Corporation