If I were you, I would use Luke ( http://code.google.com/p/luke/ ) to
examine what data you have in your indexes, if you're using Lucene
indexes :)
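
Once Luke (or a search) shows you the stored fields of the matching
documents, picking the segment is a simple lookup. Here is a minimal
sketch in plain Java. It assumes the index stores the page URL in a field
named "url" and the segment name in a field named "segment"; I have not
checked that against your index, so verify the field names in Luke.
SegmentLookup and findSegment are made-up names for illustration, not
part of Nutch.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not a Nutch class): finds the segment recorded for a URL. */
final class SegmentLookup {
    private SegmentLookup() {}

    /**
     * Each map stands for the stored fields of one index document,
     * for example as read in Luke or copied out of a search hit.
     * The field names "url" and "segment" are assumptions.
     */
    static Optional<String> findSegment(List<Map<String, String>> docs, String url) {
        for (Map<String, String> doc : docs) {
            // Compare the whole stored URL: a search on a tokenized field
            // may also return documents that only share some terms.
            if (url.equals(doc.get("url"))) {
                return Optional.ofNullable(doc.get("segment"));
            }
        }
        return Optional.empty();
    }
}
```

If a segment comes back, it should be usable as the segName in the
HitDetails snippet from your first mail.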

On Tue, Aug 24, 2010 at 6:21 PM, Henry Noerdlinger
<[email protected]> wrote:
> Thank you for the response.
>
> I ran a simple test where I constructed a QueryParams object with a field / 
> value pair of "url" and "http://blahblah.com/",
> then set it on a Query object and passed that to my beloved NutchBean 
> to search, like this:
>      String urlVal = "http://domain.com/webapp/content.do";
>      QueryParams qp = new QueryParams();
>      qp.put("url", urlVal);
>      Configuration conf = NutchConfiguration.create();
>      NutchBean bean = new NutchBean(conf);
>      Query query = new Query(conf);
>      query.setParams(qp);
>      Hits hits = bean.search(query);
>
> Didn't get anything.
>
>
> Is there someone who can give me a quick example of how this could be done?
>
>
>
> ________________________________________
> From: CatOs Mandros [[email protected]]
> Sent: Tuesday, August 24, 2010 4:10 AM
> To: [email protected]
> Subject: Re: find segment for an url
>
> Hi Henry,
>
> If I'm not mistaken, the correct way to handle this is to query your
> index. It should have the information about which segment the URL is
> located in. Then you should only have to run your code on the segment
> returned to get the content.
>
>
> On Tue, Aug 24, 2010 at 12:24 AM, Henry Noerdlinger
> <[email protected]> wrote:
>> I want to loop through URLs which have been crawled / indexed.
>>
>> I have a (known) subset of URLs that I want to get the (raw) content for.
>>
>> If I know the segment, I can do something like this:
>>      String segName = "20100817162607";
>>      String url = "http://adomain.com/awebappOfInterest/someContent.do";
>>
>>      HitDetails detail = new HitDetails(segName, url);
>>      Configuration conf = NutchConfiguration.create();
>>
>>      NutchBean bean = new NutchBean(conf);
>>
>>      byte[] contentBytes = bean.getContent(detail);
>>      for (byte b : contentBytes)
>>      {
>>         System.out.print((char)b);
>>      }
>>
>> My question is: given a known URL, how can I find what segment it is in? Is 
>> there something in the API for giving a URL and getting back the name of 
>> the segment it is found in?
>>
>> regards,
>> -henry
>> [email protected]
>>
>> InfoNow Corporation  |  This communication, including attachments, is for 
>> the exclusive use of addressee and may contain proprietary, confidential or 
>> privileged information.
>>
>
>
>
