(Nutch 0.7.2)

Does anyone know whether Nutch has a query that would return the full
list of unique domains that have been crawled?  I was originally
storing each domain's segment separately, but that became a nightmare
when it came to creating search beans, since the bean opens every
segment on init.  So I am working on an incremental segment merge tool
to bring the thousands of segments I have down to a few.
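
To make the first question concrete, here is a minimal sketch of the
result I am after, assuming the crawled URLs are already available as
plain strings (for example, from some dump of the web db).  The class
and method names are hypothetical and not part of Nutch, and it
collects host names, which is only an approximation of "domain":

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not Nutch API: collects the unique host names
// found in a list of URL strings.
class UniqueHosts {
    static SortedSet<String> of(List<String> urls) {
        SortedSet<String> hosts = new TreeSet<>();
        for (String url : urls) {
            try {
                String host = new URI(url.trim()).getHost();
                if (host != null) {
                    // Host names are case-insensitive, so normalize them.
                    hosts.add(host.toLowerCase(Locale.ROOT));
                }
            } catch (URISyntaxException e) {
                // Skip malformed URLs rather than failing the whole run.
            }
        }
        return hosts;
    }
}
```

Calling UniqueHosts.of(...) on the URL list returns the hosts sorted
and de-duplicated.  What I do not know is whether Nutch can already
produce such a list for me.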

Also, what I really need is a pointer on how to do the following:

I have several custom attributes/fields, say "business" and
"confidential", added to a document when it was indexed.  I want to
assign a boost value to each custom field and have Nutch use those
values when it is searching.  Where might I look to find such a thing?
I do not want to search by those fields; I only want them to be part of
Nutch's scoring, so that documents with high boost values for those
fields are pushed to the top.
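
To illustrate the behaviour I want, here is a rough sketch in plain
Java.  Nothing in it is Nutch or Lucene API: the Hit class, its field
names, and the choice to multiply the boosts into the base score are
all assumptions made only for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are Nutch or Lucene API.
class BoostedRanking {
    static class Hit {
        final String url;
        final float baseScore;         // score the engine would assign anyway
        final float businessBoost;     // boost stored for the "business" field
        final float confidentialBoost; // boost stored for the "confidential" field

        Hit(String url, float baseScore, float businessBoost, float confidentialBoost) {
            this.url = url;
            this.baseScore = baseScore;
            this.businessBoost = businessBoost;
            this.confidentialBoost = confidentialBoost;
        }

        // Assumed combination rule: boosts multiply into the base score.
        float finalScore() {
            return baseScore * businessBoost * confidentialBoost;
        }
    }

    // Returns a new list ordered by final score, highest first.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.finalScore()).reversed());
        return sorted;
    }
}
```

The query itself never mentions "business" or "confidential"; the
boosts only change the order in which matching documents come back.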

Thanks again!

Briggs




-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
