Hi,
This might not be the right group to ask, but since I think it could need some code mods, here goes.

I need to cluster or categorise my search space into a hierarchy that is probably 3 levels deep, so that when the user searches, the system specifies a list of categories from which the results are returned.

I don't want the search to cover the whole space, retrieve the first, say, 100 hits, and then filter those down to the hits that match the requested categories. If the list of categories is a sparsely populated set, the search problem could get to N**2 quite quickly.
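
To illustrate the alternative to post-filtering, here is a minimal sketch in Python of restricting the search by category inside the index itself. It is a toy in-memory index, not Nutch's or Lucene's API; the class name `TinyIndex`, its methods, and the absence of any ranking are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index that also keeps postings per category (hypothetical)."""

    def __init__(self):
        self.term_postings = defaultdict(set)      # term -> set of doc ids
        self.category_postings = defaultdict(set)  # category -> set of doc ids

    def add(self, doc_id, text, categories):
        # Index every whitespace-separated term and every category of the doc.
        for term in text.lower().split():
            self.term_postings[term].add(doc_id)
        for category in categories:
            self.category_postings[category].add(doc_id)

    def search(self, term, categories, limit=100):
        # Union the postings of the requested categories, then intersect
        # with the term postings, so only documents in those categories
        # are ever considered as hits.
        allowed = set()
        for category in categories:
            allowed |= self.category_postings.get(category, set())
        hits = self.term_postings.get(term.lower(), set()) & allowed
        return sorted(hits)[:limit]
```

The point of the sketch is that the category restriction is applied before the hit limit, so a sparsely populated category cannot be crowded out by the first 100 hits from the whole space.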

Fortunately, in my case the categories are defined by URLs and meta tags in the content, e.g.

http://community.caret.cam.ac.uk/portal/site/~ian/*
goes into the categories
community.caret.cam.ac.uk:~ian:access
community.caret.cam.ac.uk:~ian:maintain

community.caret.cam.ac.uk is the host
~ian is the worksite
and access and maintain are roles within the worksite, derived from meta tags in the content.

http://community.caret.cam.ac.uk/portal/site/12312312-112/*
goes into the
community.caret.cam.ac.uk:12312312-112:maintain category
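
The mapping above can be sketched as follows, in Python for brevity. The function name `category_keys` is hypothetical, and the sketch assumes that the path always has the form /portal/site/<worksite>/... and that the roles have already been extracted from the page's meta tags.

```python
from urllib.parse import urlparse


def category_keys(url, roles):
    """Build host:worksite:role keys for a page URL (illustrative only).

    Assumes the path looks like /portal/site/<worksite>/... and that
    `roles` was already extracted from the page's meta tags.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Expect ["portal", "site", "<worksite>", ...]; anything else has no category.
    if len(parts) < 3 or parts[0] != "portal" or parts[1] != "site":
        return []
    host, worksite = parsed.hostname, parts[2]
    return [f"{host}:{worksite}:{role}" for role in roles]
```

For example, calling `category_keys` with a page under /portal/site/~ian/ and the roles access and maintain yields the two keys listed for ~ian above.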


Only the host is predictable; the remainder are dynamic, but could be retrieved via an API or an XML-over-HTTP interface.

Where should I start?

Ian


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
