Hi,
This might not be the right group to ask, but since I think it could
need some code mods, here goes.
I need to cluster or categorise my search space into a hierarchy that
is probably 3 levels deep, so that when the user searches, the system
specifies a list of categories from which the results will be
returned.
I don't want the search to cover the whole space, retrieve the first,
say, 100 hits, and then filter those down to the hits that match the
requested categories: if the list of categories is a sparsely populated
set, the search problem could get to N**2 quite quickly.
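To make the concern concrete, here is a rough sketch of the
post-filtering approach I want to avoid. It is not Nutch code: the
function name, the page size and the synthetic hit list are all made up
for illustration. It only counts how many ranked hits have to be
scanned before enough of them match a sparse category.

```python
def post_filter(ranked_hits, allowed, page_size, want):
    """Walk ranked hits page by page, keeping only hits in allowed categories.

    ranked_hits is a list of (doc_id, category) pairs in rank order.
    Returns (kept_doc_ids, number_of_hits_scanned).
    """
    kept, scanned = [], 0
    for start in range(0, len(ranked_hits), page_size):
        for doc_id, category in ranked_hits[start:start + page_size]:
            scanned += 1
            if category in allowed:
                kept.append(doc_id)
                if len(kept) == want:
                    return kept, scanned
    return kept, scanned


# Hypothetical data: 10,000 ranked hits, only every 500th one is in the
# requested category.
hits = [(i, "wanted" if i % 500 == 499 else "other") for i in range(10_000)]
kept, scanned = post_filter(hits, {"wanted"}, page_size=100, want=10)
print(len(kept), "matches after scanning", scanned, "hits")
```

With these made-up numbers, filling a single page of 10 results means
walking through 50 pages of 100 hits each, which is the cost I would
like to avoid by restricting the search to the categories up front.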
Fortunately, in my case the categories are defined by URLs and meta tags
in the content, e.g.
http://community.caret.cam.ac.uk/portal/site/~ian/*
goes into the categories
community.caret.cam.ac.uk:~ian:access
community.caret.cam.ac.uk:~ian:maintain
community.caret.cam.ac.uk is the host
~ian is the worksite
and access, maintain are roles within the worksite, derived from meta
tags in the content.
http://community.caret.cam.ac.uk/portal/site/12312312-112/*
goes into the
community.caret.cam.ac.uk:12312312-112:maintain category
Only the host is predictable; the remainder are dynamic, but could be
retrieved via an API or an XML-over-HTTP interface.
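The mapping described above could be sketched roughly as follows. This
is only an illustration: the function name category_keys is made up,
the /portal/site/<worksite>/ path layout is assumed from the two
example URLs, and the roles are passed in as a plain list on the
assumption that they have already been read from the meta tags (or
fetched from the API).

```python
from urllib.parse import urlsplit


def category_keys(url, roles):
    """Build host:worksite:role category keys for one document.

    Assumes URLs of the form http://<host>/portal/site/<worksite>/...
    and returns an empty list for anything that does not fit.
    """
    parts = urlsplit(url)
    host = parts.hostname
    segments = [s for s in parts.path.split("/") if s]
    if host is None or len(segments) < 3 or segments[:2] != ["portal", "site"]:
        return []
    worksite = segments[2]
    return [f"{host}:{worksite}:{role}" for role in roles]


print(category_keys(
    "http://community.caret.cam.ac.uk/portal/site/~ian/index.html",
    ["access", "maintain"],
))
```

The idea would be to compute these keys once per document at index
time, so that a search can be limited to the requested keys instead of
being filtered afterwards.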
Where should I start?
Ian
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers