How would one go about this: "During indexing, your
indexing filter could add a field named "sitecluster""?
Could I create a field called "region" and apply it to sites based on
their location? If so, how? Can this be tweaked in a config file?
-Bud
On Nov 30, 2005, at 12:29 PM, Andy Lee wrote:
On Nov 30, 2005, at 1:20 AM, Matt Kangas wrote:
- if you only want to match one site at a time, you can just add
"site:xxx" to the query. the "site" field exists in the index by
default
Note that the index-basic indexing filter does not tokenize the
"site" field, so if you do "site:salami.com" you will only match
URLs whose host component exactly matches the value you give --
http://salami.com/etc and ftp://salami.com/etc but NOT
http://www.salami.com. This may or may not be what you want.
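To make the exact-match behaviour concrete, here is a minimal sketch in plain Python. It is not Nutch code: `site_value` and `matches_site` are hypothetical helpers that only imitate an untokenized "site" field by treating the whole host as a single term.

```python
from urllib.parse import urlsplit


def site_value(url):
    """Return the host component of a URL as one untokenized term."""
    return urlsplit(url).hostname or ""


def matches_site(url, wanted):
    """Imitate a "site:wanted" query: the whole host must be equal."""
    return site_value(url) == wanted.lower()


for u in ("http://salami.com/etc",
          "ftp://salami.com/etc",
          "http://www.salami.com"):
    print(u, matches_site(u, "salami.com"))
```

The first two URLs match and the third does not, because "www.salami.com" is a different term from "salami.com" when nothing splits the host into parts.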
- if you want to assign ids to clusters of sites, you can do the
site->id lookup at index time and add a custom field to the index
This is one way to address the above issue. During indexing, your
indexing filter could add a field named "sitecluster" (or
whatever), and for all the above URLs (and anything else you want
to cluster with them) you would set "salami.com" as the value of
that field. Then your search would be "sitecluster:salami.com".
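The lookup itself could be sketched like this. This is plain Python rather than an actual Nutch indexing filter, and the table contents, the dict standing in for an index document, and the names `SITE_CLUSTERS`, `cluster_for` and `add_cluster_field` are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: host -> cluster id.
SITE_CLUSTERS = {
    "salami.com": "salami.com",
    "www.salami.com": "salami.com",
    "www2.salami.com": "salami.com",
}


def cluster_for(url, table=SITE_CLUSTERS):
    """Look up the cluster id for a URL's host; fall back to the host."""
    host = urlsplit(url).hostname or ""
    return table.get(host, host)


def add_cluster_field(doc, url, field="sitecluster"):
    """Return a copy of the document fields with the cluster field set."""
    updated = dict(doc)
    updated[field] = cluster_for(url)
    return updated


print(add_cluster_field({"url": "http://www.salami.com/etc"},
                        "http://www.salami.com/etc"))
```

A "region" field would presumably have the same shape: a host -> region table and field="region". Where that table lives (a config file or elsewhere) depends on how the filter is written.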
Another approach would be to search not on the "site" field but the
"url" field, which *is* tokenized at indexing time. So
"url:salami" would find all the salami URLs above, as well as
http://www2.salami.com and http://www.salami.org and
http://salami.lunch.com -- which again may or may not be what you want.
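A rough sketch of why the tokenized field matches more broadly, again in plain Python. The tokenizer below simply splits on non-alphanumeric characters, which is an assumption; the tokenizer Nutch actually applies to "url" may differ in detail. `url_tokens` and `matches_url_term` are hypothetical names.

```python
import re


def url_tokens(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def matches_url_term(url, term):
    """Imitate a "url:term" query: the term must equal one whole token."""
    return term.lower() in url_tokens(url)


for u in ("http://www2.salami.com",
          "http://www.salami.org",
          "http://salami.lunch.com"):
    print(u, url_tokens(u), matches_url_term(u, "salami"))
```

Each of these URLs contains "salami" as a whole token, so all three match, regardless of which site the token comes from.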
--Andy