[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515812 ]
Andrzej Bialecki commented on NUTCH-439:
-----------------------------------------

Some minor issues:

* TLDScoringFilter contains a misspelled field, tldEnties, it should be renamed to tldEntries. Functionally it's of course the same, it's just a puzzling name that is easy to misspell (ie. spell correctly ;) ).

* one of the use cases for the "tld" index field that you mention is that users may search on it. But in the latest patch this field is added with Field.Index.NO, which makes searching on it impossible. Also, in order to search on arbitrary Lucene fields Nutch needs a Query filter, so we would need a TLDQueryFilter, which doesn't exist (yet?).

Other than that, +1 from me.

Re: using domain names instead of host names - we need to discuss this further, let's create a separate issue on this.

> Top Level Domains Indexing / Scoring
> ------------------------------------
>
>                 Key: NUTCH-439
>                 URL: https://issues.apache.org/jira/browse/NUTCH-439
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.9.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: tld_plugin_v1.0.patch, tld_plugin_v1.1.patch,
>                      tld_plugin_v2.0.patch, tld_plugin_v2.1.patch, tld_plugin_v2.2.patch
>
>
> Top Level Domains (tlds) are the last part(s) of the host name in a DNS
> system. TLDs are managed by the Internet Assigned Numbers Authority. IANA
> divides tlds into three. infrastructure, generic(such as "com", "edu") and
> country code tlds(such as "en", "de" , "tr", ). Indexing the top level domain
> and optionally boosting is needed for improving the search results and
> enhancing locality.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
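
To illustrate the second point of the comment (a field added with Field.Index.NO can be stored but not searched), here is a minimal, self-contained sketch. It is a toy model and does not use Lucene: the class TldFieldSketch, its methods, and its IndexMode enum are hypothetical names invented for this example. The enum constants only mirror the names of the Field.Index options in Lucene releases of that era. A real fix would change the Field.Index argument in the patch and, as noted above, would also need a query filter on the Nutch side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (not Lucene): a field value is searchable only if it was indexed. */
public class TldFieldSketch {

    /** Hypothetical stand-in for the indexing choice made when adding a field. */
    public enum IndexMode { NO, UN_TOKENIZED }

    /** Stored values, retrievable by document id: docId -> (field -> value). */
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Inverted index: "field:value" -> ids of the documents containing that term. */
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds a one-field document and returns its id. The value is always stored. */
    public int addDocument(String field, String value, IndexMode mode) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc);
        // Only indexed fields get a postings entry; IndexMode.NO skips this step.
        if (mode == IndexMode.UN_TOKENIZED) {
            postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
        }
        return docId;
    }

    /** Returns the ids of documents whose indexed field exactly matches the value. */
    public Set<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new HashSet<>());
    }

    /** Returns the stored value of a field, whether or not it was indexed. */
    public String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        TldFieldSketch index = new TldFieldSketch();
        int notIndexed = index.addDocument("tld", "de", IndexMode.NO);
        index.addDocument("tld", "de", IndexMode.UN_TOKENIZED);

        // Only the second document is found by a search on the "tld" field.
        System.out.println(index.search("tld", "de"));
        // The first document's value can still be read back, it just cannot be searched.
        System.out.println(index.storedValue(notIndexed, "tld"));
    }
}
```

In this model, both documents carry the same "tld" value, but only the one added with UN_TOKENIZED appears in search results, which is the behaviour the comment describes for the patch's current use of Field.Index.NO.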