[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
--------------------------------

    Attachment: tld_plugin_v2.3.patch

bq. TLDScoringFilter contains a misspelled field, tldEnties, it should be renamed to tldEntries

Done!

bq. one of the use cases for the "tld" index field that you mention is that users may search on it. But in the latest patch this field is added with Field.Index.NO, which makes searching on it impossible. Also, in order to search on arbitrary Lucene fields Nutch needs a Query filter, so we would need a TLDQueryFilter, which doesn't exist (yet?).

Well, in fact NUTCH-445 covers searching on TLDs: we would be able to search site:lucene.apache.org, site:apache.org, or even site:org. Therefore I think indexing TLD fields and adding a TLDQueryFilter are not needed. I will delve deeper into NUTCH-445 as soon as I find some time. We can move the domain indexing functionality to index-basic so that it will be generic enough.

bq. using domain names instead of host names - we need to discuss this further, let's create a separate issue on this.

We can open issues case by case, since the patches are expected to have major side effects.

> Top Level Domains Indexing / Scoring
> ------------------------------------
>
>                 Key: NUTCH-439
>                 URL: https://issues.apache.org/jira/browse/NUTCH-439
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.9.0
>            Reporter: Enis Soztutar
>         Assigned To: Enis Soztutar
>         Attachments: tld_plugin_v1.0.patch, tld_plugin_v1.1.patch,
> tld_plugin_v2.0.patch, tld_plugin_v2.1.patch, tld_plugin_v2.2.patch,
> tld_plugin_v2.3.patch
>
>
> Top Level Domains (TLDs) are the last part(s) of the host name in a DNS
> system. TLDs are managed by the Internet Assigned Numbers Authority. IANA
> divides TLDs into three groups: infrastructure, generic (such as "com", "edu")
> and country code TLDs (such as "en", "de", "tr"). Indexing the top level domain
> and optionally boosting it is needed for improving the search results and
> enhancing locality.
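
To make the site: examples in the comment above concrete, here is a minimal sketch of how a host name could be expanded into every suffix that such a query might match. It is an illustration only, not the NUTCH-445 patch or any existing Nutch API; the class name HostSuffixes and the method suffixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: lists every dot-separated
// suffix of a host name, from the full host down to the last label.
public class HostSuffixes {

    static List<String> suffixes(String host) {
        List<String> result = new ArrayList<>();
        // Normalize case and drop a trailing root dot ("apache.org.").
        String current = host.toLowerCase();
        if (current.endsWith(".")) {
            current = current.substring(0, current.length() - 1);
        }
        while (!current.isEmpty()) {
            result.add(current);
            int dot = current.indexOf('.');
            if (dot < 0) {
                break; // the last label has been added
            }
            current = current.substring(dot + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [lucene.apache.org, apache.org, org]
        System.out.println(suffixes("lucene.apache.org"));
    }
}
```

If each of these suffixes were indexed as a value of one searchable field, then site:lucene.apache.org, site:apache.org and site:org could all be answered by a plain term match, which is the behaviour the comment attributes to NUTCH-445. Whether that issue actually works this way is an assumption here.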
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers