[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated NUTCH-439:
--------------------------------

    Attachment: domain.suffixes_v2.1.patch

> Very nice patch!

Thanks!

> IP_PATTERN - it could be tighter, instead of \\d+ it could use \\d{1,3}

It is now (\\d{1,3}\\.){3}(\\d{1,3}).

> the DomainStatistics tool: I'd rather see it as a separate JIRA issue. The
> reason is that it's a common request for enhancement, but specific
> requirements vary wildly. Some users prefer to build a separate DB that holds
> statistical info and can be used in various steps of the work cycle, others
> still prefer one-time tools such as this one.

DomainStatistics is really a quick hack I wrote to demonstrate the new patch. I've removed it from the latest patch. Once the user requirements are settled, we can move on from there.

Also, you may not want to commit MozillaPublicSuffixListParser.java, but it is good that we have it somewhere public.

> Top Level Domains Indexing / Scoring
> ------------------------------------
>
>                 Key: NUTCH-439
>                 URL: https://issues.apache.org/jira/browse/NUTCH-439
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.9.0
>            Reporter: Enis Soztutar
>         Attachments: domain.suffixes_v2.1.patch, tld_plugin_v1.0.patch,
>                      tld_plugin_v1.1.patch, tld_plugin_v2.0.patch
>
>
> Top Level Domains (tlds) are the last part(s) of the host name in a DNS
> system. TLDs are managed by the Internet Assigned Numbers Authority. IANA
> divides tlds into three groups: infrastructure, generic (such as "com", "edu")
> and country code tlds (such as "en", "de", "tr"). Indexing the top level domain
> and optionally boosting it is needed for improving the search results and
> enhancing locality.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
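For reference, the effect of tightening IP_PATTERN as discussed above can be tried out with a small standalone sketch. This is not code from the patch: the class name IpPatternSketch, the method looksLikeIp, and the exact shape of the earlier, looser pattern are assumptions made for illustration. Only the tightened expression (\\d{1,3}\\.){3}(\\d{1,3}) is taken from the message.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the NUTCH-439 patch.
public class IpPatternSketch {

    // Assumed form of the earlier pattern: any run of digits per octet.
    public static final Pattern LOOSE = Pattern.compile("(\\d+\\.){3}(\\d+)");

    // Tightened form quoted in the message: one to three digits per octet.
    public static final Pattern TIGHT = Pattern.compile("(\\d{1,3}\\.){3}(\\d{1,3})");

    // Returns true if the whole host string has the dotted-quad shape.
    // Only the digit count is checked, not the 0-255 range of each octet.
    public static boolean looksLikeIp(String host) {
        return TIGHT.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIp("192.168.0.1")); // true
        System.out.println(looksLikeIp("1234.5.6.7"));  // false: first octet has four digits
        System.out.println(looksLikeIp("999.1.1.1"));   // true: range is not validated
        System.out.println(looksLikeIp("example.com")); // false
    }
}
```

Note that the tightened pattern still accepts out-of-range octets such as 999; whether that matters depends on how the host check is used.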
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers