[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830051#action_12830051 ]

Dawid Weiss commented on NUTCH-673:
-----------------------------------

Hi guys. I'd be willing to proceed with this and upgrade to the Carrot2 3.x line. The first issue I have encountered is the Lucene incompatibility between 2.9 (currently in Nutch) and 3.0 (currently in Carrot2). Are there any plans to upgrade to Lucene 3.0, or reasons not to? It's been with us for quite a while.

If there are no objections, I can prepare a patch replacing Lucene 2.9 with Lucene 3.0 (as a separate issue).

Upgrade the Carrot2 plug-in to release 3.0
------------------------------------------

Key: NUTCH-673
URL: https://issues.apache.org/jira/browse/NUTCH-673
Project: Nutch
Issue Type: Improvement
Components: web gui
Affects Versions: 0.9.0
Environment: All Nutch deployments.
Reporter: Sean Dean
Priority: Minor
Fix For: 1.1

Release 3.0 of the Carrot2 plug-in was released recently. We currently have version 2.1 in the source tree, and upgrading it to the latest version before the 1.0 release might make sense. Details on the release can be found here: http://project.carrot2.org/release-3.0-notes.html

One major change in requirements is that JDK 1.5 must be used, but this is also now required for Hadoop 0.19, so this wouldn't be the only reason for the switch.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830053#action_12830053 ]

Sami Siren commented on NUTCH-673:
----------------------------------

{quote}
Any plans or reasons not to upgrade to Lucene 3.0?
{quote}

I see no reason to stick with 2.9.

{quote}
I can prepare a patch replacing Lucene 2.9 with Lucene 3.0 (as a separate issue).
{quote}

+1
[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830065#action_12830065 ]

Andrzej Bialecki commented on NUTCH-673:
----------------------------------------

+1 on both counts. The upgrade to Lucene 3.0 may involve more work than expected, because the deprecated 2.x APIs are no longer available in 3.0.
[jira] Created: (NUTCH-786) Better list of suffix domains
Better list of suffix domains
-----------------------------

Key: NUTCH-786
URL: https://issues.apache.org/jira/browse/NUTCH-786
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Julien Nioche
Assignee: Julien Nioche
Fix For: 1.1

Small improvement to the content of domain-suffixes.xml: added compound TLDs for .ar, .co, .id, .il, .mx, .nz and .za
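[Editor's illustration] To show why compound TLD entries matter when a host name is reduced to its registered domain, here is a minimal sketch. It is not Nutch code and does not reproduce the patch: the function name `registered_domain`, the `COMPOUND_SUFFIXES` set, and the specific suffix strings are assumptions chosen for illustration, one per country code named in the issue.

```python
# Hypothetical sketch: these names are not from Nutch or from NUTCH-786.patch.
# One illustrative compound suffix per country code mentioned in the issue.
COMPOUND_SUFFIXES = {"com.ar", "com.co", "co.id", "co.il", "com.mx", "co.nz", "co.za"}


def registered_domain(host, suffixes=COMPOUND_SUFFIXES):
    """Return the matched suffix plus the one label to its left."""
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that a
    # compound suffix such as "co.nz" is found before the plain TLD "nz".
    for start in range(1, len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            return ".".join(labels[start - 1:])
    # No compound suffix matched: fall back to treating the last label as the TLD.
    return ".".join(labels[-2:])


if __name__ == "__main__":
    print(registered_domain("www.example.co.nz"))
    print(registered_domain("www.example.co.nz", suffixes=set()))
```

Without the compound entry, the fallback treats `co.nz` as the registered domain, which would group every `.co.nz` site together; with the entry, `example.co.nz` is recovered.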
[jira] Updated: (NUTCH-786) Better list of suffix domains
[ https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-786:
--------------------------------

Attachment: NUTCH-786.patch

Small improvement to the content of domain-suffixes.xml: added compound TLDs for .ar, .co, .id, .il, .mx, .nz and .za
[jira] Closed: (NUTCH-786) Better list of suffix domains
[ https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche closed NUTCH-786.
-------------------------------

Resolution: Fixed

Committed revision 906907
[jira] Created: (NUTCH-787) Upgrade Lucene to 3.0.0.
Upgrade Lucene to 3.0.0.
------------------------

Key: NUTCH-787
URL: https://issues.apache.org/jira/browse/NUTCH-787
Project: Nutch
Issue Type: Task
Components: build
Reporter: Dawid Weiss
Priority: Trivial
[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830078#action_12830078 ]

Dawid Weiss commented on NUTCH-673:
-----------------------------------

O.K., I'll look into the complexity of upgrading to Lucene 3.0 first, then. Filing a separate issue.
[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.
[ https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830085#action_12830085 ]

Dawid Weiss commented on NUTCH-787:
-----------------------------------

Just did an initial check -- this should be doable, although it will result in a sizeable patch due to API changes and removed deprecations. I think it still makes sense to try to push the 3.0 version of Lucene into Nutch, so I will keep working on this and seek help in reviewing the patch (and incompatible changes) once it's ready.
[jira] Commented: (NUTCH-786) Better list of suffix domains
[ https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830109#action_12830109 ]

Ken Krugler commented on NUTCH-786:
-----------------------------------

Is this something that should also be applied to crawler-commons? I believe Ian had added support for finding Effective TLDs and that this support included an effective_tld_names.dat file.