[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830051#action_12830051
 ] 

Dawid Weiss commented on NUTCH-673:
---

Hi guys. I'd be willing to proceed with this and upgrade to the Carrot2 3.x line. 
The first issue I have encountered is Lucene incompatibilities between 2.9 
(currently in Nutch) and 3.0 (currently in Carrot2). Any plans or reasons not 
to upgrade to Lucene 3.0? It's been with us for quite a while. If there are no 
objections, I can prepare a patch replacing Lucene 2.9 with Lucene 3.0 (as a 
separate issue).

 Upgrade the Carrot2 plug-in to release 3.0
 --

 Key: NUTCH-673
 URL: https://issues.apache.org/jira/browse/NUTCH-673
 Project: Nutch
  Issue Type: Improvement
  Components: web gui
Affects Versions: 0.9.0
 Environment: All Nutch deployments.
Reporter: Sean Dean
Priority: Minor
 Fix For: 1.1


 Release 3.0 of the Carrot2 plug-in was released recently.
 We currently have version 2.1 in the source tree and upgrading it to the 
 latest version before 1.0-release might make sense.
 Details on the release can be found here: 
 http://project.carrot2.org/release-3.0-notes.html
 One major change in requirements is for JDK 1.5 to be used, but this is also 
 now required for Hadoop 0.19 so this wouldn't be the only reason for the 
 switch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830053#action_12830053
 ] 

Sami Siren commented on NUTCH-673:
--

{quote}
Any plans or reasons not to upgrade to Lucene 3.0?
{quote}

I see no reason to stick with 2.9

{quote}
I can prepare a patch replacing Lucene 2.9 with Lucene 3.0 (as a separate 
issue).
{quote}

+1




[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830065#action_12830065
 ] 

Andrzej Bialecki commented on NUTCH-673:
-

+1 on both counts. The upgrade to Lucene 3.0 may involve more work than 
expected because deprecated 2.x APIs are no longer available in 3.0.




[jira] Created: (NUTCH-786) Better list of suffix domains

2010-02-05 Thread Julien Nioche (JIRA)
Better list of suffix domains
-

 Key: NUTCH-786
 URL: https://issues.apache.org/jira/browse/NUTCH-786
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Julien Nioche
Assignee: Julien Nioche
 Fix For: 1.1


Small improvement to the content of domain-suffixes.xml: added compound TLDs 
for .ar, .co, .id, .il, .mx, .nz and .za
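
To illustrate why compound entries matter, here is a minimal Java sketch. It 
is not the Nutch implementation: the class DomainSuffixSketch, the method 
registeredDomain and the sample suffix sets are hypothetical, and the real 
entries live in domain-suffixes.xml. The idea is to match the longest known 
suffix and keep one label to its left.

```java
import java.util.Arrays;
import java.util.Set;

public class DomainSuffixSketch {

    /**
     * Returns the longest matching suffix plus the label directly to its left,
     * or the host unchanged when no suffix matches.
     * The suffix set is supplied by the caller (hypothetical sample data here).
     */
    public static String registeredDomain(String host, Set<String> suffixes) {
        String[] labels = host.toLowerCase().split("\\.");
        // Start at index 1 so at least one label remains for the name itself;
        // smaller indexes give longer candidates, so the first hit is the longest.
        for (int i = 1; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (suffixes.contains(candidate)) {
                return labels[i - 1] + "." + candidate;
            }
        }
        return host;
    }
}
```

With only "nz" in the set, www.example.co.nz is reduced to co.nz, which lumps 
unrelated sites together. Once "co.nz" is listed as well, the same host is 
reduced to example.co.nz.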




[jira] Updated: (NUTCH-786) Better list of suffix domains

2010-02-05 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-786:


Attachment: NUTCH-786.patch

Small improvement to the content of domain-suffixes.xml: added compound TLDs 
for .ar, .co, .id, .il, .mx, .nz and .za




[jira] Closed: (NUTCH-786) Better list of suffix domains

2010-02-05 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche closed NUTCH-786.
---

Resolution: Fixed

Committed revision 906907




[jira] Created: (NUTCH-787) Upgrade Lucene to 3.0.0.

2010-02-05 Thread Dawid Weiss (JIRA)
Upgrade Lucene to 3.0.0.


 Key: NUTCH-787
 URL: https://issues.apache.org/jira/browse/NUTCH-787
 Project: Nutch
  Issue Type: Task
  Components: build
Reporter: Dawid Weiss
Priority: Trivial







[jira] Commented: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2010-02-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830078#action_12830078
 ] 

Dawid Weiss commented on NUTCH-673:
---

O.K., I'll look into the complexity of upgrading to Lucene 3.0 first then. 
Filing a separate issue.




[jira] Commented: (NUTCH-787) Upgrade Lucene to 3.0.0.

2010-02-05 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830085#action_12830085
 ] 

Dawid Weiss commented on NUTCH-787:
---

Just did an initial check -- this should be doable, although it will result in a 
sizeable patch due to API changes and removed deprecations. I think it still 
makes sense to try and push the 3.0 version of Lucene into Nutch, so I will 
keep working on this and seek help in reviewing the patch (and incompatible 
changes) once it's ready.




[jira] Commented: (NUTCH-786) Better list of suffix domains

2010-02-05 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830109#action_12830109
 ] 

Ken Krugler commented on NUTCH-786:
---

Is this something that should also be applied to crawler-commons? I believe Ian 
had added support for finding Effective TLDs and that this support included 
an effective_tld_names.dat file.
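
For comparison with domain-suffixes.xml, a file in the effective_tld_names.dat 
layout is plain text: lines starting with // are comments, blank lines are 
ignored, and each rule ends at the first whitespace. Rules may carry a leading 
"*." (wildcard) or "!" (exception). The Java sketch below only extracts the 
rule strings under those assumptions; the class EffectiveTldLineSketch and the 
method parseRules are hypothetical and are not taken from crawler-commons.

```java
import java.util.ArrayList;
import java.util.List;

public class EffectiveTldLineSketch {

    /** Collects rule strings from the given lines, skipping comments and blank lines. */
    public static List<String> parseRules(List<String> lines) {
        List<String> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                continue; // comment or blank line
            }
            // Keep only the text up to the first whitespace; the "*." and "!"
            // markers are preserved so a caller can interpret them later.
            rules.add(trimmed.split("\\s+")[0]);
        }
        return rules;
    }
}
```

Interpreting the wildcard and exception markers is left out on purpose; the 
sketch shows only how the entries could be read before any matching is done.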

