[jira] [Updated] (NUTCH-1686) Optimize UpdateDb to load less field from Store

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1686: Attachment: NUTCH-1686.patch Optimize UpdateDb to load less field from Store

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1687: Attachment: NUTCH-1687.patch Pick queue in Round Robin

Step Through Nutch 1.7 Inside Eclipse Missing Argument

2013-12-22 Thread Bin Wang
Hi there, I was following the RunNutchInEclipse tutorial (1.7 Nutch / trunk example). After I set up the Java run configuration as the tutorial showed and clicked Run, it did not show the injector process as shown in the tutorial. Instead, it showed an error: Usage: Injector

Build failed in Jenkins: Nutch-trunk #2459

2013-12-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2459/ -- [...truncated 197 lines...] at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382) ... 32 more Caused by: svn: E175002: OPTIONS request failed on

[jira] [Created] (NUTCH-1688) Port DeleteDuplicate based on crawlDB only to 2.x

2013-12-22 Thread Nguyen Manh Tien (JIRA)
Nguyen Manh Tien created NUTCH-1688: --- Summary: Port DeleteDuplicate based on crawlDB only to 2.x Key: NUTCH-1688 URL: https://issues.apache.org/jira/browse/NUTCH-1688 Project: Nutch Issue

[jira] [Updated] (NUTCH-1688) Port DeleteDuplicate based on crawlDB only to 2.x

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1688: Component/s: indexer Port DeleteDuplicate based on crawlDB only to 2.x

[jira] [Updated] (NUTCH-1688) Port DeleteDuplicate based on crawlDB only to 2.x

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1688: Attachment: NUTCH-1688.patch Port DeleteDuplicate based on crawlDB only to 2.x

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-22 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855369#comment-13855369 ] Otis Gospodnetic commented on NUTCH-1687: [~tiennm] - the new class should have

[jira] [Updated] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1689: Attachment: NUTCH-1689.patch Improve CrawlDb stats

[jira] [Created] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Nguyen Manh Tien (JIRA)
Nguyen Manh Tien created NUTCH-1689: --- Summary: Improve CrawlDb stats Key: NUTCH-1689 URL: https://issues.apache.org/jira/browse/NUTCH-1689 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1689: Fix Version/s: 2.3 Improve CrawlDb stats - Key:

[jira] [Created] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again

2013-12-22 Thread Nguyen Manh Tien (JIRA)
Nguyen Manh Tien created NUTCH-1690: --- Summary: IndexClean: mark url as unindexed after clean to not delete again Key: NUTCH-1690 URL: https://issues.apache.org/jira/browse/NUTCH-1690 Project: Nutch

[jira] [Updated] (NUTCH-1690) IndexClean: mark url as unindexed after clean to not delete again

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1690: Fix Version/s: 2.3 IndexClean: mark url as unindexed after clean to not delete again

[jira] [Commented] (NUTCH-1686) Optimize UpdateDb to load less field from Store

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855385#comment-13855385 ] Nguyen Manh Tien commented on NUTCH-1686: no backwards compatibility, because i

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855386#comment-13855386 ] Nguyen Manh Tien commented on NUTCH-1687: I found one in double linked list

Re: Step Through Nutch 1.7 Inside Eclipse Missing Argument

2013-12-22 Thread Tejas Patil
You are asking the right question at the right place. The example shown in the tutorial was for the Nutch 2.x series. The 1.x Injector needs an extra parameter as input: the location of the crawldb to inject the URLs into. (On the first run, it creates a new crawldb at the location given in the command.)
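
To make the difference concrete, here is a minimal sketch of how the program arguments for a 1.x run configuration might be assembled. The class and method names (InjectorArgsSketch, buildArgs) and the example paths are hypothetical and not part of Nutch. The argument order (crawldb location first, then the seed URL directory) is an assumption to check against the usage message your Injector prints.

```java
public class InjectorArgsSketch {

    /**
     * Hypothetical helper: builds the program arguments for a 1.x-style
     * inject run. Assumption: crawldb location first, seed directory second.
     */
    static String[] buildArgs(String crawlDbDir, String seedDir) {
        if (crawlDbDir == null || crawlDbDir.isEmpty()) {
            // This is the extra parameter the 1.x Injector expects.
            throw new IllegalArgumentException("crawldb location is required");
        }
        if (seedDir == null || seedDir.isEmpty()) {
            throw new IllegalArgumentException("seed URL directory is required");
        }
        return new String[] { crawlDbDir, seedDir };
    }

    public static void main(String[] args) {
        // Placeholder paths; substitute the ones used in your workspace.
        String[] programArgs = buildArgs("crawl/crawldb", "urls");
        // Paste the printed line into the "Program arguments" box in Eclipse.
        System.out.println(String.join(" ", programArgs));
    }
}
```

With only the seed directory supplied, as in the 2.x example, the crawldb argument is missing, which would explain seeing the usage message instead of the inject output.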

[jira] [Updated] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1689: Attachment: (was: NUTCH-1690.patch) Improve CrawlDb stats

[jira] [Updated] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Nguyen Manh Tien (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Manh Tien updated NUTCH-1689: Attachment: NUTCH-1690.patch Thanks Tejas for reviewing 1)2) I think my change don't