[
https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1686:
Attachment: NUTCH-1686.patch
Optimize UpdateDb to load less field from Store
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1687:
Attachment: NUTCH-1687.patch
Pick queue in Round Robin
-
Hi there,
I was following the RunNutchInEclipse tutorial (1.7 Nutch / trunk example).
After I configured the Java run configuration as the tutorial showed and
clicked Run, it did not show the injector process as shown in the
tutorial. Instead, it showed this error:
Usage: Injector
See https://builds.apache.org/job/Nutch-trunk/2459/
--
[...truncated 197 lines...]
at
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:382)
... 32 more
Caused by: svn: E175002: OPTIONS request failed on
Nguyen Manh Tien created NUTCH-1688:
---
Summary: Port DeleteDuplicate based on crawlDB only to 2.x
Key: NUTCH-1688
URL: https://issues.apache.org/jira/browse/NUTCH-1688
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1688:
Component/s: indexer
Port DeleteDuplicate based on crawlDB only to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1688:
Attachment: NUTCH-1688.patch
Port DeleteDuplicate based on crawlDB only to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855369#comment-13855369
]
Otis Gospodnetic commented on NUTCH-1687:
-
[~tiennm] - the new class should have
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: NUTCH-1689.patch
Improve CrawlDb stats
-
Nguyen Manh Tien created NUTCH-1689:
---
Summary: Improve CrawlDb stats
Key: NUTCH-1689
URL: https://issues.apache.org/jira/browse/NUTCH-1689
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Fix Version/s: 2.3
Improve CrawlDb stats
-
Key:
Nguyen Manh Tien created NUTCH-1690:
---
Summary: IndexClean: mark url as unindexed after clean to not
delete again
Key: NUTCH-1690
URL: https://issues.apache.org/jira/browse/NUTCH-1690
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1690:
Fix Version/s: 2.3
IndexClean: mark url as unindexed after clean to not delete again
[
https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855385#comment-13855385
]
Nguyen Manh Tien commented on NUTCH-1686:
-
no backwards compatibility, because i
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855386#comment-13855386
]
Nguyen Manh Tien commented on NUTCH-1687:
-
I found one in double linked list
You are asking the right question at the right place.
The example shown in the tutorial was for the Nutch 2.x series. The 1.x
Injector needs an extra parameter as input: the location of the crawldb
to inject the URLs into. (On the first run, it creates a new crawldb at the
location given in the command.)
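To illustrate the point above, here is a minimal sketch (not the actual Nutch source) of the kind of argument check that prints a usage message when the crawldb location is missing. The class name InjectorArgsSketch, the full usage text, and the paths crawl/crawldb and urls are assumptions made for illustration only.

```java
// Illustrative sketch only; this is not the Nutch Injector source.
class InjectorArgsSketch {

    // Assumed usage text; the message quoted in this thread is truncated.
    static final String USAGE = "Usage: Injector <crawldb> <url_dir>";

    /** Returns the usage message if an argument is missing, otherwise null. */
    static String usageError(String[] args) {
        return args.length < 2 ? USAGE : null;
    }

    public static void main(String[] args) {
        String error = usageError(args);
        if (error != null) {
            // Running with no program arguments ends up here.
            System.err.println(error);
            return;
        }
        // args[0] is the crawldb location, args[1] the seed URL directory.
        System.out.println("crawldb: " + args[0] + ", seed URL dir: " + args[1]);
    }
}
```

In an Eclipse run configuration, this would correspond to filling in the Program arguments field with two values, for example crawl/crawldb urls (hypothetical paths), rather than leaving it empty.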
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: (was: NUTCH-1690.patch)
Improve CrawlDb stats
-
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: NUTCH-1690.patch
Thanks Tejas for reviewing
1)2) I think my change doesn't