document deduplication (exact duplicates) failed using MD5Signature
---
Key: NUTCH-835
URL: https://issues.apache.org/jira/browse/NUTCH-835
Project: Nutch
Issue Type: Bug
Af
HttpClient null pointer exception
-
Key: NUTCH-862
URL: https://issues.apache.org/jira/browse/NUTCH-862
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.0.0
Environ
[
https://issues.apache.org/jira/browse/NUTCH-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-862:
--
Attachment: NUTCH-862.patch
patch
> HttpClient null pointer exception
> ---
[
https://issues.apache.org/jira/browse/NUTCH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930588#action_12930588
]
Sebastian Nagel commented on NUTCH-933:
---
The modifiedTime stored in a CrawlDatum recor
max. redirects not handled correctly: fetcher stops at max-1 redirects
--
Key: NUTCH-962
URL: https://issues.apache.org/jira/browse/NUTCH-962
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-962:
--
Attachment: Fetcher_redir.patch
patch for 1.3 to respect count of redirects literally:
http.red
Sebastian Nagel created NUTCH-1344:
--
Summary: BasicURLNormalizer to normalize https same as http
Key: NUTCH-1344
URL: https://issues.apache.org/jira/browse/NUTCH-1344
Project: Nutch
Issue T
[
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1344:
---
Attachment: NUTCH-1344.patch
> BasicURLNormalizer to normalize https same as http
>
[
https://issues.apache.org/jira/browse/NUTCH-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258827#comment-13258827
]
Sebastian Nagel commented on NUTCH-1339:
BasicURLNormalizer does not remove the an
[
https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263124#comment-13263124
]
Sebastian Nagel commented on NUTCH-1293:
The content type should be added to metad
[
https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273954#comment-13273954
]
Sebastian Nagel commented on NUTCH-1323:
After a small test crawl on http://si.dra
Sebastian Nagel created NUTCH-1383:
--
Summary: IndexingFiltersChecker to show error message instead of
null pointer exception
Key: NUTCH-1383
URL: https://issues.apache.org/jira/browse/NUTCH-1383
Proj
[
https://issues.apache.org/jira/browse/NUTCH-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1383:
---
Attachment: NUTCH-1383.patch
patch for both null pointer exceptions
> Indexi
Sebastian Nagel created NUTCH-1389:
--
Summary: parsechecker and indexchecker to report truncated content
Key: NUTCH-1389
URL: https://issues.apache.org/jira/browse/NUTCH-1389
Project: Nutch
I
Sebastian Nagel created NUTCH-1415:
--
Summary: release packages to contain top level folder
apache-nutch-x.x
Key: NUTCH-1415
URL: https://issues.apache.org/jira/browse/NUTCH-1415
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1415:
---
Attachment: NUTCH-1415.patch
Fix ant targets tar-src, tar-bin, zip-src, zip-bin
Also set appr
[
https://issues.apache.org/jira/browse/NUTCH-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1415:
---
Attachment: NUTCH-1415-2.patch
Hi Lewis, you are completely right:
the tarfileset / zipfilese
Sebastian Nagel created NUTCH-1419:
--
Summary: parsechecker and indexchecker to report protocol status
Key: NUTCH-1419
URL: https://issues.apache.org/jira/browse/NUTCH-1419
Project: Nutch
Iss
[
https://issues.apache.org/jira/browse/NUTCH-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1419:
---
Attachment: NUTCH-1419-1.patch
Simple patch: in case of a protocol status other than 200 (suc
Sebastian Nagel created NUTCH-1421:
--
Summary: RegexURLNormalizer to only skip rules with invalid
patterns
Key: NUTCH-1421
URL: https://issues.apache.org/jira/browse/NUTCH-1421
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1421:
---
Attachment: NUTCH-1421-1.patch
> RegexURLNormalizer to only skip rules with invalid patte
Sebastian Nagel created NUTCH-1422:
--
Summary: reset signature for redirects
Key: NUTCH-1422
URL: https://issues.apache.org/jira/browse/NUTCH-1422
Project: Nutch
Issue Type: Bug
Com
[
https://issues.apache.org/jira/browse/NUTCH-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1422:
---
Attachment: NUTCH-1422_redir_notmodified_log.txt
> reset signature for redirects
> --
[
https://issues.apache.org/jira/browse/NUTCH-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410905#comment-13410905
]
Sebastian Nagel commented on NUTCH-1328:
Duplicate of NUTCH-706
>
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-706:
--
Attachment: NUTCH-706.patch
- fix the pattern by adding an anchor prohibiting inner-word matches
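
To illustrate what an anchor "prohibiting inner-word matches" can mean for a session-id removal rule, here is a minimal Python sketch. The two patterns, the helper `strip_session_id`, and the example URLs are hypothetical and heavily simplified; they are not the rule shipped in Nutch's regex normalizer configuration, nor the content of the attached patch.

```python
import re

# Hypothetical, simplified patterns (NOT the actual Nutch normalizer rule).
# Unanchored: matches "sid=..." anywhere, even in the middle of another name.
UNANCHORED = re.compile(r"sid=[^&#]*", re.IGNORECASE)
# Anchored: a negative lookbehind rejects matches preceded by a letter or digit,
# so "sid=" only matches when it starts a parameter name.
ANCHORED = re.compile(r"(?<![A-Za-z0-9])sid=[^&#]*", re.IGNORECASE)


def strip_session_id(pattern, url):
    """Remove every match of the given pattern from the URL."""
    return pattern.sub("", url)


inner_word_url = "http://example.com/a?newsId=5&page=2"
session_url = "http://example.com/a?sid=abc&page=2"

# The unanchored pattern damages the unrelated parameter "newsId".
damaged = strip_session_id(UNANCHORED, inner_word_url)
# The anchored pattern leaves it alone but still removes a real "sid".
kept = strip_session_id(ANCHORED, inner_word_url)
cleaned = strip_session_id(ANCHORED, session_url)
```

In this sketch the anchored variant leaves `newsId=5` untouched while still stripping a standalone `sid=abc`. A production rule would also need to tidy up the separator left behind.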
Sebastian Nagel created NUTCH-1436:
--
Summary: bin/nutch absent in zip package
Key: NUTCH-1436
URL: https://issues.apache.org/jira/browse/NUTCH-1436
Project: Nutch
Issue Type: Bug
C
[
https://issues.apache.org/jira/browse/NUTCH-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1436:
---
Attachment: NUTCH-1436.patch
Patch for branch-1.5.1 (if a new bin package is desired). For tr
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-706:
--
Attachment: NUTCH-706-2.patch
Second trial for patch. The first one does not remove:
{code}
?_se
Sebastian Nagel created NUTCH-1454:
--
Summary: parsing chm failed
Key: NUTCH-1454
URL: https://issues.apache.org/jira/browse/NUTCH-1454
Project: Nutch
Issue Type: Bug
Components: pa
Sebastian Nagel created NUTCH-1455:
--
Summary: RobotRulesParser to match multi-word user-agent names
Key: NUTCH-1455
URL: https://issues.apache.org/jira/browse/NUTCH-1455
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454282#comment-13454282
]
Sebastian Nagel commented on NUTCH-1467:
Since nutch.metadata.Metadata, NutchField
[
https://issues.apache.org/jira/browse/NUTCH-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-1415:
--
Assignee: Sebastian Nagel
> release packages to contain top level folder apache-nut
[
https://issues.apache.org/jira/browse/NUTCH-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457753#comment-13457753
]
Sebastian Nagel commented on NUTCH-1415:
This has been fixed only for 1.5.1 and 2.
[
https://issues.apache.org/jira/browse/NUTCH-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1415.
Resolution: Fixed
Fix Version/s: 2.1
1.6
committed to trunk (revi
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467990#comment-13467990
]
Sebastian Nagel commented on NUTCH-706:
---
Are there objections to applying and committing the
Sebastian Nagel created NUTCH-1476:
--
Summary: SegmentReader getStats should set parsed = -1 if no
parsing took place
Key: NUTCH-1476
URL: https://issues.apache.org/jira/browse/NUTCH-1476
Project: Nut
[
https://issues.apache.org/jira/browse/NUTCH-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1476:
---
Attachment: NUTCH-1476.patch
> SegmentReader getStats should set parsed = -1 if no parsin
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel reassigned NUTCH-1252:
--
Assignee: Sebastian Nagel
> SegmentReader -get shows wrong data
> -
[
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471915#comment-13471915
]
Sebastian Nagel commented on NUTCH-1344:
Is there any reason why https should be t
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-706:
--
Fix Version/s: 2.2
Summary: Url regex normalizer: default pattern for session id remova
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-706.
---
Resolution: Fixed
committed to trunk (revision 1396796) and 2.x (revision 1396795)
[
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1344.
Resolution: Fixed
Fix Version/s: 2.2
1.6
committed to trunk (revi
[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473599#comment-13473599
]
Sebastian Nagel commented on NUTCH-706:
---
The first commit was erroneously made with the wrong patch.
C
[
https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474460#comment-13474460
]
Sebastian Nagel commented on NUTCH-1475:
Indeed, a modified time in the future is
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1252.
Resolution: Fixed
committed to trunk (revision 1397281)
> SegmentReader -g
[
https://issues.apache.org/jira/browse/NUTCH-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1476.
Resolution: Fixed
committed to trunk (revision 1397298)
> SegmentReader ge
[
https://issues.apache.org/jira/browse/NUTCH-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1383.
Resolution: Fixed
committed to trunk (revision 1397308)
> IndexingFiltersC
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482644#comment-13482644
]
Sebastian Nagel commented on NUTCH-1467:
Hi Kiran,
thanks for the patch. After a l
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1467:
---
Attachment: NUTCH-1467-TEST-1.patch
> nutch 1.5.1 not able to parse mutliValued metatags
[
https://issues.apache.org/jira/browse/NUTCH-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1421.
Resolution: Fixed
Fix Version/s: 2.2
1.6
committed to trunk (rev.
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1245:
---
Attachment: NUTCH-1245-578-TEST-1.patch
JUnit test to catch this problem and NUTCH-578: a lar
[
https://issues.apache.org/jira/browse/NUTCH-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486144#comment-13486144
]
Sebastian Nagel commented on NUTCH-1482:
+1
> Rename HTMLParseFil
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1245:
---
Attachment: NUTCH-1245-1.patch
FetchSchedule.setPageGoneSchedule is called exclusively for a
[
https://issues.apache.org/jira/browse/NUTCH-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486290#comment-13486290
]
Sebastian Nagel commented on NUTCH-1482:
Markus, you are right: I remember the API
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1245:
---
Attachment: NUTCH-1245-2.patch
NUTCH-1245-578-TEST-2.patch
Improved patches
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486484#comment-13486484
]
Sebastian Nagel commented on NUTCH-578:
---
NUTCH-1245 provides a test to catch this pro
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-578:
--
Attachment: NUTCH-578_v5.patch
> URL fetched with 403 is generated over and over again
> ---
[
https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487316#comment-13487316
]
Sebastian Nagel commented on NUTCH-1370:
+1
Would be nice to see also the number o
[
https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487318#comment-13487318
]
Sebastian Nagel commented on NUTCH-578:
---
Resetting the retry counter in setPageGoneSc
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488146#comment-13488146
]
Sebastian Nagel commented on NUTCH-1483:
Confirmed.
The problem is caused by the r
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1483:
---
Affects Version/s: 1.6
> Can't crawl filesystem with protocol-file plugin
> -
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488200#comment-13488200
]
Sebastian Nagel commented on NUTCH-1483:
I tried with 1.x/trunk.
For 2.x URLs with
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1483:
---
Attachment: NUTCH-1483.patch
StringUtils.split(String, char) does not preserve empty parts: h
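
The difference between a split that drops empty parts and one that preserves them can be sketched in Python. This only emulates the two behaviours; it is not the Java code from the patch, and the sample string (a field list with an empty leading field, as might arise for a URL without a host) is made up for illustration.

```python
def split_dropping_empty(text, sep):
    """Emulates a split that treats adjacent separators as one:
    empty parts are discarded."""
    return [part for part in text.split(sep) if part]


def split_preserving_empty(text, sep):
    """Emulates a split that keeps every part, including empty ones."""
    return text.split(sep)


# Hypothetical sample: the first field is empty.
sample = ":file"

dropped = split_dropping_empty(sample, ":")
preserved = split_preserving_empty(sample, ":")
```

With the dropping variant the empty first field disappears and the remaining parts shift position, so code that expects a fixed number of fields reads the wrong element or fails.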
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488254#comment-13488254
]
Sebastian Nagel commented on NUTCH-1483:
Rogério, can you apply the patch, re-comp
Sebastian Nagel created NUTCH-1484:
--
Summary: TableUtil unreverseURL fails on file:// URLs
Key: NUTCH-1484
URL: https://issues.apache.org/jira/browse/NUTCH-1484
Project: Nutch
Issue Type: Bu
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488558#comment-13488558
]
Sebastian Nagel commented on NUTCH-1483:
Thanks!
Issue with un-reversing URLs pull
[
https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488558#comment-13488558
]
Sebastian Nagel edited comment on NUTCH-1483 at 11/1/12 8:55 AM:
---
Sebastian Nagel created NUTCH-1485:
--
Summary: TableUtil reverseURL to keep userinfo part
Key: NUTCH-1485
URL: https://issues.apache.org/jira/browse/NUTCH-1485
Project: Nutch
Issue Type: Impr
[
https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488585#comment-13488585
]
Sebastian Nagel commented on NUTCH-1461:
Cf. NUTCH-1484: same error with file:// U
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488935#comment-13488935
]
Sebastian Nagel commented on NUTCH-1245:
They are not duplicates but the effects a
Sebastian Nagel created NUTCH-1488:
--
Summary: bin/nutch to run junit from any directory
Key: NUTCH-1488
URL: https://issues.apache.org/jira/browse/NUTCH-1488
Project: Nutch
Issue Type: Impro
[
https://issues.apache.org/jira/browse/NUTCH-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1488:
---
Attachment: NUTCH-1488.patch
> bin/nutch to run junit from any directory
> --
[
https://issues.apache.org/jira/browse/NUTCH-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494950#comment-13494950
]
Sebastian Nagel commented on NUTCH-1496:
+1
> ParserJob logs skip
[
https://issues.apache.org/jira/browse/NUTCH-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1484:
---
Attachment: NUTCH-1484.patch
Revised patch: replaced
StringUtils.splitByWholeSeparatorPreserv
[
https://issues.apache.org/jira/browse/NUTCH-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494952#comment-13494952
]
Sebastian Nagel edited comment on NUTCH-1484 at 11/11/12 7:56 PM:
--
[
https://issues.apache.org/jira/browse/NUTCH-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1484.
Resolution: Fixed
Committed to 2.x (rev. 1408465)
> TableUtil unreverseURL
[
https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1370:
---
Attachment: NUTCH-1370-1.x.patch
Ferdy is right: custom counters are more transparent.
Patch
[
https://issues.apache.org/jira/browse/NUTCH-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1370:
---
Attachment: NUTCH-1370-2.x-v3.patch
Hi Lewis, yes, the 1.x patch is not easily transferred fo
[
https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504136#comment-13504136
]
Sebastian Nagel commented on NUTCH-1499:
Short and precise patch. However, is ther
Sebastian Nagel created NUTCH-1500:
--
Summary: bin/crawl fails on step solrindex with wrong path to
segment
Key: NUTCH-1500
URL: https://issues.apache.org/jira/browse/NUTCH-1500
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1500:
---
Attachment: NUTCH-1500.patch
> bin/crawl fails on step solrindex with wrong path to segme
[
https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507944#comment-13507944
]
Sebastian Nagel commented on NUTCH-1499:
Thanks! That's a plausible reason: (let's
[
https://issues.apache.org/jira/browse/NUTCH-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1038:
---
Patch Info: Patch Available
> Port IndexingFiltersChecker to 2.0
> --
[
https://issues.apache.org/jira/browse/NUTCH-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1038:
---
Attachment: NUTCH-1038.patch
> Port IndexingFiltersChecker to 2.0
> -
Sebastian Nagel created NUTCH-1501:
--
Summary: Harmonize behavior of parsechecker and indexchecker
Key: NUTCH-1501
URL: https://issues.apache.org/jira/browse/NUTCH-1501
Project: Nutch
Issue T
Sebastian Nagel created NUTCH-1502:
--
Summary: Test for CrawlDatum state transitions
Key: NUTCH-1502
URL: https://issues.apache.org/jira/browse/NUTCH-1502
Project: Nutch
Issue Type: Improveme
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525439#comment-13525439
]
Sebastian Nagel commented on NUTCH-1245:
@kiran: yes, 2.x is affected since fetch
[
https://issues.apache.org/jira/browse/NUTCH-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529497#comment-13529497
]
Sebastian Nagel commented on NUTCH-1503:
Hi Lewis,
both time limit properties are
[
https://issues.apache.org/jira/browse/NUTCH-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1038:
---
Attachment: NUTCH-1038v2.patch
Hi Lewis, it's a problem of the patch: the fetch time of a Web
[
https://issues.apache.org/jira/browse/NUTCH-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545480#comment-13545480
]
Sebastian Nagel commented on NUTCH-1514:
+1
But do we need a reference to the remo
[
https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552028#comment-13552028
]
Sebastian Nagel commented on NUTCH-1499:
So, a vote for "won't fix". Comments?
[
https://issues.apache.org/jira/browse/NUTCH-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-813.
---
Resolution: Duplicate
The described problem is identical to that of NUTCH-578. The provided pa
[
https://issues.apache.org/jira/browse/NUTCH-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552082#comment-13552082
]
Sebastian Nagel commented on NUTCH-1345:
JAVA_HOME (or NUTCH_JAVA_HOME) is current
[
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554353#comment-13554353
]
Sebastian Nagel commented on NUTCH-1087:
Hi Tristan,
thanks for the patch! The seg
[
https://issues.apache.org/jira/browse/NUTCH-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-1500.
Resolution: Fixed
committed to trunk (rev. 1433658)
> bin/crawl fails on s
[
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554381#comment-13554381
]
Sebastian Nagel commented on NUTCH-1087:
yes, of course, but currently there is al
[
https://issues.apache.org/jira/browse/NUTCH-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556093#comment-13556093
]
Sebastian Nagel commented on NUTCH-1520:
Hi Markus,
have a look at NUTCH-1113. An
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564274#comment-13564274
]
Sebastian Nagel commented on NUTCH-1465:
Hi Tejas,
thanks and a few comments on th
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564768#comment-13564768
]
Sebastian Nagel commented on NUTCH-1465:
Yes, SitemapInjector is a map-reduce jo
[
https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564827#comment-13564827
]
Sebastian Nagel commented on NUTCH-1047:
As some test for the interface started to