[
https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875902#comment-13875902
]
Tejas Patil commented on NUTCH-1630:
Hi [~icebergx5],
How do you obtain the average re
[
https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875910#comment-13875910
]
Tejas Patil commented on NUTCH-1697:
Hi [~markus17],
Correct me if I am wrong: Hadoop
[
https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875947#comment-13875947
]
Tejas Patil commented on NUTCH-1630:
Hi [~talat],
So from the 2nd depth onwards, you would
[
https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875981#comment-13875981
]
Tejas Patil commented on NUTCH-1630:
Hi [~talat],
I didn't know about NUTCH-1413. That
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1325:
---
Attachment: NUTCH-1325-trunk-v4.patch
Attaching NUTCH-1325-trunk-v4.patch with following changes:
- F
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1325:
---
Attachment: (was: NUTCH-1325-trunk-v4.patch)
> HostDB for Nutch
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1325:
---
Attachment: NUTCH-1325-trunk-v4.patch
> HostDB for Nutch
>
>
> Key:
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Attachment: NUTCH-1465-trunk.v2.patch
Attaching NUTCH-1465-trunk.v2.patch which has an implementation of
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil reassigned NUTCH-1325:
--
Assignee: Tejas Patil (was: Markus Jelsma)
> HostDB for Nutch
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1325.
Resolution: Fixed
Fix Version/s: 1.8 (was: 1.9)
Thanks [~markus17] fo
[
https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1164:
---
Attachment: TEST-org.apache.nutch.protocol.http.TestProtocolHttp.txt
Hi [~Sertac Turkel],
I tried out
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878623#comment-13878623
]
Tejas Patil commented on NUTCH-1325:
Hi [~markus17],
Thanks for the correction. This
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Attachment: NUTCH-1465-trunk.v3.patch
Now that HostDb (NUTCH-1325) is in trunk, I updated the patch (v3
Tejas Patil created NUTCH-1712:
--
Summary: Use MultipleInputs in Injector to make it a single mapreduce job
Key: NUTCH-1712
URL: https://issues.apache.org/jira/browse/NUTCH-1712
Project: Nutch
I
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1712:
---
Description:
Currently Injector creates two mapreduce jobs:
1. sort job: get the urls from seeds file
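The single-job idea can be illustrated without Hadoop. The sketch below is a hypothetical plain-Java simulation, not the NUTCH-1712 patch: two "mappers" (one reading seed URLs, one reading existing crawl db entries) feed one keyed stream, and a single "reducer" call per URL merges them, which is the shape Hadoop's MultipleInputs makes possible within one job. The class and method names, and the merge policy that an existing crawl db entry wins over a freshly injected seed, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free simulation of injecting seeds and merging with a crawl db in one pass. */
public class SinglePassInject {

    /** Which input a record came from; stands in for a per-input Mapper class. */
    enum Source { SEED, CRAWLDB }

    /** A value tagged with its origin, as each "mapper" would emit it. */
    static final class Tagged {
        final Source source;
        final String status;
        Tagged(Source source, String status) { this.source = source; this.status = status; }
    }

    /** "Map" both inputs into one stream grouped by URL, then "reduce" once per URL. */
    static Map<String, String> inject(List<String> seeds, Map<String, String> crawlDb) {
        Map<String, List<Tagged>> grouped = new TreeMap<>(); // sorted keys, like the shuffle phase
        // "Mapper" for the seed input: every seed URL becomes an unfetched entry.
        for (String url : seeds) {
            grouped.computeIfAbsent(url, k -> new ArrayList<>()).add(new Tagged(Source.SEED, "db_unfetched"));
        }
        // "Mapper" for the crawl db input: existing entries pass through unchanged.
        for (Map.Entry<String, String> e : crawlDb.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(new Tagged(Source.CRAWLDB, e.getValue()));
        }
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> e : grouped.entrySet()) {
            merged.put(e.getKey(), reduce(e.getValue()));
        }
        return merged;
    }

    /** "Reducer": assumed policy keeps the crawl db entry if present, else the seed entry. */
    static String reduce(List<Tagged> values) {
        String seedStatus = null;
        for (Tagged t : values) {
            if (t.source == Source.CRAWLDB) {
                return t.status;
            }
            seedStatus = t.status;
        }
        return seedStatus;
    }

    public static void main(String[] args) {
        Map<String, String> crawlDb = new TreeMap<>();
        crawlDb.put("http://b.example/", "db_fetched");
        crawlDb.put("http://c.example/", "db_fetched");
        List<String> seeds = List.of("http://a.example/", "http://b.example/");
        // Prints {http://a.example/=db_unfetched, http://b.example/=db_fetched, http://c.example/=db_fetched}
        System.out.println(inject(seeds, crawlDb));
    }
}
```

The point of the simulation is that both inputs reach the same reduce call for a given URL, so no intermediate "sorted seeds" output has to be written and read back by a second job.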
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1712:
---
Attachment: NUTCH-1712-trunk.v1.patch
> Use MultipleInputs in Injector to make it a single mapreduce
[
https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1164.
Resolution: Fixed
The patch is better now and all tests pass. It needed a little modification: you
c
[
https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880288#comment-13880288
]
Tejas Patil commented on NUTCH-1712:
The performance gains due to this patch won't be
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Fix Version/s: 1.8 (was: 1.9)
> Support sitemaps in Nutch
> --
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880295#comment-13880295
]
Tejas Patil commented on NUTCH-1465:
Hi [~lewismc],
+1 for the first two suggestions.
[
https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881143#comment-13881143
]
Tejas Patil commented on NUTCH-1676:
Hi [~markus17],
I tried out the patch with a couple
Tejas Patil created NUTCH-1715:
--
Summary: RobotRulesParser adds additional '*' to the robots name
Key: NUTCH-1715
URL: https://issues.apache.org/jira/browse/NUTCH-1715
Project: Nutch
Issue Type:
Tejas Patil created NUTCH-1716:
--
Summary: RobotRulesParser adds extra '*' to the robots name
Key: NUTCH-1716
URL: https://issues.apache.org/jira/browse/NUTCH-1716
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1715:
---
Description:
In RobotRulesParser, when Nutch creates an agent string from multiple agents, it
combine
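The description above is cut off, so the exact cause of the extra '*' is not recoverable here. As a hypothetical sketch only (not the committed NUTCH-1715 fix), one way to build a combined agent string is to put the primary agent name first, append the additional names, and skip blanks, case-insensitive duplicates and the wildcard '*'. The two inputs are assumed to correspond to Nutch's http.agent.name and http.robots.agents settings; the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for combining robot agent names without adding a wildcard. */
public class AgentNames {

    /** Primary agent first, then comma-separated extras; blanks, duplicates and "*" are skipped. */
    static String combine(String primary, String extraAgents) {
        Set<String> seen = new HashSet<>(); // lower-cased names already appended
        StringBuilder sb = new StringBuilder();
        addAgent(primary, seen, sb);
        if (extraAgents != null) {
            for (String agent : extraAgents.split(",")) {
                addAgent(agent, seen, sb);
            }
        }
        return sb.toString();
    }

    private static void addAgent(String name, Set<String> seen, StringBuilder sb) {
        if (name == null) {
            return;
        }
        String trimmed = name.trim();
        // Never append the wildcard: it is not an agent name of its own.
        if (trimmed.isEmpty() || trimmed.equals("*")) {
            return;
        }
        // Set.add returns false for a name we already have (compared case-insensitively).
        if (!seen.add(trimmed.toLowerCase(Locale.ROOT))) {
            return;
        }
        if (sb.length() > 0) {
            sb.append(',');
        }
        sb.append(trimmed);
    }

    public static void main(String[] args) {
        // Prints mybot,otherbot
        System.out.println(combine("mybot", "mybot, otherbot, *"));
    }
}
```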
[
https://issues.apache.org/jira/browse/NUTCH-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1716.
Resolution: Duplicate
Accidentally duplicated NUTCH-1715
> RobotRulesParser adds extra '*' to the
[
https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1715:
---
Description:
In RobotRulesParser, when Nutch creates an agent string from multiple agents, it
combine
[
https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1715:
---
Attachment: NUTCH-1715.2.x.patch
NUTCH-1715.trunk.patch
> RobotRulesParser adds addit
[
https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1715.
Resolution: Fixed
The change was verified over nutch-user mailing list. Committed to trunk
(revisi
[
https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882348#comment-13882348
]
Tejas Patil commented on NUTCH-1692:
Hi [~markus17],
I tried out the patch on a lat
[
https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1692:
---
Attachment: 20140126210858.tgz
Attaching the test segment (20140126210858.tgz)
> SegmentReader broke
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Attachment: NUTCH-1465-trunk.v4.patch
Attaching v4 patch with the suggestions #1 and #2 from [~lewism
[
https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882771#comment-13882771
]
Tejas Patil commented on NUTCH-1084:
The issue gets reproduced on current trunk. Attac
[
https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882770#comment-13882770
]
Tejas Patil commented on NUTCH-1692:
Hi [~markus17],
I didn't know about NUTCH-1084 u
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883204#comment-13883204
]
Tejas Patil commented on NUTCH-1465:
Hi [~wastl-nagel],
Thanks a lot for your comments
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1465:
---
Attachment: NUTCH-1465-trunk.v5.patch
Adding new patch 'v5' with below changes:
1. Added Apache licen
[
https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1718:
---
Attachment: NUTCH-1718-trunk.v1.patch
Thanks [~wastl-nagel] for bringing this up. I should have updat
[
https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885650#comment-13885650
]
Tejas Patil commented on NUTCH-1718:
Hi [~someuser77], Yup. I am waiting for folks to
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886677#comment-13886677
]
Tejas Patil commented on NUTCH-1465:
Interesting comments [~wastl-nagel].
Re "filters
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887763#comment-13887763
]
Tejas Patil commented on NUTCH-1465:
Re "filters and normalizers": +1.
Re "fetch inte
Tejas Patil created NUTCH-1721:
--
Summary: Upgrade to Crawler commons 0.3
Key: NUTCH-1721
URL: https://issues.apache.org/jira/browse/NUTCH-1721
Project: Nutch
Issue Type: Improvement
Affects
[
https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil updated NUTCH-1721:
---
Attachment: NUTCH-1721-2.x.patch
NUTCH-1721-trunk.patch
> Upgrade to Crawler commons
[
https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887784#comment-13887784
]
Tejas Patil commented on NUTCH-1721:
Attached patches; all test cases are passing.
>
[
https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tejas Patil resolved NUTCH-1721.
Resolution: Fixed
Committed to trunk (rev 1566255) and 2.x (rev 1566257)
> Upgrade to Crawler comm
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922724#comment-13922724
]
Tejas Patil commented on NUTCH-1325:
It would take me a few weeks before I can work on t