[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875902#comment-13875902 ] Tejas Patil commented on NUTCH-1630: Hi [~icebergx5], How do you obtain the average re

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875910#comment-13875910 ] Tejas Patil commented on NUTCH-1697: Hi [~markus17], Correct me if I am wrong: Hadoop

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875947#comment-13875947 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], So from 2nd depth onwards, you would

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875981#comment-13875981 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], I didn't know about NUTCH-1413. That

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325-trunk-v4.patch Attaching NUTCH-1325-trunk-v4.patch with the following changes: - F

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: (was: NUTCH-1325-trunk-v4.patch) > HostDB for Nutch > > >

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325-trunk-v4.patch > HostDB for Nutch > > > Key:

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v2.patch Attaching NUTCH-1465-trunk.v2.patch which has implementation of

[jira] [Assigned] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1325: -- Assignee: Tejas Patil (was: Markus Jelsma) > HostDB for Nutch > > >

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1325. Resolution: Fixed Fix Version/s: (was: 1.9) 1.8 Thanks [~markus17] fo

[jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1164: --- Attachment: TEST-org.apache.nutch.protocol.http.TestProtocolHttp.txt Hi [~Sertac Turkel], I tried out

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878623#comment-13878623 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], Thanks for the correction. This

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v3.patch Now that HostDb (NUTCH-1325) is in trunk, updated the patch (v3

[jira] [Created] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1712: -- Summary: Use MultipleInputs in Injector to make it a single mapreduce job Key: NUTCH-1712 URL: https://issues.apache.org/jira/browse/NUTCH-1712 Project: Nutch I

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: --- Description: Currently Injector creates two mapreduce jobs: 1. sort job: get the urls from seeds file

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: --- Attachment: NUTCH-1712-trunk.v1.patch > Use MultipleInputs in Injector to make it a single mapreduce

[jira] [Resolved] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1164. Resolution: Fixed The patch is better now and all tests pass. It needed little modification: you c

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880288#comment-13880288 ] Tejas Patil commented on NUTCH-1712: The performance gains due to this patch won't be

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Fix Version/s: (was: 1.9) 1.8 > Support sitemaps in Nutch > --

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880295#comment-13880295 ] Tejas Patil commented on NUTCH-1465: Hi [~lewismc], +1 for the first two suggestions.

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881143#comment-13881143 ] Tejas Patil commented on NUTCH-1676: Hi [~markus17], I tried out the patch with a couple

[jira] [Created] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1715: -- Summary: RobotRulesParser adds additional '*' to the robots name Key: NUTCH-1715 URL: https://issues.apache.org/jira/browse/NUTCH-1715 Project: Nutch Issue Type:

[jira] [Created] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1716: -- Summary: RobotRulesParser adds extra '*' to the robots name Key: NUTCH-1716 URL: https://issues.apache.org/jira/browse/NUTCH-1716 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it combine

[jira] [Resolved] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1716. Resolution: Duplicate Accidentally duplicated NUTCH-1715 > RobotRulesParser adds extra '*' to the

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it combine

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Attachment: NUTCH-1715.2.x.patch NUTCH-1715.trunk.patch > RobotRulesParser adds addit

[jira] [Resolved] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1715. Resolution: Fixed The change was verified over nutch-user mailing list. Committed to trunk (revisi

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882348#comment-13882348 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I tried out the patch on a lat

[jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1692: --- Attachment: 20140126210858.tgz Attaching the test segment (20140126210858.tgz) > SegmentReader broke

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v4.patch Attaching v4 patch with the suggestions #1 and #2 from [~lewism

[jira] [Commented] (NUTCH-1084) ReadDB url throws exception

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882771#comment-13882771 ] Tejas Patil commented on NUTCH-1084: The issue gets reproduced on current trunk. Attac

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882770#comment-13882770 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I didn't know about NUTCH-1084 u

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883204#comment-13883204 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Thanks a lot for your comments

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v5.patch Adding new patch 'v5' with the changes below: 1. Added Apache licen

[jira] [Updated] (NUTCH-1718) update description of property http.robots.agent

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1718: --- Attachment: NUTCH-1718-trunk.v1.patch Thanks [~wastl-nagel] for bringing this up. I should have updat

[jira] [Commented] (NUTCH-1718) update description of property http.robots.agent

2014-01-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885650#comment-13885650 ] Tejas Patil commented on NUTCH-1718: Hi [~someuser77], Yup. I am waiting for folks to

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886677#comment-13886677 ] Tejas Patil commented on NUTCH-1465: Interesting comments [~wastl-nagel]. Re "filters

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887763#comment-13887763 ] Tejas Patil commented on NUTCH-1465: Re "filters and normalizers": +1. Re "fetch inte

[jira] [Created] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1721: -- Summary: Upgrade to Crawler commons 0.3 Key: NUTCH-1721 URL: https://issues.apache.org/jira/browse/NUTCH-1721 Project: Nutch Issue Type: Improvement Affects

[jira] [Updated] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1721: --- Attachment: NUTCH-1721-2.x.patch NUTCH-1721-trunk.patch > Upgrade to Crawler commons

[jira] [Commented] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887784#comment-13887784 ] Tejas Patil commented on NUTCH-1721: Attached patches, all test cases are passing. >

[jira] [Resolved] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-02-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1721. Resolution: Fixed Committed to trunk (rev 1566255) and 2.x (rev 1566257) > Upgrade to Crawler comm

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922724#comment-13922724 ] Tejas Patil commented on NUTCH-1325: It would take me a few weeks before I can work on t
