[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922724#comment-13922724 ] Tejas Patil commented on NUTCH-1325: It would take me a few weeks before I can work on t

[jira] [Resolved] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-02-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1721. Resolution: Fixed Committed to trunk (rev 1566255) and 2.x (rev 1566257) > Upgrade to Crawler comm

[jira] [Commented] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887784#comment-13887784 ] Tejas Patil commented on NUTCH-1721: Attached patches, all test cases are passing. >

[jira] [Updated] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1721: --- Attachment: NUTCH-1721-2.x.patch NUTCH-1721-trunk.patch > Upgrade to Crawler commons

[jira] [Created] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1721: -- Summary: Upgrade to Crawler commons 0.3 Key: NUTCH-1721 URL: https://issues.apache.org/jira/browse/NUTCH-1721 Project: Nutch Issue Type: Improvement Affects

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887763#comment-13887763 ] Tejas Patil commented on NUTCH-1465: Re "filters and normalizers": +1. Re "fetch inte

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886677#comment-13886677 ] Tejas Patil commented on NUTCH-1465: Interesting comments [~wastl-nagel]. Re "filters

[jira] [Commented] (NUTCH-1718) update description of property http.robots.agent

2014-01-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885650#comment-13885650 ] Tejas Patil commented on NUTCH-1718: Hi [~someuser77], Yup. I am waiting for folks to

[jira] [Updated] (NUTCH-1718) update description of property http.robots.agent

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1718: --- Attachment: NUTCH-1718-trunk.v1.patch Thanks [~wastl-nagel] for bringing this up. I should have updat

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v5.patch Adding new patch 'v5' with below changes: 1. Added Apache licen

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883204#comment-13883204 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Thanks a lot for your comments

[jira] [Commented] (NUTCH-1084) ReadDB url throws exception

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882771#comment-13882771 ] Tejas Patil commented on NUTCH-1084: The issue gets reproduced on current trunk. Attac

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882770#comment-13882770 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I didn't know about NUTCH-1084 u

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v4.patch Attaching v4 patch with the suggestions #1 and #2 from [~lewism

[jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1692: --- Attachment: 20140126210858.tgz Attaching the test segment (20140126210858.tgz) > SegmentReader broke

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882348#comment-13882348 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I tried out the patch on a lat

[jira] [Resolved] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1715. Resolution: Fixed The change was verified over nutch-user mailing list. Committed to trunk (revisi

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Attachment: NUTCH-1715.2.x.patch NUTCH-1715.trunk.patch > RobotRulesParser adds addit

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it combine

[jira] [Resolved] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1716. Resolution: Duplicate Accidentally duplicated NUTCH-1715 > RobotRulesParser adds extra '*' to the

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: --- Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it combine

[jira] [Created] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1716: -- Summary: RobotRulesParser adds extra '*' to the robots name Key: NUTCH-1716 URL: https://issues.apache.org/jira/browse/NUTCH-1716 Project: Nutch Issue Type: Bug

[jira] [Created] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1715: -- Summary: RobotRulesParser adds additional '*' to the robots name Key: NUTCH-1715 URL: https://issues.apache.org/jira/browse/NUTCH-1715 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881143#comment-13881143 ] Tejas Patil commented on NUTCH-1676: Hi [~markus17], I tried out the patch with a couple

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880295#comment-13880295 ] Tejas Patil commented on NUTCH-1465: Hi [~lewismc], +1 for the first two suggestions.

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Fix Version/s: (was: 1.9) 1.8 > Support sitemaps in Nutch > --

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880288#comment-13880288 ] Tejas Patil commented on NUTCH-1712: The performance gains due to this patch won't be

[jira] [Resolved] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1164. Resolution: Fixed The patch is better now and all tests pass. It needed little modification: you c

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: --- Attachment: NUTCH-1712-trunk.v1.patch > Use MultipleInputs in Injector to make it a single mapreduce

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: --- Description: Currently Injector creates two mapreduce jobs: 1. sort job: get the urls from seeds file

[jira] [Created] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1712: -- Summary: Use MultipleInputs in Injector to make it a single mapreduce job Key: NUTCH-1712 URL: https://issues.apache.org/jira/browse/NUTCH-1712 Project: Nutch I

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v3.patch Now that HostDb (NUTCH-1325) is in trunk, updated the patch (v3

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878623#comment-13878623 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], Thanks for the correction. This

[jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1164: --- Attachment: TEST-org.apache.nutch.protocol.http.TestProtocolHttp.txt Hi [~Sertac Turkel], I tried out

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1325. Resolution: Fixed Fix Version/s: (was: 1.9) 1.8 Thanks [~markus17] fo

[jira] [Assigned] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1325: -- Assignee: Tejas Patil (was: Markus Jelsma) > HostDB for Nutch > > >

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v2.patch Attaching NUTCH-1465-trunk.v2.patch which has implementation of

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325-trunk-v4.patch > HostDB for Nutch > > > Key:

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: (was: NUTCH-1325-trunk-v4.patch) > HostDB for Nutch > > >

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325-trunk-v4.patch Attaching NUTCH-1325-trunk-v4.patch with following changes: - F

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875981#comment-13875981 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], I didn't know about NUTCH-1413. That

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875947#comment-13875947 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], So from 2nd depth onwards, you would

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875910#comment-13875910 ] Tejas Patil commented on NUTCH-1697: Hi [~markus17], Correct me if I am wrong: Hadoop

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875902#comment-13875902 ] Tejas Patil commented on NUTCH-1630: Hi [~icebergx5], How do you obtain the average re

[jira] [Commented] (NUTCH-1680) CrawldbReader to dump minRetry value

2014-01-18 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875687#comment-13875687 ] Tejas Patil commented on NUTCH-1680: +1 > CrawldbReader to dump minRetry value > ---

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-04 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862240#comment-13862240 ] Tejas Patil commented on NUTCH-1325: Could anyone please look at the patch and let us

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862237#comment-13862237 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Yes. I think that it should be

[jira] [Commented] (NUTCH-356) Plugin repository cache can lead to memory leak

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861217#comment-13861217 ] Tejas Patil commented on NUTCH-356: --- +1 for commit. > Plugin repository cache can lead t

[jira] [Commented] (NUTCH-1454) parsing chm failed

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860803#comment-13860803 ] Tejas Patil commented on NUTCH-1454: TIKA-1122 is fixed and I have verified that 'pars

[jira] [Commented] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860678#comment-13860678 ] Tejas Patil commented on NUTCH-1691: Hi [~markus17], It's a good solution. +1 from me.

[jira] [Closed] (NUTCH-1670) set same crawldb directory in mergedb parameter

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-1670. -- Resolution: Fixed Committed the patch by [~amuseme] to trunk (rev 1554883). > set same crawldb directo

[jira] [Updated] (NUTCH-1080) Type safe members , arguments for better readability

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1080: --- Fix Version/s: 1.8 > Type safe members , arguments for better readability >

[jira] [Commented] (NUTCH-1080) Type safe members , arguments for better readability

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860643#comment-13860643 ] Tejas Patil commented on NUTCH-1080: Committed to trunk (rev 1554881). Will port the s

[jira] [Assigned] (NUTCH-1080) Type safe members , arguments for better readability

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1080: -- Assignee: Tejas Patil > Type safe members , arguments for better readability > ---

[jira] [Updated] (NUTCH-1080) Type safe members , arguments for better readability

2014-01-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1080: --- Attachment: NUTCH-1080-tejasp-trunk-v2.patch Attaching a patch for trunk. Uploaded the same over revi

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325-trunk-v3.patch A final patch (NUTCH-1325-trunk-v3.patch) to complete this feat

[jira] [Commented] (NUTCH-1670) set same crawldb directory in mergedb parameter

2014-01-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859987#comment-13859987 ] Tejas Patil commented on NUTCH-1670: Hi [~amuseme.lu], The patch looks good to me. +1

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859358#comment-13859358 ] Tejas Patil commented on NUTCH-1687: Created a review request: https://reviews.apache.

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1687: --- Attachment: NUTCH-1687.tejasp.v1.patch I feel that there is no need for creating a separate class for

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859275#comment-13859275 ] Tejas Patil commented on NUTCH-1687: This is one good point by [~tiennm]. Although th

[jira] [Updated] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1687: --- Fix Version/s: 1.8 > Pick queue in Round Robin > - > > Key: N

[jira] [Commented] (NUTCH-1689) Improve CrawlDb stats

2013-12-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855416#comment-13855416 ] Tejas Patil commented on NUTCH-1689: Some concerns: 1. While you are removing fields f

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848723#comment-13848723 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Nice share. The only grudge I

[jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848723#comment-13848723 ] Tejas Patil edited comment on NUTCH-1465 at 12/16/13 12:09 AM: -

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848561#comment-13848561 ] Tejas Patil commented on NUTCH-1465: Revisited this Jira after a long time and gave a

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2013-12-14 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848517#comment-13848517 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], I stopped by this Jira (after a l

[jira] [Commented] (NUTCH-1577) Add target for creating eclipse project

2013-12-14 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848493#comment-13848493 ] Tejas Patil commented on NUTCH-1577: There were some checkin(s) in the past few months whic

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2013-08-11 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736459#comment-13736459 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], > think i've got a slightly ne

[jira] [Commented] (NUTCH-1599) Obtain consensus on new description of Nutch

2013-07-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699115#comment-13699115 ] Tejas Patil commented on NUTCH-1599: I agree with Julien: Nutch should be described as

[jira] [Commented] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699096#comment-13699096 ] Tejas Patil commented on NUTCH-1602: Hi Lufeng, +1 from me too. One minor suggestion:

[jira] [Commented] (NUTCH-1327) QueryStringNormalizer

2013-07-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696840#comment-13696840 ] Tejas Patil commented on NUTCH-1327: Hi Markus, 1. The patch when applied as is didn'

[jira] [Commented] (NUTCH-1126) JUnit test for urlfilter-prefix

2013-06-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692069#comment-13692069 ] Tejas Patil commented on NUTCH-1126: Thanks Talat and Cihad :) One small thing: @auth

[jira] [Commented] (NUTCH-1578) Upgrade to Hadoop 1.2.0

2013-06-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672868#comment-13672868 ] Tejas Patil commented on NUTCH-1578: +1. We should go for this. > Upg

[jira] [Commented] (NUTCH-1577) Add target for creating eclipse project

2013-06-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672650#comment-13672650 ] Tejas Patil commented on NUTCH-1577: Hi [~wastl-nagel], +1 for the suggestion. I have

[jira] [Resolved] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1577. Resolution: Fixed Updated the documentation page [RunNutchInEclipse|http://wiki.apache.org/nutch/R

[jira] [Commented] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671823#comment-13671823 ] Tejas Patil commented on NUTCH-1577: Committed to trunk at rev1488396. My next task i

[jira] [Updated] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1577: --- Attachment: NUTCH-1577.2.x.patch Patch for 2.x > Add target for creating eclipse pro

[jira] [Comment Edited] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671305#comment-13671305 ] Tejas Patil edited comment on NUTCH-1577 at 5/31/13 10:22 AM: --

[jira] [Updated] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1577: --- Attachment: NUTCH-1577.trunk.patch Here is a patch for trunk. How to use it: * on a SVN checkout of t

[jira] [Comment Edited] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671305#comment-13671305 ] Tejas Patil edited comment on NUTCH-1577 at 5/31/13 10:19 AM: --

[jira] [Created] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1577: -- Summary: Add target for creating eclipse project Key: NUTCH-1577 URL: https://issues.apache.org/jira/browse/NUTCH-1577 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665267#comment-13665267 ] Tejas Patil commented on NUTCH-1563: You pushed it at the right place [~amuseme] :) If

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664408#comment-13664408 ] Tejas Patil commented on NUTCH-1563: I think this is relevant to only 2.x and [~amusem

[jira] [Resolved] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1249. Resolution: Fixed Fix Version/s: 2.2 Assignee: Tejas Patil (was: Lewis John McGibbn

[jira] [Resolved] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1275. Resolution: Fixed Fix Version/s: 2.2 Got resolved with NUTCH-1249 > Fix [un

[jira] [Commented] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663792#comment-13663792 ] Tejas Patil commented on NUTCH-1275: Hi [~lewismc], I am working on a patch for 2.x.

[jira] [Assigned] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1275: -- Assignee: Tejas Patil > Fix [unchecked] javac warnings > -- > >

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662571#comment-13662571 ] Tejas Patil commented on NUTCH-1569: If using some other backend would be an overkill,

[jira] [Resolved] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1513. Resolution: Fixed Committed to trunk (rev 1484638) and 2.x (rev 1484637) > Support

[jira] [Commented] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662554#comment-13662554 ] Tejas Patil commented on NUTCH-1275: Committed to trunk @ revision 1484634. For patch

[jira] [Commented] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662553#comment-13662553 ] Tejas Patil commented on NUTCH-1249: Committed to trunk @ revision 1484634

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662540#comment-13662540 ] Tejas Patil commented on NUTCH-1569: Hey Lewis, I took a fresh checkout of 2.x and app

[jira] [Resolved] (NUTCH-1053) Parsing of RSS feeds fails

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1053. Resolution: Fixed Committed to trunk (rev 1484628) and 2.x (rev 1484627). NOTE : Currently feeds p

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662497#comment-13662497 ] Tejas Patil commented on NUTCH-1569: I have been running 2.x with this patch for the past few h

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662230#comment-13662230 ] Tejas Patil commented on NUTCH-1545: +1 for commit. > capture batchId

[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661663#comment-13661663 ] Tejas Patil commented on NUTCH-1573: Oh... just saw your comment that you have committ

[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661662#comment-13661662 ] Tejas Patil commented on NUTCH-1573: [~lewismc] great !! Only if there were no homewo

[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-18 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661450#comment-13661450 ] Tejas Patil commented on NUTCH-1573: Hi Lewis, Quick question: Besides modifying the

[jira] [Comment Edited] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661264#comment-13661264 ] Tejas Patil edited comment on NUTCH-1566 at 5/18/13 4:05 AM: -

[jira] [Commented] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661264#comment-13661264 ] Tejas Patil commented on NUTCH-1566: Hi Seb, I tried the patch on a Windows machine
