[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922724#comment-13922724 ] Tejas Patil commented on NUTCH-1325: It would take me a few weeks before I can work on

[jira] [Resolved] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-02-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1721. Resolution: Fixed Committed to trunk (rev 1566255) and 2.x (rev 1566257) Upgrade to Crawler

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887763#comment-13887763 ] Tejas Patil commented on NUTCH-1465: Re filters and normalizers: +1. Re fetch

[jira] [Created] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1721: Summary: Upgrade to Crawler commons 0.3 Key: NUTCH-1721 URL: https://issues.apache.org/jira/browse/NUTCH-1721 Project: Nutch Issue Type: Improvement Affects

[jira] [Updated] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1721: Attachment: NUTCH-1721-2.x.patch NUTCH-1721-trunk.patch Upgrade to Crawler commons

[jira] [Commented] (NUTCH-1721) Upgrade to Crawler commons 0.3

2014-01-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887784#comment-13887784 ] Tejas Patil commented on NUTCH-1721: Attached patches; all test cases are passing.

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886677#comment-13886677 ] Tejas Patil commented on NUTCH-1465: Interesting comments [~wastl-nagel]. Re filters

[jira] [Commented] (NUTCH-1718) update description of property http.robots.agent

2014-01-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885650#comment-13885650 ] Tejas Patil commented on NUTCH-1718: Hi [~someuser77], Yup. I am waiting for folks to

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: Attachment: NUTCH-1465-trunk.v5.patch Adding new patch 'v5' with the following changes: 1. Added Apache

[jira] [Updated] (NUTCH-1718) update description of property http.robots.agent

2014-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1718: Attachment: NUTCH-1718-trunk.v1.patch Thanks [~wastl-nagel] for bringing this up. I should have

[jira] [Commented] (NUTCH-1084) ReadDB url throws exception

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882771#comment-13882771 ] Tejas Patil commented on NUTCH-1084: The issue is reproducible on the current trunk.

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882770#comment-13882770 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I didn't know about NUTCH-1084

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883204#comment-13883204 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Thanks a lot for your

[jira] [Commented] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882348#comment-13882348 ] Tejas Patil commented on NUTCH-1692: Hi [~markus17], I tried out the patch on a

[jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1692: Attachment: 20140126210858.tgz Attaching the test segment (20140126210858.tgz) SegmentReader

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-26 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: Attachment: NUTCH-1465-trunk.v4.patch Attaching v4 patch with the suggestions #1 and #2 from

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881143#comment-13881143 ] Tejas Patil commented on NUTCH-1676: Hi [~markus17], I tried out the patch with a couple

[jira] [Created] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1715: Summary: RobotRulesParser adds additional '*' to the robots name Key: NUTCH-1715 URL: https://issues.apache.org/jira/browse/NUTCH-1715 Project: Nutch Issue

[jira] [Created] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1716: Summary: RobotRulesParser adds extra '*' to the robots name Key: NUTCH-1716 URL: https://issues.apache.org/jira/browse/NUTCH-1716 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it

[jira] [Resolved] (NUTCH-1716) RobotRulesParser adds extra '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1716. Resolution: Duplicate Accidentally duplicated NUTCH-1715 RobotRulesParser adds extra '*' to the

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: Description: In RobotRulesParser, when Nutch creates an agent string from multiple agents, it

[jira] [Updated] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1715: Attachment: NUTCH-1715.2.x.patch NUTCH-1715.trunk.patch RobotRulesParser adds

[jira] [Resolved] (NUTCH-1715) RobotRulesParser adds additional '*' to the robots name

2014-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1715. Resolution: Fixed The change was verified on the nutch-user mailing list. Committed to trunk

[jira] [Created] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1712: Summary: Use MultipleInputs in Injector to make it a single mapreduce job Key: NUTCH-1712 URL: https://issues.apache.org/jira/browse/NUTCH-1712 Project: Nutch

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: Description: Currently Injector creates two mapreduce jobs: 1. sort job: get the urls from seeds

[jira] [Updated] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1712: Attachment: NUTCH-1712-trunk.v1.patch Use MultipleInputs in Injector to make it a single mapreduce

[jira] [Resolved] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1164. Resolution: Fixed The patch is better now and all tests pass. It needed little modification: you

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880288#comment-13880288 ] Tejas Patil commented on NUTCH-1712: The performance gains due to this patch won't be

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: Fix Version/s: (was: 1.9) 1.8 Support sitemaps in Nutch

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880295#comment-13880295 ] Tejas Patil commented on NUTCH-1465: Hi [~lewismc], +1 for the first two suggestions.

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1325. Resolution: Fixed Fix Version/s: (was: 1.9) 1.8 Thanks [~markus17]

[jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1164: Attachment: TEST-org.apache.nutch.protocol.http.TestProtocolHttp.txt Hi [~Sertac Turkel], I tried

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878623#comment-13878623 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], Thanks for the correction. This

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: Attachment: NUTCH-1465-trunk.v3.patch Now that HostDb (NUTCH-1325) is in trunk, updated the patch

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: Attachment: NUTCH-1325-trunk-v4.patch Attaching NUTCH-1325-trunk-v4.patch with the following changes: -

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: Attachment: (was: NUTCH-1325-trunk-v4.patch) HostDB for Nutch

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: Attachment: NUTCH-1325-trunk-v4.patch HostDB for Nutch Key:

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: Attachment: NUTCH-1465-trunk.v2.patch Attaching NUTCH-1465-trunk.v2.patch which has implementation

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875902#comment-13875902 ] Tejas Patil commented on NUTCH-1630: Hi [~icebergx5], How do you obtain the average

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875910#comment-13875910 ] Tejas Patil commented on NUTCH-1697: Hi [~markus17], Correct me if I am wrong: Hadoop

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875947#comment-13875947 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], So from 2nd depth onwards, you would

[jira] [Commented] (NUTCH-1630) How to achieve finishing fetch approximately at the same time for each queue (a.k.a adaptive queue size)

2014-01-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875981#comment-13875981 ] Tejas Patil commented on NUTCH-1630: Hi [~talat], I didn't know about NUTCH-1413. That

[jira] [Commented] (NUTCH-1680) CrawldbReader to dump minRetry value

2014-01-18 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875687#comment-13875687 ] Tejas Patil commented on NUTCH-1680: +1 CrawldbReader to dump minRetry value

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-04 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862240#comment-13862240 ] Tejas Patil commented on NUTCH-1325: Could anyone please look at the patch and let us

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2014-01-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862237#comment-13862237 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Yes. I think that it should be

[jira] [Commented] (NUTCH-1080) Type safe members , arguments for better readability

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860643#comment-13860643 ] Tejas Patil commented on NUTCH-1080: Committed to trunk (rev 1554881). Will port the

[jira] [Commented] (NUTCH-1691) DomainBlacklist url filter does not allow -D filter file override

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860678#comment-13860678 ] Tejas Patil commented on NUTCH-1691: Hi [~markus17], It's a good solution. +1 from me.

[jira] [Commented] (NUTCH-1454) parsing chm failed

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860803#comment-13860803 ] Tejas Patil commented on NUTCH-1454: TIKA-1122 is fixed and I have verified that

[jira] [Commented] (NUTCH-356) Plugin repository cache can lead to memory leak

2014-01-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861217#comment-13861217 ] Tejas Patil commented on NUTCH-356: +1 for commit. Plugin repository cache can lead to

[jira] [Commented] (NUTCH-1670) set same crawldb directory in mergedb parameter

2014-01-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859987#comment-13859987 ] Tejas Patil commented on NUTCH-1670: Hi [~amuseme.lu], The patch looks good to me. +1

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2014-01-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: Attachment: NUTCH-1325-trunk-v3.patch A final patch (NUTCH-1325-trunk-v3.patch) to complete this

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859275#comment-13859275 ] Tejas Patil commented on NUTCH-1687: This is one good point by [~tiennm]. Although

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2013-12-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859358#comment-13859358 ] Tejas Patil commented on NUTCH-1687: Created a review request:

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848561#comment-13848561 ] Tejas Patil commented on NUTCH-1465: Revisited this Jira after a long time and gave a

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848723#comment-13848723 ] Tejas Patil commented on NUTCH-1465: Hi [~wastl-nagel], Nice share. The only grudge I

[jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch

2013-12-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848723#comment-13848723 ] Tejas Patil edited comment on NUTCH-1465 at 12/16/13 12:09 AM:

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2013-12-14 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848517#comment-13848517 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], I stopped by this Jira (after a

[jira] [Commented] (NUTCH-1602) improve the readability of metadata in readdb dump normal

2013-07-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699096#comment-13699096 ] Tejas Patil commented on NUTCH-1602: Hi Lufeng, +1 from me too. One minor suggestion:

[jira] [Commented] (NUTCH-1599) Obtain consensus on new description of Nutch

2013-07-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699115#comment-13699115 ] Tejas Patil commented on NUTCH-1599: I agree with Julien: Nutch should be described as

[jira] [Commented] (NUTCH-1327) QueryStringNormalizer

2013-07-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696840#comment-13696840 ] Tejas Patil commented on NUTCH-1327: Hi Markus, 1. The patch when applied as is

[jira] [Commented] (NUTCH-1126) JUnit test for urlfilter-prefix

2013-06-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692069#comment-13692069 ] Tejas Patil commented on NUTCH-1126: Thanks Talat and Cihad :) One small thing:

[jira] [Commented] (NUTCH-1578) Upgrade to Hadoop 1.2.0

2013-06-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672868#comment-13672868 ] Tejas Patil commented on NUTCH-1578: +1. We should go for this.

[jira] [Commented] (NUTCH-1577) Add target for creating eclipse project

2013-06-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672650#comment-13672650 ] Tejas Patil commented on NUTCH-1577: Hi [~wastl-nagel], +1 for the suggestion. I have

[jira] [Created] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1577: Summary: Add target for creating eclipse project Key: NUTCH-1577 URL: https://issues.apache.org/jira/browse/NUTCH-1577 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1577: Attachment: NUTCH-1577.trunk.patch Here is a patch for trunk. How to use it: * on an SVN checkout of

[jira] [Comment Edited] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671305#comment-13671305 ] Tejas Patil edited comment on NUTCH-1577 at 5/31/13 10:19 AM:

[jira] [Comment Edited] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671305#comment-13671305 ] Tejas Patil edited comment on NUTCH-1577 at 5/31/13 10:22 AM:

[jira] [Updated] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1577: Attachment: NUTCH-1577.2.x.patch Patch for 2.x Add target for creating eclipse

[jira] [Commented] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671823#comment-13671823 ] Tejas Patil commented on NUTCH-1577: Committed to trunk at rev 1488396. My next task

[jira] [Resolved] (NUTCH-1577) Add target for creating eclipse project

2013-05-31 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1577. Resolution: Fixed Updated the documentation page

[jira] [Resolved] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1249. Resolution: Fixed Fix Version/s: 2.2 Assignee: Tejas Patil (was: Lewis John

[jira] [Resolved] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1275. Resolution: Fixed Fix Version/s: 2.2 Got resolved with NUTCH-1249 Fix

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664408#comment-13664408 ] Tejas Patil commented on NUTCH-1563: I think this is relevant only to 2.x and

[jira] [Assigned] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1275: Assignee: Tejas Patil Fix [unchecked] javac warnings

[jira] [Commented] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663792#comment-13663792 ] Tejas Patil commented on NUTCH-1275: Hi [~lewismc], I am working on a patch for 2.x.

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662230#comment-13662230 ] Tejas Patil commented on NUTCH-1545: +1 for commit. capture batchId

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662497#comment-13662497 ] Tejas Patil commented on NUTCH-1569: I have been running 2.x with this patch for the past few

[jira] [Resolved] (NUTCH-1053) Parsing of RSS feeds fails

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1053. Resolution: Fixed Committed to trunk (rev 1484628) and 2.x (rev 1484627). NOTE : Currently feeds

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662540#comment-13662540 ] Tejas Patil commented on NUTCH-1569: Hey Lewis, I took a fresh checkout of 2.x and

[jira] [Commented] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662554#comment-13662554 ] Tejas Patil commented on NUTCH-1275: Committed to trunk @ revision 1484634. For patch

[jira] [Commented] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662553#comment-13662553 ] Tejas Patil commented on NUTCH-1249: Committed to trunk @ revision 1484634

[jira] [Resolved] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1513. Resolution: Fixed Committed to trunk (rev 1484638) and 2.x (rev 1484637) Support

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662571#comment-13662571 ] Tejas Patil commented on NUTCH-1569: If using some other backend would be overkill,

[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661662#comment-13661662 ] Tejas Patil commented on NUTCH-1573: [~lewismc] great!! Only if there were no

[jira] [Commented] (NUTCH-1573) Upgrade to most recent JUnit 4.x to improve test flexibility

2013-05-18 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661450#comment-13661450 ] Tejas Patil commented on NUTCH-1573: Hi Lewis, Quick question: Besides modifying the

[jira] [Commented] (NUTCH-1469) Upgrade commons-net dependency

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661078#comment-13661078 ] Tejas Patil commented on NUTCH-1469: Hi Lewis, I checked that merely updating the

[jira] [Updated] (NUTCH-1469) Upgrade commons-net dependency

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1469: Labels: ftp ftpclient (was: ) Upgrade commons-net dependency

[jira] [Commented] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661264#comment-13661264 ] Tejas Patil commented on NUTCH-1566: Hi Seb, I tried the patch on a Windows machine

[jira] [Comment Edited] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661264#comment-13661264 ] Tejas Patil edited comment on NUTCH-1566 at 5/18/13 4:05 AM:

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655807#comment-13655807 ] Tejas Patil commented on NUTCH-1545: I don't fully understand the significance of

[jira] [Updated] (NUTCH-1053) Parsing of RSS feeds fails

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1053: Attachment: NUTCH-1053.trunk.patch A tiny change in the ivy file for the feeds plugin fixes the problem.

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: Attachment: NUTCH-1325.trunk.v2.path Hi [~markus17], The initial patch is good. This feature would

[jira] [Resolved] (NUTCH-1418) error parsing robots rules- can't decode path: /wiki/Wikipedia%3Mediation_Committee/

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1418. Resolution: Fixed Fix Version/s: 2.2 After the robots handling has been delegated to

[jira] [Commented] (NUTCH-1243) Junit jar removed from lib

2013-05-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654755#comment-13654755 ] Tejas Patil commented on NUTCH-1243: Hi [~jnioche], looks like you have fixed the

[jira] [Comment Edited] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-05-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652805#comment-13652805 ] Tejas Patil edited comment on NUTCH-1031 at 5/9/13 7:52 AM: I

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-05-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652805#comment-13652805 ] Tejas Patil commented on NUTCH-1031: I had forgotten to add crawler-commons dependency in

[jira] [Updated] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1513: Attachment: NUTCH-1513.2.x.v2.patch NUTCH-1513.trunk.v2.patch Attached the patches

[jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1249: Attachment: NUTCH-1249.trunk.patch Here is a mega patch for trunk which addresses all warnings (there

[jira] [Closed] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implment

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-427. Resolution: Won't Fix Patch uses JCIFS which is licensed under LGPL. So it cannot be included in Nutch
