[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567037#comment-14567037 ]

Sebastian Nagel commented on NUTCH-2015:

+1 to commit [~sujenshah]'s latest patch
- tests pass, should avoid the high memory usage (I didn't really test it ;))
- to be formatted using [eclipse-codeformat.xml|http://svn.apache.org/viewvc/nutch/branches/2.x/eclipse-codeformat.xml]

Make FetchNodeDb optional (off by default) if NutchServer is not used

Key: NUTCH-2015
URL: https://issues.apache.org/jira/browse/NUTCH-2015
Project: Nutch
Issue Type: Sub-task
Components: fetcher, REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Labels: memex
Fix For: 1.11

Currently, the FetchNodes are created even if the NutchServer is not used causing memory exceptions. This patch makes the fetcher report to the FetchNodeDb only if the crawl is invoked from the REST service (ie NutchServer)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568497#comment-14568497 ]
Chris A. Mattmann commented on NUTCH-2015:
Thanks [~wastl-nagel] committing now!
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568500#comment-14568500 ]
ASF GitHub Bot commented on NUTCH-2015:
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/25
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568517#comment-14568517 ]
Hudson commented on NUTCH-2015:
SUCCESS: Integrated in Nutch-trunk #3147 (See [https://builds.apache.org/job/Nutch-trunk/3147/])
- fix for NUTCH-2015 Make FetchNodeDb optional (off by default) if NutchServer is not used contributed by Sujen Shah this closes #25 (mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1683039)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherThread.java
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565606#comment-14565606 ]
Chris A. Mattmann commented on NUTCH-2015:
Hi [~asitang] checking in - where are we on this?
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565668#comment-14565668 ]
Sujen Shah commented on NUTCH-2015:
Hi [~wastl-nagel], I updated the code as you suggested. I have moved the parsing check and the server state check outside the loop, and new FetchNodes are now created only if both conditions are true.
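The change described in the comment above can be sketched roughly as follows. This is a minimal illustration and not the committed patch: `ServerStub`, `NodeStub`, `FetchLoopSketch`, and `fetchAll` are hypothetical stand-ins invented here, and only the idea (evaluate "parsing" and "server is running" once before the loop, then create nodes only when both hold) comes from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the server; NOT the real NutchServer class.
final class ServerStub {
    private final boolean running;
    ServerStub(boolean running) { this.running = running; }
    boolean isRunning() { return running; }
}

// Hypothetical stand-in for a per-URL report node; NOT the real FetchNode class.
final class NodeStub {
    final String url;
    NodeStub(String url) { this.url = url; }
}

final class FetchLoopSketch {
    // Returns the nodes that would be reported for the given URLs.
    static List<NodeStub> fetchAll(List<String> urls, boolean parsing, ServerStub server) {
        // Both conditions are evaluated once, before the loop starts.
        final boolean reportToNutchServer = parsing && server.isRunning();
        List<NodeStub> reported = new ArrayList<>();
        for (String url : urls) {
            // (fetching and parsing of the URL would happen here)
            if (reportToNutchServer) {
                // A node is allocated only when it will actually be reported.
                reported.add(new NodeStub(url));
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
        // Server not running: no nodes are created.
        System.out.println(fetchAll(urls, true, new ServerStub(false)).size()); // prints 0
        // Parsing fetcher with a running server: one node per URL.
        System.out.println(fetchAll(urls, true, new ServerStub(true)).size());  // prints 2
    }
}
```

With the flag computed up front, a crawl started outside the REST service never allocates report nodes, which is the memory saving the issue is after.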
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565626#comment-14565626 ]
Sujen Shah commented on NUTCH-2015:
Hi [~chrismattmann], I am testing the changes and will update the PR in some time. Thanks.
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560874#comment-14560874 ]
Chris A. Mattmann commented on NUTCH-2015:
Pinging again here, [~asitang] and [~sujenshah]: any updates?
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561434#comment-14561434 ]
Asitang Mishra commented on NUTCH-2015:
Hi [~chrismattmann], I think [~sujenshah] is already on this task and might even have a patch ready, as it's a small one. I am just waiting for him to update this (he arrives today). Let's hope he does this within a couple of days, or else I will do it. Meanwhile, I am working on the crawl API and crawlDB API.
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557467#comment-14557467 ]
Chris A. Mattmann commented on NUTCH-2015:
Ping Sujen, Asitang, did you guys get a chance to make updates here?
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551632#comment-14551632 ]
Chris A. Mattmann commented on NUTCH-2015:
Sujen can you also please update the wiki for Nutch to include more REST API docs? https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551447#comment-14551447 ]
Chris A. Mattmann commented on NUTCH-2015:
[~sujenshah] did you make the changes? Can we commit the pull request?
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547239#comment-14547239 ]
Sujen Shah commented on NUTCH-2015:
Yes true, you are right. Will make the necessary changes. Thanks!
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547213#comment-14547213 ]
Sebastian Nagel commented on NUTCH-2015:
Ok. Possibly this could be changed to make it clearer: a check of isRunning() inside a loop suggests that the state of the server can change between running and stopped. That's (currently) not the case, and a boolean field variable seems to be more verbose. In case the server could be stopped it should be:
{code}
if (NutchServer.getInstance().isRunning())
  this.fetchNode = new FetchNode();
else
  this.fetchNode = null;
{code}
Since currently fetchNode is only used with a parsing fetcher, this could also be checked, e.g.
{code}
if (parsing && NutchServer.getInstance().isRunning()) {
  reportToNutchServer = true;
}
{code}
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546646#comment-14546646 ]
ASF GitHub Bot commented on NUTCH-2015:
GitHub user sujen1412 opened a pull request:
    https://github.com/apache/nutch/pull/25
    fix for NUTCH-2015 contributed by Sujen Shah
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/sujen1412/nutch NUTCH-2015
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/nutch/pull/25.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #25
commit 5a6766587c8490e1c7b31eae54bce91b4411a3fa
Author: Sujen Shah sujen1...@gmail.com
Date: 2015-05-16T08:32:36Z
    Creation of FetchNodes is off by default if NutchServer is not used
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546648#comment-14546648 ]
Sujen Shah commented on NUTCH-2015:
PR link - https://github.com/apache/nutch/pull/25