[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567037#comment-14567037
 ] 

Sebastian Nagel commented on NUTCH-2015:


+1 to commit [~sujenshah]'s latest patch
- tests pass, and the patch should avoid the high memory usage (I didn't really test that ;))
- the code still needs to be formatted using 
[eclipse-codeformat.xml|http://svn.apache.org/viewvc/nutch/branches/2.x/eclipse-codeformat.xml]

 Make FetchNodeDb optional (off by default) if NutchServer is not used
 -

 Key: NUTCH-2015
 URL: https://issues.apache.org/jira/browse/NUTCH-2015
 Project: Nutch
  Issue Type: Sub-task
  Components: fetcher, REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11


 Currently, FetchNodes are created even if the NutchServer is not used, 
 causing memory exceptions. This patch makes the fetcher report to the 
 FetchNodeDb only if the crawl is invoked from the REST service (i.e. 
 NutchServer).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568497#comment-14568497
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

Thanks [~wastl-nagel] committing now!



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568500#comment-14568500
 ] 

ASF GitHub Bot commented on NUTCH-2015:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/25




[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568517#comment-14568517
 ] 

Hudson commented on NUTCH-2015:
---

SUCCESS: Integrated in Nutch-trunk #3147 (See 
[https://builds.apache.org/job/Nutch-trunk/3147/])
- fix for NUTCH-2015 Make FetchNodeDb optional (off by default) if NutchServer 
is not used contributed by Sujen Shah this closes #25 (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1683039)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherThread.java




[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-29 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565606#comment-14565606
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

Hi [~asitang] checking in - where are we on this?



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-29 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565668#comment-14565668
 ] 

Sujen Shah commented on NUTCH-2015:
---

Hi [~wastl-nagel], 
I updated the code as you suggested. I have moved the parsing and server state 
checks outside the loop, and new FetchNodes are created only if both conditions 
are true. 
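
The shape described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: `NodeSketch`, `FetchLoopSketch` and `fetchAll` are hypothetical names, and the two boolean parameters stand in for the parsing setting and the NutchServer state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a per-URL record; not Nutch's FetchNode.
class NodeSketch {
  final String url;

  NodeSketch(String url) {
    this.url = url;
  }
}

// Hypothetical sketch of a fetch loop; not Nutch's FetcherThread.
class FetchLoopSketch {
  // Returns the nodes that would be reported for the given URLs.
  static List<NodeSketch> fetchAll(List<String> urls, boolean parsing,
      boolean serverRunning) {
    // Both conditions are evaluated once, before the loop starts.
    final boolean reportToServer = parsing && serverRunning;
    List<NodeSketch> reported = new ArrayList<>();
    for (String url : urls) {
      // ... fetching and parsing of the URL would happen here ...
      if (reportToServer) {
        // A node is allocated only when it will actually be reported.
        reported.add(new NodeSketch(url));
      }
    }
    return reported;
  }

  public static void main(String[] args) {
    List<String> urls = Arrays.asList("http://a.example/", "http://b.example/");
    System.out.println("nodes with server: " + fetchAll(urls, true, true).size());
    System.out.println("nodes without server: " + fetchAll(urls, true, false).size());
  }
}
```

When either condition is false, no node objects are allocated at all, which is the memory saving the issue is after.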



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-29 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565626#comment-14565626
 ] 

Sujen Shah commented on NUTCH-2015:
---

Hi [~chrismattmann], I am testing the changes and will update the PR in some time. 
Thanks. 



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-27 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560874#comment-14560874
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

Pinging again, [~asitang] and [~sujenshah]: any updates here?



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-27 Thread Asitang Mishra (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561434#comment-14561434
 ] 

Asitang Mishra commented on NUTCH-2015:
---

Hi [~chrismattmann],

I think [~sujenshah] is already on this task and might even have a patch ready, 
as it's a small one. I am just waiting for him to update this (he arrives 
today). Let's hope he does this within a couple of days, or else I will do it. 
Meanwhile, I am working on the crawl API and crawlDB API. 



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557467#comment-14557467
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

Ping Sujen, Asitang, did you guys get a chance to make updates here?



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551632#comment-14551632
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

Sujen, can you also please update the Nutch wiki to include more REST API 
docs?

https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551447#comment-14551447
 ] 

Chris A. Mattmann commented on NUTCH-2015:
--

[~sujenshah] did you make the changes? Can we commit the pull request?



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-17 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547239#comment-14547239
 ] 

Sujen Shah commented on NUTCH-2015:
---

Yes true, you are right. Will make the necessary changes. 
Thanks!



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-17 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547213#comment-14547213
 ] 

Sebastian Nagel commented on NUTCH-2015:


Ok. Possibly this could be changed to make it clearer: a check of isRunning() 
inside a loop suggests that the state of the server can change between running 
and stopped. That's (currently) not the case, and a boolean field variable seems 
to be more verbose. If the server could be stopped, it should be:
{code}
if (NutchServer.getInstance().isRunning())
  this.fetchNode = new FetchNode();
else
  this.fetchNode = null;
{code}

Since fetchNode is currently only used with a parsing fetcher, this could 
also be checked, e.g.
{code}
if (parsing && NutchServer.getInstance().isRunning()) {
  reportToNutchServer = true;
}
{code}
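
The second variant can be sketched in isolation as follows. This is an illustration only, not the code from the patch: `FetcherThreadSketch` and the `serverIsRunning` supplier are hypothetical stand-ins (the supplier replaces the call to NutchServer.getInstance().isRunning() so that the example compiles without Nutch). Only the `reportToNutchServer` name comes from the snippet above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a fetcher thread; not the actual Nutch class.
class FetcherThreadSketch {
  // Decided once at construction; the fetch loop would only read this field.
  private final boolean reportToNutchServer;

  FetcherThreadSketch(boolean parsing, BooleanSupplier serverIsRunning) {
    // && short-circuits, so the server state is queried only for a
    // parsing fetcher.
    this.reportToNutchServer = parsing && serverIsRunning.getAsBoolean();
  }

  boolean reportsToServer() {
    return reportToNutchServer;
  }

  public static void main(String[] args) {
    FetcherThreadSketch t = new FetcherThreadSketch(true, () -> false);
    System.out.println("report to server: " + t.reportsToServer());
  }
}
```

Storing the result in a final field makes it explicit that the decision cannot change while the thread runs, which matches the assumption that the server state is fixed for the lifetime of a fetch.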



[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546646#comment-14546646
 ] 

ASF GitHub Bot commented on NUTCH-2015:
---

GitHub user sujen1412 opened a pull request:

https://github.com/apache/nutch/pull/25

fix for NUTCH-2015 contributed by Sujen Shah



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujen1412/nutch NUTCH-2015

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #25


commit 5a6766587c8490e1c7b31eae54bce91b4411a3fa
Author: Sujen Shah sujen1...@gmail.com
Date:   2015-05-16T08:32:36Z

Creation of FetchNodes is off by default if NutchServer is not used






[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-16 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546648#comment-14546648
 ] 

Sujen Shah commented on NUTCH-2015:
---

PR link - https://github.com/apache/nutch/pull/25
