Jenkins build is back to normal : Nutch-trunk #3239

2015-08-02 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3239/



Re: GSOC2015- Sitemap crawler roadmap problems

2015-08-02 Thread Cihad Guzel
Hi

I am making progress on my work. My code is now integrated into the Nutch
life cycle: sitemap files can be injected and parsed. As you know, a sitemap
file has tags such as lastmod, priority and changefreq. First, I put the tag
values into the metadata. Then I update the last-modified and fetch-interval
fields of the web page according to the tags. But I have not used the
priority tag yet. I want to calculate a new score using the priority for the
list of URLs from the sitemap. While the URLs from the sitemap have a
priority value, other web page URLs do not, so the scores would be
inconsistent. How do you think this should be implemented?

I have attached the latest code as a patch to this email.
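
One possible way to handle the missing priority values is sketched below. It is a minimal sketch, not the attached patch: the class and method names (`SitemapHintsSketch`, `priorityOrDefault`, `adjustedScore`, `fetchIntervalSeconds`) are hypothetical, and the interval values chosen for each changefreq are assumptions. It relies on the sitemaps.org rule that a URL without a priority tag has priority 0.5, so pages that never appeared in a sitemap keep their score unchanged.

```java
import java.util.Locale;

final class SitemapHintsSketch {
  /** Default from the sitemaps.org protocol for URLs that omit the priority tag. */
  static final float DEFAULT_PRIORITY = 0.5f;

  /** Parses a priority value; falls back to the default when absent or invalid, clamps to [0, 1]. */
  static float priorityOrDefault(String raw) {
    if (raw == null) {
      return DEFAULT_PRIORITY;
    }
    try {
      float p = Float.parseFloat(raw.trim());
      if (Float.isNaN(p)) {
        return DEFAULT_PRIORITY;
      }
      return Math.max(0.0f, Math.min(1.0f, p));
    } catch (NumberFormatException e) {
      return DEFAULT_PRIORITY;
    }
  }

  /** Scales a base score so that the default priority leaves it unchanged (assumed policy). */
  static float adjustedScore(float baseScore, String rawPriority) {
    return baseScore * (priorityOrDefault(rawPriority) / DEFAULT_PRIORITY);
  }

  /** Maps changefreq to a fetch interval in seconds; unknown, "always" and "never" use the fallback. */
  static int fetchIntervalSeconds(String changefreq, int fallback) {
    if (changefreq == null) {
      return fallback;
    }
    switch (changefreq.trim().toLowerCase(Locale.ROOT)) {
      case "hourly":  return 3600;
      case "daily":   return 86400;
      case "weekly":  return 7 * 86400;
      case "monthly": return 30 * 86400;   // assumption: 30-day month
      case "yearly":  return 365 * 86400;  // assumption: 365-day year
      default:        return fallback;
    }
  }
}
```

With this policy a URL with priority 1.0 gets twice the base score and a URL with no sitemap entry is treated exactly like a sitemap URL at the default priority, which avoids having two kinds of pages with incomparable scores.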


2015-07-11 12:10 GMT+03:00 Cihad Guzel cguz...@gmail.com:

 Hi Lewis.

 Thanks for your suggestions. I will be thinking about this.

 2015-07-10 3:47 GMT+03:00 Lewis John Mcgibbney lewis.mcgibb...@gmail.com
 :

 Hi Cihad,
 I'll take a look tonight.
 My understanding is that this would be implemented as part of core and
 not as a plugin. Within the plugin we can, at times, have access to less
 verbose data structures. This is of course not always the case, but
 generally speaking we see more issues, depending on which interfaces we
 extend, with appropriate access to the correct data structures. We then
 have the issue of dependency management.
 I'll have a look through the various links you have sent and then write
 back here in due course.
 Apologies about the delay.
 Thanks

 On Mon, Jul 6, 2015 at 12:20 AM, Cihad Guzel cguz...@gmail.com wrote:

 Hi,

 I have found a patch for my metadata problem [1]. But the problem isn't
 solved for 2.x [2]. I guess I need to solve it.

 [1] https://issues.apache.org/jira/browse/NUTCH-1622
 [2] https://issues.apache.org/jira/browse/NUTCH-1816

 2015-07-04 15:56 GMT+03:00 Cihad Guzel cguz...@gmail.com:

 Hi Lewis,

 Talat and I talked about the architecture for sitemap support. We think
 the problem can be solved within the Nutch life cycle. We don't want to
 build a separate life cycle for sitemap crawling.

 So I have the following problems:

 If the sitemap file is too large, it cannot be fetched and parsed; it
 times out. I worked around the timeout temporarily: for parsing, by
 raising the timeout value in nutch-site.xml, and for fetching, by working
 with small files. This is not a good solution.

 Moreover, as you know, sitemap files have some special tags such as loc,
 lastmod, changefreq and priority. They are parsed by my parse plugin. I
 want to record them in the crawldb, but the Parse object doesn't support
 metadata or similar fields. It has only an outlink array, which isn't
 enough for recording metadata.

 I want to record each URL in the sitemap file separately, with its metadata.
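
The idea of keeping each sitemap URL together with its own tags can be sketched as below. This is a minimal illustration using only the JDK's DOM parser, not the parse plugin discussed in this thread; the class name `SitemapEntriesSketch` and the choice of a map per URL are assumptions, and how such a map would be attached to a Nutch record is left open.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class SitemapEntriesSketch {
  /** Returns one map per url element, keyed by child tag name (loc, lastmod, changefreq, priority). */
  static List<Map<String, String>> parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Refuse DOCTYPE declarations so untrusted sitemaps cannot pull in external entities.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      Document doc = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

      List<Map<String, String>> entries = new ArrayList<>();
      NodeList urls = doc.getElementsByTagName("url");
      for (int i = 0; i < urls.getLength(); i++) {
        Map<String, String> entry = new LinkedHashMap<>();
        NodeList children = urls.item(i).getChildNodes();
        for (int j = 0; j < children.getLength(); j++) {
          Node child = children.item(j);
          if (child.getNodeType() == Node.ELEMENT_NODE) {
            entry.put(child.getNodeName(), child.getTextContent().trim());
          }
        }
        entries.add(entry);
      }
      return entries;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse sitemap", e);
    }
  }
}
```

Each returned map carries only the tags that were present for that URL, so a missing priority or changefreq stays visibly missing instead of being filled in at parse time.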

 I looked at all the patches and comments in NUTCH-1465, and there are some
 solutions for the same problems there. But a new job for sitemap crawling
 was created in them.

 Could you show me a way out?

 Thanks.





 --
 *Lewis*



diff --git a/conf/gora-hbase-mapping.xml b/conf/gora-hbase-mapping.xml
index eb58819..5bd011b 100644
--- a/conf/gora-hbase-mapping.xml
+++ b/conf/gora-hbase-mapping.xml
@@ -46,6 +46,7 @@ http://gora.apache.org/current/gora-hbase.html
 <family name="s" maxVersions="1"/>
 <family name="il" maxVersions="1"/>
 <family name="ol" maxVersions="1"/>
+<family name="stm" maxVersions="1"/>
 <family name="h" maxVersions="1"/>
 <family name="mtdt" maxVersions="1"/>
 <family name="mk" maxVersions="1"/>
@@ -66,6 +67,8 @@ http://gora.apache.org/current/gora-hbase.html
 <field name="modifiedTime" family="f" qualifier="mod"/>
 <field name="prevModifiedTime" family="f" qualifier="pmod"/>
 <field name="batchId" family="f" qualifier="bid"/>
+ 	<field name="sitemaps" family="stm"/>
+ 
 
 <!-- parse fields   -->
 <field name="title" family="p" qualifier="t"/>
@@ -76,6 +79,8 @@ http://gora.apache.org/current/gora-hbase.html
 
 <!-- score fields   -->
 <field name="score" family="s" qualifier="s"/>
+<field name="stmPriority" family="s" qualifier="sp"/>
+
 <field name="headers" family="h"/>
 <field name="inlinks" family="il"/>
 <field name="outlinks" family="ol"/>
diff --git a/conf/parse-plugins.xml b/conf/parse-plugins.xml
index 5b20be6..0551381 100644
--- a/conf/parse-plugins.xml
+++ b/conf/parse-plugins.xml
@@ -68,6 +68,7 @@
 		<plugin id="feed" />
 	</mimeType>
 
+
 <!-- Types for parse-ext plugin: required for unit tests to pass. -->
 
 	<mimeType name="application/vnd.nutch.example.cat">
diff --git a/src/gora/webpage.avsc b/src/gora/webpage.avsc
index dce0050..0761c08 100644
--- a/src/gora/webpage.avsc
+++ b/src/gora/webpage.avsc
@@ -278,6 +278,26 @@
   ],
   "doc": "A batchId that this WebPage is assigned to. WebPage's are fetched in batches, called fetchlists. Pages are partitioned but can always be associated and fetched alongside pages of similar value (within a crawl cycle) based on batchId.",
   "default": null
+},
+{
+  "name": "sitemaps",
+  "type": {
+

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651127#comment-14651127
 ] 

Chris A. Mattmann commented on NUTCH-2059:
--

ping thoughts here? Doesn't seem to be a broken build in a while but maybe we 
should push your updates regardless Peter?

 protocol-httpclient, protocol-http unit test errors on Jenkins
 --

 Key: NUTCH-2059
 URL: https://issues.apache.org/jira/browse/NUTCH-2059
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Reporter: Peter Ciuffetti
Assignee: Chris A. Mattmann
 Fix For: 1.11


 This is an occasional error on the build of the Nutch trunk visible in 
 Jenkins builds.  It happens on either protocol-http or protocol-httpclient, 
 which can be running at the same time given the multi-threaded test setup.
 {code}
 [junit] Running org.apache.nutch.protocol.httpclient.TestProtocolHttpClient
 [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.377 sec
 [junit] Test org.apache.nutch.protocol.http.TestProtocolHttp FAILED
 {code}
 Evidence of failure on Jenkins goes back to
 Failed  Console Output  #3154  Jun 8, 2015 4:00:00 AM
 https://builds.apache.org/view/All/job/Nutch-trunk/3154/consoleFull
 And are repeated at...
 https://builds.apache.org/view/All/job/Nutch-trunk/3190/console
 https://builds.apache.org/view/All/job/Nutch-trunk/3189/console
 Some possibly related tickets
 NUTCH-1836 Timeouts in protocol-httpclient when crawling same host with 2 
 threads 
 NUTCH-1086 Rewrite protocol-httpclient
 The unit tests are not failing for me on my sandbox, but there are some 
 exceptions being output to the log related to headers being sent on JSP pages 
 after the response writer is invoked.
 {code}
 java.lang.IllegalStateException: STREAM
 at org.mortbay.jetty.Response.getWriter(Response.java:616)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651264#comment-14651264
 ] 

Chris A. Mattmann commented on NUTCH-2062:
--

{noformat}
test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/Users/mattmann/tmp/nutch-trunk/build/test/lib/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Running org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.79 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.929 sec

BUILD SUCCESSFUL
Total time: 12 minutes 11 seconds
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}

All tests passing, committing this now.


 Add Plugin for interacting with Selenium WebDriver
 --

 Key: NUTCH-2062
 URL: https://issues.apache.org/jira/browse/NUTCH-2062
 Project: Nutch
  Issue Type: Improvement
  Components: plugin
Affects Versions: 1.10
Reporter: Michael Joyce
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11

 Attachments: NUTCH-2062v2.patch


 The protocol-selenium plugin is great for pulling webpages that dynamically 
 load content. However, I've run into use cases where I need to actively 
 interact with a page in Selenium before it becomes useful. For instance, I 
 may need to paginate through a table to get all results that I'm interested 
 in. This plugin will handle that use case.
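
The pagination use case described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Selenium API or the plugin's handler interface: `PagedTable`, `collectAllRows` and `fakeTable` are hypothetical names that stand in for a browser session, and the page cap is an added assumption to keep a broken "next" control from looping forever.

```java
import java.util.ArrayList;
import java.util.List;

final class PaginationSketch {
  /** Hypothetical stand-in for a browser session showing a paged table; not the Selenium API. */
  interface PagedTable {
    List<String> visibleRows();

    /** Moves to the next page; returns false when there is no further page. */
    boolean clickNext();
  }

  /** Collects the rows of every page, visiting at most maxPages pages. */
  static List<String> collectAllRows(PagedTable table, int maxPages) {
    List<String> rows = new ArrayList<>(table.visibleRows());
    int pages = 1;
    while (pages < maxPages && table.clickNext()) {
      rows.addAll(table.visibleRows());
      pages++;
    }
    return rows;
  }

  /** In-memory fake with at least one page, used only to exercise the loop. */
  static PagedTable fakeTable(final List<List<String>> pages) {
    return new PagedTable() {
      private int index = 0;

      @Override
      public List<String> visibleRows() {
        return pages.get(index);
      }

      @Override
      public boolean clickNext() {
        if (index + 1 >= pages.size()) {
          return false;
        }
        index++;
        return true;
      }
    };
  }
}
```

In a real handler the two interface methods would be replaced by calls against the live page, and the collected rows would need to end up in the content handed back to the fetcher.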





[GitHub] nutch pull request: NUTCH-2062 - Interactive Selenium Plugin

2015-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/46


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651266#comment-14651266
 ] 

Chris A. Mattmann commented on NUTCH-2062:
--

Thanks [~mjoyce]! All committed:

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% svn commit -m "Fix for NUTCH-2062: Add Plugin for interacting with Selenium WebDriver contributed by Michael Joyce mltjo...@gmail.com this closes #46"
Sending        build.xml
Sending        conf/nutch-default.xml
Sending        src/plugin/build.xml
Sending        src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
Adding         src/plugin/protocol-interactiveselenium
Adding         src/plugin/protocol-interactiveselenium/README.md
Adding         src/plugin/protocol-interactiveselenium/build-ivy.xml
Adding         src/plugin/protocol-interactiveselenium/build.xml
Adding         src/plugin/protocol-interactiveselenium/ivy.xml
Adding         src/plugin/protocol-interactiveselenium/plugin.xml
Adding         src/plugin/protocol-interactiveselenium/src
Adding         src/plugin/protocol-interactiveselenium/src/java
Adding         src/plugin/protocol-interactiveselenium/src/java/org
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/Http.java
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultHandler.java
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/InteractiveSeleniumHandler.java
Adding         src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/package.html
Transmitting file data ..
Committed revision 1693837.
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}


 Add Plugin for interacting with Selenium WebDriver
 --

 Key: NUTCH-2062
 URL: https://issues.apache.org/jira/browse/NUTCH-2062
 Project: Nutch
  Issue Type: Improvement
  Components: plugin
Affects Versions: 1.10
Reporter: Michael Joyce
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11

 Attachments: NUTCH-2062v2.patch


 The protocol-selenium plugin is great for pulling webpages that dynamically 
 load content. However, I've run into use cases where I need to actively 
 interact with a page in Selenium before it becomes useful. For instance, I 
 may need to paginate through a table to get all results that I'm interested 
 in. This plugin will handle that use case.





[jira] [Assigned] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann reassigned NUTCH-2072:


Assignee: Chris A. Mattmann

 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor

 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.
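
The proposed fix can be sketched as follows. This is a minimal sketch built on `java.util.zip` only: `effectiveSizeLimit`, `inflateUpTo` and `deflate` are hypothetical helpers, and `inflateUpTo` is a simplified stand-in, not Nutch's `DeflateUtils.inflateBestEffort`. It only shows why a size-capped inflate loop needs a negative "no limit" setting translated to `Integer.MAX_VALUE` before it is used as a bound.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ContentLimitSketch {
  /** Maps a negative "no limit" setting to the largest int, as the issue suggests. */
  static int effectiveSizeLimit(int maxContent) {
    return maxContent < 0 ? Integer.MAX_VALUE : maxContent;
  }

  /** Simplified stand-in for a best-effort inflate that stops after sizeLimit bytes. */
  static byte[] inflateUpTo(byte[] in, int sizeLimit) {
    Inflater inflater = new Inflater();
    inflater.setInput(in);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    int written = 0;
    try {
      while (!inflater.finished() && written < sizeLimit) {
        int n = inflater.inflate(buf, 0, Math.min(buf.length, sizeLimit - written));
        if (n == 0) {
          break; // no progress: input exhausted or a dictionary is required
        }
        out.write(buf, 0, n);
        written += n;
      }
    } catch (DataFormatException e) {
      // Best effort: keep whatever was decoded before the stream became invalid.
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  /** Helper that produces zlib-wrapped deflate data for trying the sketch. */
  static byte[] deflate(byte[] raw) {
    Deflater deflater = new Deflater();
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }
}
```

In this stand-in a limit of -1 makes the loop condition false immediately, so nothing is decoded; passing `effectiveSizeLimit(-1)` instead decodes the whole input. The actual failure mode inside Nutch may differ from this simplified loop.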





[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651265#comment-14651265
 ] 

ASF GitHub Bot commented on NUTCH-2062:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/46


 Add Plugin for interacting with Selenium WebDriver
 --

 Key: NUTCH-2062
 URL: https://issues.apache.org/jira/browse/NUTCH-2062
 Project: Nutch
  Issue Type: Improvement
  Components: plugin
Affects Versions: 1.10
Reporter: Michael Joyce
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11

 Attachments: NUTCH-2062v2.patch


 The protocol-selenium plugin is great for pulling webpages that dynamically 
 load content. However, I've run into use cases where I need to actively 
 interact with a page in Selenium before it becomes useful. For instance, I 
 may need to paginate through a table to get all results that I'm interested 
 in. This plugin will handle that use case.





[jira] [Work started] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2072 started by Chris A. Mattmann.

 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.





[jira] [Updated] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2072:
-
Fix Version/s: 1.11

 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.





[jira] [Updated] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2066:
-
Description: Allow user to specify crawldb and segment db in the Generate 
Job REST endpoint 

 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11


 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 





[jira] [Work started] (NUTCH-2066) Allow user to specify crawldb and segment db in the Generate JOb REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2066 started by Chris A. Mattmann.

 Allow user to specify crawldb and segment db in the Generate JOb REST 
 endpoint 
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11








[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651295#comment-14651295
 ] 

Hudson commented on NUTCH-2062:
---

SUCCESS: Integrated in Nutch-trunk #3237 (See 
[https://builds.apache.org/job/Nutch-trunk/3237/])
Changes for NUTCH-2062 (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1693838)
* /nutch/trunk/CHANGES.txt
Fix for NUTCH-2062: Add Plugin for interacting with Selenium WebDriver 
contributed by Michael Joyce mltjo...@gmail.com this closes #46 (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1693837)
* /nutch/trunk/build.xml
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
* /nutch/trunk/src/plugin/protocol-interactiveselenium
* /nutch/trunk/src/plugin/protocol-interactiveselenium/README.md
* /nutch/trunk/src/plugin/protocol-interactiveselenium/build-ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/build.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/ivy.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/plugin.xml
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/Http.java
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultHandler.java
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/InteractiveSeleniumHandler.java
* /nutch/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/package.html


 Add Plugin for interacting with Selenium WebDriver
 --

 Key: NUTCH-2062
 URL: https://issues.apache.org/jira/browse/NUTCH-2062
 Project: Nutch
  Issue Type: Improvement
  Components: plugin
Affects Versions: 1.10
Reporter: Michael Joyce
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11

 Attachments: NUTCH-2062v2.patch


 The protocol-selenium plugin is great for pulling webpages that dynamically 
 load content. However, I've run into use cases where I need to actively 
 interact with a page in Selenium before it becomes useful. For instance, I 
 may need to paginate through a table to get all results that I'm interested 
 in. This plugin will handle that use case.





[jira] [Commented] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651294#comment-14651294
 ] 

Hudson commented on NUTCH-2072:
---

SUCCESS: Integrated in Nutch-trunk #3237 (See 
[https://builds.apache.org/job/Nutch-trunk/3237/])
Fix for NUTCH-2072: Deflate encoding support is broken when http.content.limit 
is set to -1 contributed by Tanguy Moal tan...@cogniteev.com this closes #48. 
(mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1693843)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java


 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.





[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651315#comment-14651315
 ] 

Chris A. Mattmann commented on NUTCH-2059:
--

we have a failed build - 
https://builds.apache.org/job/Nutch-trunk/3238/testReport/junit/org.apache.nutch.fetcher/TestFetcher/testFetch/
 related?

 protocol-httpclient, protocol-http unit test errors on Jenkins
 --

 Key: NUTCH-2059
 URL: https://issues.apache.org/jira/browse/NUTCH-2059
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Reporter: Peter Ciuffetti
Assignee: Chris A. Mattmann
 Fix For: 1.11


 This is an occasional error on the build of the Nutch trunk visible in 
 Jenkins builds.  It happens on either protocol-http or protocol-httpclient, 
 which can be running at the same time given the multi-threaded test setup.
 {code}
 [junit] Running org.apache.nutch.protocol.httpclient.TestProtocolHttpClient
 [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.377 sec
 [junit] Test org.apache.nutch.protocol.http.TestProtocolHttp FAILED
 {code}
 Evidence of failure on Jenkins goes back to
 Failed  Console Output  #3154  Jun 8, 2015 4:00:00 AM
 https://builds.apache.org/view/All/job/Nutch-trunk/3154/consoleFull
 And are repeated at...
 https://builds.apache.org/view/All/job/Nutch-trunk/3190/console
 https://builds.apache.org/view/All/job/Nutch-trunk/3189/console
 Some possibly related tickets
 NUTCH-1836 Timeouts in protocol-httpclient when crawling same host with 2 
 threads 
 NUTCH-1086 Rewrite protocol-httpclient
 The unit tests are not failing for me on my sandbox, but there are some 
 exceptions being output to the log related to headers being sent on JSP pages 
 after the response writer is invoked.
 {code}
 java.lang.IllegalStateException: STREAM
 at org.mortbay.jetty.Response.getWriter(Response.java:616)
 {code}





[jira] [Updated] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2066:
-
Summary: Parameterize Generate REST endpoint  (was: Allow user to specify 
crawldb and segment db in the Generate Job REST endpoint )

 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11








[jira] [Updated] (NUTCH-2066) Allow user to specify crawldb and segment db in the Generate Job REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2066:
-
Summary: Allow user to specify crawldb and segment db in the Generate Job 
REST endpoint   (was: Allow user to specify crawldb and segment db in the 
Generate JOb REST endpoint )

 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11








[jira] [Resolved] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved NUTCH-2062.
--
Resolution: Fixed

Committed, thanks Mike!

 Add Plugin for interacting with Selenium WebDriver
 --

 Key: NUTCH-2062
 URL: https://issues.apache.org/jira/browse/NUTCH-2062
 Project: Nutch
  Issue Type: Improvement
  Components: plugin
Affects Versions: 1.10
Reporter: Michael Joyce
Assignee: Chris A. Mattmann
  Labels: memex
 Fix For: 1.11

 Attachments: NUTCH-2062v2.patch


 The protocol-selenium plugin is great for pulling webpages that dynamically 
 load content. However, I've run into use cases where I need to actively 
 interact with a page in Selenium before it becomes useful. For instance, I 
 may need to paginate through a table to get all results that I'm interested 
 in. This plugin will handle that use case.





[jira] [Comment Edited] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651279#comment-14651279
 ] 

Chris A. Mattmann edited comment on NUTCH-2072 at 8/2/15 11:39 PM:
---

Fixed, thanks [~tanguy]!

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% svn commit -m "Fix for NUTCH-2072: Deflate encoding support is broken when http.content.limit is set to -1 contributed by Tanguy Moal tan...@cogniteev.com this closes #48."
Sending        CHANGES.txt
Sending        src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
Transmitting file data ..
Committed revision 1693843.
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}



was (Author: chrismattmann):
Fixed, thanks [~ltanguy]

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% svn commit -m "Fix for NUTCH-2072: Deflate encoding support is broken when http.content.limit is set to -1 contributed by Tanguy Moal tan...@cogniteev.com this closes #48."
Sending        CHANGES.txt
Sending        src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
Transmitting file data ..
Committed revision 1693843.
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}


 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.





[jira] [Updated] (NUTCH-2066) Allow user to specify crawldb and segment db in the Generate Job REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-2066:
-
Labels: memex  (was: )

 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11








[jira] [Resolved] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved NUTCH-2072.
--
Resolution: Fixed

Fixed, thanks [~ltanguy]

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% svn commit -m "Fix for NUTCH-2072: Deflate encoding support is broken when http.content.limit is set to -1 contributed by Tanguy Moal tan...@cogniteev.com this closes #48."
Sending        CHANGES.txt
Sending        src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
Transmitting file data ..
Committed revision 1693843.
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}


 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}}
 argument.
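
 A minimal sketch of the clamping idea described above, not the committed
 patch. The class name DeflateLimitSketch and the helper effectiveLimit are
 hypothetical, and inflateBestEffort here is a stand-in written against
 java.util.zip, not the Nutch DeflateUtils implementation. In this sketch a
 negative limit makes the loop exit immediately, which is why the value is
 mapped to Integer.MAX_VALUE before the call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLimitSketch {

  /** Hypothetical helper: a negative "no limit" setting becomes Integer.MAX_VALUE. */
  static int effectiveLimit(int maxContent) {
    return maxContent < 0 ? Integer.MAX_VALUE : maxContent;
  }

  /** Stand-in inflater: decodes at most sizeLimit bytes, keeping what was read on error. */
  static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
    Inflater inflater = new Inflater();
    inflater.setInput(in);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
      while (!inflater.finished() && out.size() < sizeLimit) {
        int want = Math.min(buf.length, sizeLimit - out.size());
        int n = inflater.inflate(buf, 0, want);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated input: stop and return what we have
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      // best effort: fall through with the bytes decoded so far
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  /** Compresses data with the default zlib wrapper so the example is self-contained. */
  static byte[] deflate(byte[] data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] packed = deflate("hello hello hello".getBytes(StandardCharsets.UTF_8));
    int configured = -1; // stands for http.content.limit = -1
    byte[] unpacked = inflateBestEffort(packed, effectiveLimit(configured));
    System.out.println(new String(unpacked, StandardCharsets.UTF_8));
  }
}
```

 Clamping at the call site keeps the inflate routine itself unchanged, which
 matches the "mimic what's done with gzip" suggestion in spirit.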





[GitHub] nutch pull request: Fix for NUTCH-2066 contributed by Sujen Shah

2015-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/47


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651297#comment-14651297
 ] 

Chris A. Mattmann commented on NUTCH-2066:
--

All tests pass:

{noformat}

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/Users/mattmann/tmp/nutch-trunk/build/test/lib/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.886 sec
[junit] Running 
org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.552 sec

BUILD SUCCESSFUL
Total time: 11 minutes 11 seconds
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}


 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11


 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 
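
 A rough sketch of what "allow user to specify" could look like on the
 receiving side: caller-supplied values override defaults derived from the
 crawl id. The argument keys ("crawldb", "segment_dir"), the default layout
 under the crawl id, and the class GenerateArgsSketch are all assumptions
 made for illustration; they are not taken from the committed change to
 Generator.java.

```java
import java.util.HashMap;
import java.util.Map;

public class GenerateArgsSketch {
  // Hypothetical argument keys, chosen for illustration only.
  static final String ARG_CRAWLDB = "crawldb";
  static final String ARG_SEGMENT_DIR = "segment_dir";

  /** Returns the caller-supplied value for key, or fallback when absent or blank. */
  static String resolve(Map<String, Object> args, String key, String fallback) {
    Object value = args.get(key);
    if (value == null || value.toString().trim().isEmpty()) {
      return fallback;
    }
    return value.toString();
  }

  /** Returns {crawlDbPath, segmentsPath}; defaults derive from crawlId (assumed layout). */
  static String[] resolvePaths(Map<String, Object> args, String crawlId) {
    String crawlDb = resolve(args, ARG_CRAWLDB, crawlId + "/crawldb");
    String segments = resolve(args, ARG_SEGMENT_DIR, crawlId + "/segments");
    return new String[] { crawlDb, segments };
  }

  public static void main(String[] argv) {
    Map<String, Object> args = new HashMap<>();
    args.put(ARG_CRAWLDB, "/data/crawl/crawldb"); // user overrides only the crawldb
    String[] paths = resolvePaths(args, "crawl-01");
    System.out.println(paths[0] + " " + paths[1]);
  }
}
```

 Falling back to the crawl-id layout keeps existing callers working when they
 send no extra arguments.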





Build failed in Jenkins: Nutch-trunk #3238

2015-08-02 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3238/changes

Changes:

[mattmann] Fix for NUTCH-2066: Parameterize Generate REST endpoint contributed 
by Sujen Shah sujen1...@gmail.com this closes #47.

--
[...truncated 4272 lines...]
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-host

copy-generated-lib:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-host

init:
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test/lib
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlnormalizer-pass
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
[javac] Creating empty 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes/org/apache/nutch/net/urlnormalizer/pass/package-info.class

jar:
  [jar] Building jar: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar

deps-test:

deploy:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass

copy-generated-lib:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass

init:
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/classes
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/test
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/test/lib
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-querystring

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlnormalizer-querystring
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/classes
[javac] Creating empty 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/classes/org/apache/nutch/net/urlnormalizer/querystring/package-info.class

jar:
  [jar] Building jar: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-querystring/urlnormalizer-querystring.jar

deps-test:

deploy:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-querystring

copy-generated-lib:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-querystring
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data
 [copy] Copying 4 files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/data

init:
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test/lib
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-regex

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlnormalizer-regex
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes
[javac] Creating empty 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/classes/org/apache/nutch/net/urlnormalizer/regex/package-info.class

jar:
  [jar] Building jar: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/urlnormalizer-regex.jar

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:


[jira] [Commented] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651314#comment-14651314
 ] 

Hudson commented on NUTCH-2066:
---

FAILURE: Integrated in Nutch-trunk #3238 (See 
[https://builds.apache.org/job/Nutch-trunk/3238/])
Fix for NUTCH-2066: Parameterize Generate REST endpoint contributed by Sujen 
Shah sujen1...@gmail.com this closes #47. (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1693844)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java


 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11


 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 





[jira] [Commented] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651277#comment-14651277
 ] 

Chris A. Mattmann commented on NUTCH-2072:
--

Tests pass:

{noformat}

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  
jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and 
jar:file:/Users/mattmann/tmp/nutch-trunk/build/test/lib/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running 
org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.055 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.856 sec
[junit] Running org.apache.nutch.tika.TestRTFParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
0.125 sec
[junit] Running org.apache.nutch.tika.TestRobotsMetaProcessor
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
17.994 sec

BUILD SUCCESSFUL
Total time: 13 minutes 21 seconds
{noformat}

Committing this now. Thanks.


 Deflate encoding support is broken when http.content.limit is set to -1
 ---

 Key: NUTCH-2072
 URL: https://issues.apache.org/jira/browse/NUTCH-2072
 Project: Nutch
  Issue Type: Bug
  Components: plugin, protocol
Reporter: Tanguy Moal
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.11


 The method {{DeflateUtils.inflateBestEffort(byte[] in, int sizeLimit)}} is 
 not designed to have sizeLimit set to a negative value.
 The fix can be simply to mimic what's done with gzip encoding : if 
 {{getMaxContent() < 0}} then use {{Integer.MAX_VALUE}} for the {{sizeLimit}} 
 argument.





[jira] [Commented] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651300#comment-14651300
 ] 

ASF GitHub Bot commented on NUTCH-2066:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/47


 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11


 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 





[jira] [Resolved] (NUTCH-2066) Parameterize Generate REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved NUTCH-2066.
--
Resolution: Fixed

Committed to trunk:

{noformat}
[chipotle:~/tmp/nutch-trunk] mattmann% svn commit -m "Fix for NUTCH-2066: 
Parameterize Generate REST endpoint contributed by Sujen Shah 
sujen1...@gmail.com this closes #47."
Sending        CHANGES.txt
Sending        src/java/org/apache/nutch/crawl/Generator.java
Transmitting file data ..
Committed revision 1693844.
[chipotle:~/tmp/nutch-trunk] mattmann% 
{noformat}


 Parameterize Generate REST endpoint
 ---

 Key: NUTCH-2066
 URL: https://issues.apache.org/jira/browse/NUTCH-2066
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api
Reporter: Sujen Shah
Assignee: Chris A. Mattmann
Priority: Minor
  Labels: memex
 Fix For: 1.11


 Allow user to specify crawldb and segment db in the Generate Job REST 
 endpoint 


