[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-21 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated NUTCH-1284:
---

Attachment: NUTCH-1284-2.x.v1.patch

Hi Lewis,
Thanks for reminding me about 2.x. I have attached the patch for 2.x 
(NUTCH-1284-2.x.v1.patch).

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Tejas Patil
Priority: Trivial
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1284-2.x.v1.patch, NUTCH-1284.patch, 
 NUTCH-1284-trunk.v1.patch


 Currently, when manually scanning our log output we cannot infer which pages 
 are governed by a crawl delay between successive fetch attempts of any given 
 page within the site. The value should be made available as something like:
 {code}
 2012-02-19 12:33:33,031 INFO  fetcher.Fetcher - fetching 
 http://nutch.apache.org/ (crawl.delay=XXXms)
 {code}
 This way we can easily and quickly determine whether the fetcher is having to 
 use this functionality or not. 
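A minimal sketch of how such a message could be assembled is shown below. It is not taken from any of the attached patches: the class name `CrawlDelayLogSketch`, the helper `fetchingMessage`, and the 5000 ms and 0 ms delay values are all hypothetical, and the real fetcher would emit the text through its own logger rather than `System.out`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CrawlDelayLogSketch {

    // Hypothetical helper: builds the message body proposed in the issue,
    // i.e. "fetching <url> (crawl.delay=<N>ms)".
    static String fetchingMessage(String url, long crawlDelayMs) {
        return "fetching " + url + " (crawl.delay=" + crawlDelayMs + "ms)";
    }

    public static void main(String[] args) {
        // Illustrative per-URL delays in milliseconds (made-up values).
        Map<String, Long> delays = new LinkedHashMap<>();
        delays.put("http://nutch.apache.org/", 5000L);
        delays.put("http://nutch.apache.org/about.html", 0L);

        // One line per URL, printed at the moment its fetch would begin.
        for (Map.Entry<String, Long> e : delays.entrySet()) {
            System.out.println(fetchingMessage(e.getKey(), e.getValue()));
        }
    }
}
```

With the delay on the same line as the URL, a plain text search for `crawl.delay=` is enough to see which fetches were subject to a delay.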

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-20 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated NUTCH-1284:
---

Attachment: NUTCH-1284-trunk.v1.patch

Hi Lewis,
If I recall correctly, we want the crawl delay for the URL (and hence its 
queue's delay) to be logged when fetching of the URL begins. Right?

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Tejas Patil
Priority: Trivial
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1284.patch, NUTCH-1284-trunk.v1.patch





[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-12 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated NUTCH-1284:
---

Assignee: Tejas Patil

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Tejas Patil
Priority: Trivial
 Fix For: 1.7

 Attachments: NUTCH-1284.patch





[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1284:


Fix Version/s: 2.2

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Tejas Patil
Priority: Trivial
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1284.patch





[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-07 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1284:


Patch Info: Patch Available

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Priority: Trivial
 Fix For: 1.7

 Attachments: NUTCH-1284.patch





[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2012-12-22 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated NUTCH-1284:
---

Attachment: NUTCH-1284.patch

Patch for the fix attached.

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Priority: Trivial
 Fix For: 1.7

 Attachments: NUTCH-1284.patch





[jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-1284:
-

Fix Version/s: (was: 1.5)
   (was: nutchgora)
   1.6

20120304-push-1.6

 Add site fetcher.max.crawl.delay as log output by default.
 --

 Key: NUTCH-1284
 URL: https://issues.apache.org/jira/browse/NUTCH-1284
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Priority: Trivial
 Fix For: 1.6


