[jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942045#comment-16942045
 ] 

Hudson commented on NUTCH-2737:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3649 (See 
[https://builds.apache.org/job/Nutch-trunk/3649/])
NUTCH-2737 Generator: count and log reason of rejections during (snagel: 
[https://github.com/apache/nutch/commit/f02c98ed5d1b9d6a08f6ba95b3203bc6465e8a2e])
* (edit) src/java/org/apache/nutch/crawl/Generator.java
NUTCH-2737 Generator: count and log reason of rejections during (snagel: 
[https://github.com/apache/nutch/commit/35da06fe37b78e7b865198a3673d82483cc46496])
* (edit) src/java/org/apache/nutch/crawl/Generator.java


> Generator: count and log reason of rejections during selection
> --
>
> Key: NUTCH-2737
> URL: https://issues.apache.org/jira/browse/NUTCH-2737
> Project: Nutch
> Issue Type: Improvement
> Components: generator
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.16
>
>
> During the map phase of the selection step, the generator rejects many 
> (usually most) items for various reasons:
> - not yet time for a refetch (returned by the fetch scheduler)
> - generator score too low
> - status does not match the restrict status
> - Jexl expression not matched
> and some more. It would be useful if the reasons were counted and logged, 
> especially when the CrawlDb gets bigger and multiple options to restrict 
> the selection are used.
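
The counting described above could be sketched roughly as follows. This is a
minimal, standalone illustration and not the code from Generator.java: the
class RejectionCounterSketch, the Reason enum, the Item type and all field
names are hypothetical, and a plain in-memory map stands in for whatever
counter mechanism the job actually uses (in a Hadoop job, the counters
provided by the mapper context would presumably play that role).

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

public class RejectionCounterSketch {

  /** Hypothetical reason names, modelled on the list in the issue description. */
  enum Reason { SCHEDULE_REJECTED, SCORE_TOO_LOW, STATUS_REJECTED, EXPR_REJECTED }

  /** Hypothetical stand-in for a CrawlDb entry. */
  static final class Item {
    final long nextFetchTime;
    final float score;
    final String status;

    Item(long nextFetchTime, float score, String status) {
      this.nextFetchTime = nextFetchTime;
      this.score = score;
      this.status = status;
    }
  }

  private final Map<Reason, Long> counts = new EnumMap<>(Reason.class);
  private final long now;
  private final float minScore;
  private final String restrictStatus;  // null means "no status restriction"
  private final Predicate<Item> expr;   // stand-in for a Jexl expression

  RejectionCounterSketch(long now, float minScore, String restrictStatus,
      Predicate<Item> expr) {
    this.now = now;
    this.minScore = minScore;
    this.restrictStatus = restrictStatus;
    this.expr = expr;
  }

  /** Returns the first reason that rejects the item, or null if it passes. */
  Reason check(Item item) {
    if (item.nextFetchTime > now) return Reason.SCHEDULE_REJECTED;
    if (item.score < minScore) return Reason.SCORE_TOO_LOW;
    if (restrictStatus != null && !restrictStatus.equals(item.status)) {
      return Reason.STATUS_REJECTED;
    }
    if (!expr.test(item)) return Reason.EXPR_REJECTED;
    return null;
  }

  /** Selects the item, or counts the reason for rejecting it. */
  boolean select(Item item) {
    Reason reason = check(item);
    if (reason == null) return true;
    counts.merge(reason, 1L, Long::sum);
    return false;
  }

  long count(Reason reason) {
    return counts.getOrDefault(reason, 0L);
  }

  public static void main(String[] args) {
    RejectionCounterSketch selector =
        new RejectionCounterSketch(1000L, 0.5f, "db_unfetched", item -> true);
    selector.select(new Item(500L, 1.0f, "db_unfetched"));  // passes all checks
    selector.select(new Item(2000L, 1.0f, "db_unfetched")); // not yet due
    selector.select(new Item(500L, 0.1f, "db_unfetched"));  // score too low
    selector.select(new Item(500L, 1.0f, "db_fetched"));    // status mismatch
    // Log every reason with its count once selection is done.
    for (Reason reason : Reason.values()) {
      System.out.println(reason + ": " + selector.count(reason));
    }
  }
}
```

Only the first failing check is counted per item, so the counts add up to the
number of rejected items. Whether the actual patch counts this way is not
stated in this thread.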



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941887#comment-16941887
 ] 

ASF GitHub Bot commented on NUTCH-2737:
---

sebastian-nagel commented on pull request #477: NUTCH-2737 NUTCH-2738 
Generator: count and log reason of rejections during selection, document 
property generate.restrict.status
URL: https://github.com/apache/nutch/pull/477
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (NUTCH-2737) Generator: count and log reason of rejections during selection

2019-09-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940871#comment-16940871
 ] 

ASF GitHub Bot commented on NUTCH-2737:
---

sebastian-nagel commented on pull request #477: NUTCH-2737 NUTCH-2738 
Generator: count and log reason of rejections during selection, document 
property generate.restrict.status
URL: https://github.com/apache/nutch/pull/477
 
 
   Generator improvements to address NUTCH-2737 and NUTCH-2738:
   - add counters for skipped URLs in Selector mapper and reducer
   - document generate.restrict.status
   - parameterize log messages
 



> Generator: count and log reason of rejections during selection
> --
>
> Key: NUTCH-2737
> URL: https://issues.apache.org/jira/browse/NUTCH-2737
> Project: Nutch
> Issue Type: Improvement
> Components: generator
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.17


