[jira] [Commented] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton

2019-01-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754860#comment-16754860
 ] 

Hudson commented on NUTCH-2689:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3608 (See 
[https://builds.apache.org/job/Nutch-trunk/3608/])
NUTCH-2689 Speed up urlfilter-regex and urlfilter-automaton - do not (snagel: 
[https://github.com/apache/nutch/commit/f87b19b0ee8a01c5f54f5ed4b6b159169705682f])
* (edit) conf/regex-urlfilter.txt.template
* (edit) 
src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexURLFilterBase.java
* (edit) src/plugin/urlfilter-regex/sample/Benchmarks.rules
* (edit) src/plugin/urlfilter-regex/sample/WholeWebCrawling.rules
* (edit) src/plugin/urlfilter-regex/sample/IntranetCrawling.rules


> Speed up urlfilter-regex and urlfilter-automaton
> 
>
> Key: NUTCH-2689
> URL: https://issues.apache.org/jira/browse/NUTCH-2689
> Project: Nutch
> Issue Type: Improvement
> Components: plugin
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.16
>
>
> The unit tests of urlfilter-regex and urlfilter-automaton include a 
> benchmark. After experimenting with and benchmarking several modifications, 
> the following changes seem to improve performance significantly:
> - skip extracting the host and domain name from the URL when no host- or 
> domain-specific rules are used (cf. NUTCH-1838)
> - use non-capturing groups where possible
> - use {{(?i)}} to make the patterns case-insensitive and remove the uppercase 
> variants
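
The last two points can be illustrated with a minimal sketch using plain java.util.regex. The two patterns below are hypothetical examples, not the actual rules shipped in regex-urlfilter.txt.template, and the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

public class UrlSuffixPatterns {
  // Hypothetical "before" rule: capturing group plus explicit uppercase variants.
  static final Pattern BEFORE =
      Pattern.compile("\\.(gif|GIF|jpg|JPG|png|PNG)$");

  // Hypothetical "after" rule: (?i) makes the match case-insensitive and
  // (?:...) groups the alternatives without capturing them.
  static final Pattern AFTER =
      Pattern.compile("(?i)\\.(?:gif|jpg|png)$");

  // Returns true if the pattern is found anywhere in the URL.
  static boolean matches(Pattern pattern, String url) {
    return pattern.matcher(url).find();
  }

  public static void main(String[] args) {
    String[] urls = {
      "http://example.com/logo.gif",
      "http://example.com/photo.JPG",
      "http://example.com/index.html"
    };
    for (String url : urls) {
      System.out.println(url
          + " before=" + matches(BEFORE, url)
          + " after=" + matches(AFTER, url));
    }
  }
}
```

Note that the two forms are not strictly equivalent: the (?i) variant also matches mixed-case suffixes such as ".Gif", which the pattern with explicit uppercase variants does not.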



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2691) Improve logging from scoring-depth plugin

2019-01-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754861#comment-16754861
 ] 

Hudson commented on NUTCH-2691:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3608 (See 
[https://builds.apache.org/job/Nutch-trunk/3608/])
NUTCH-2691: Improve logging from scoring-depth plugin (github: 
[https://github.com/apache/nutch/commit/010c2fc8035525545812ae8acfbeeda1a8bbb96b])
* (edit) 
src/plugin/scoring-depth/src/java/org/apache/nutch/scoring/depth/DepthScoringFilter.java


> Improve logging from scoring-depth plugin
> -
>
> Key: NUTCH-2691
> URL: https://issues.apache.org/jira/browse/NUTCH-2691
> Project: Nutch
> Issue Type: Improvement
> Components: scoring
> Affects Versions: 1.15
> Reporter: Yossi Tamari
> Priority: Minor
> Fix For: 1.16
>
>
> Currently the scoring-depth plugin emits a "Missing depth, removing all 
> outlinks from url" log message for every page that failed parsing (and 
> which has no outlinks anyway).
> I will provide a patch that returns immediately when there are no outlinks.
>  
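
The proposed behaviour can be sketched as an early return. This is only an illustration under stated assumptions: the class, method, and parameter names below are hypothetical and do not reproduce the actual DepthScoringFilter code or the Nutch scoring API, and a plain list stands in for the logger.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthFilterSketch {
  // Stand-in for a logger so the effect is observable without a logging library.
  static final List<String> LOG = new ArrayList<>();

  // Hypothetical helper: depth == null models a page without depth metadata.
  static void processOutlinks(String fromUrl, Integer depth, List<String> outlinks) {
    if (outlinks.isEmpty()) {
      return; // nothing to remove, so nothing worth logging
    }
    if (depth == null) {
      LOG.add("Missing depth, removing all outlinks from url " + fromUrl);
      outlinks.clear();
    }
  }

  public static void main(String[] args) {
    // A page that failed parsing: no depth and no outlinks, so no log message.
    processOutlinks("http://example.com/failed", null, new ArrayList<>());

    // A page without depth but with outlinks: logged once, outlinks dropped.
    List<String> links = new ArrayList<>();
    links.add("http://example.com/a");
    processOutlinks("http://example.com/nodepth", null, links);

    System.out.println(LOG.size() + " message(s), " + links.size() + " outlink(s) left");
  }
}
```

With the check for an empty outlink list placed first, pages that failed parsing no longer trigger the message, while pages that do have outlinks but lack a depth are still reported.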





Build failed in Jenkins: Nutch-trunk #3608

2019-01-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/3608/

Changes:

[snagel] NUTCH-2689 Speed up urlfilter-regex and urlfilter-automaton - do not

[github] NUTCH-2691: Improve logging from scoring-depth plugin

--
[...truncated 6.38 KB...]
[javac]  ^
[javac]   symbol:   class ForeignCollectionField
[javac]   location: package com.j256.ormlite.field
[javac] :29: error: cannot find symbol
[javac] import com.j256.ormlite.field.DatabaseField;
[javac]  ^
[javac]   symbol:   class DatabaseField
[javac]   location: package com.j256.ormlite.field
[javac] :28: error: cannot find symbol
[javac] import com.j256.ormlite.field.DatabaseField;
[javac]  ^
[javac]   symbol:   class DatabaseField
[javac]   location: package com.j256.ormlite.field
[javac] :24: error: cannot find symbol
[javac] import com.j256.ormlite.dao.Dao;
[javac]^
[javac]   symbol:   class Dao
[javac]   location: package com.j256.ormlite.dao
[javac] :26: error: cannot find symbol
[javac] import com.j256.ormlite.support.ConnectionSource;
[javac]^
[javac]   symbol:   class ConnectionSource
[javac]   location: package com.j256.ormlite.support
[javac] :29: error: cannot find symbol
[javac]   private ConnectionSource connectionSource;
[javac]   ^
[javac]   symbol:   class ConnectionSource
[javac]   location: class CustomDaoFactory
[javac] :30: error: cannot find symbol
[javac]   private List> registredDaos = Collections
[javac]^
[javac]   symbol:   class Dao
[javac]   location: class CustomDaoFactory
[javac] :33: error: cannot find symbol
[javac]   public CustomDaoFactory(ConnectionSource connectionSource) {
[javac]   ^
[javac]   symbol:   class ConnectionSource
[javac]   location: class CustomDaoFactory
[javac] :37: error: cannot find symbol
[javac]   public  Dao createDao(Class clazz) {
[javac]  ^
[javac]   symbol:   class Dao
[javac]   location: class CustomDaoFactory
[javac] :47: error: cannot find symbol
[javac]   private  void register(Dao dao) {
[javac] ^
[javac]   symbol:   class Dao
[javac]   location: class CustomDaoFactory
[javac] :53: error: cannot find symbol
[javac]   public List> getCreatedDaos() {
[javac]   ^
[javac]   symbol:   class Dao
[javac]   location: class CustomDaoFactory
[javac] :22: error: cannot find symbol
[javac] import com.j256.ormlite.dao.BaseDaoImpl;
[javac]^
[javac]   symbol:   class BaseDaoImpl
[javac]   location: package com.j256.ormlite.dao
[javac] :23: error: cannot find symbol
[javac] import com.j256.ormlite.dao.Dao;
[javac]^
[javac]   symbol:   class Dao
[javac]   location: package com.j256.ormlite.dao
[javac] :25: error: cannot find symbol
[javac] import com.j256.ormlite.table.DatabaseTableConfig;
[javac]  ^
[javac]   symbol:   class DatabaseTableConfig
[javac]   location: package com.j256.ormlite.table
[javac] :26: error: cannot find symbol
[javac] import 

[jira] [Resolved] (NUTCH-2689) Speed up urlfilter-regex and urlfilter-automaton

2019-01-29 Thread Sebastian Nagel (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2689.

Resolution: Implemented

Thanks, [~markus17]! Merged.






[jira] [Commented] (NUTCH-2691) Improve logging from scoring-depth plugin

2019-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754824#comment-16754824
 ] 

ASF GitHub Bot commented on NUTCH-2691:
---

sebastian-nagel commented on pull request #434: NUTCH-2691: Improve logging from scoring-depth plugin
URL: https://github.com/apache/nutch/pull/434

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org







[jira] [Resolved] (NUTCH-2691) Improve logging from scoring-depth plugin

2019-01-29 Thread Sebastian Nagel (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2691.

Resolution: Implemented

Merged. Thanks, [~yossi]!



