[jira] [Created] (NUTCH-2673) EOFException protocol-http

2018-11-07 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2673: Summary: EOFException protocol-http Key: NUTCH-2673 URL: https://issues.apache.org/jira/browse/NUTCH-2673 Project: Nutch Issue Type: Bug Affects

[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662234#comment-16662234 ] Markus Jelsma commented on NUTCH-2665: -- On my machine it really fails with

[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661983#comment-16661983 ] Markus Jelsma commented on NUTCH-2665: -- Hello [~axr], yes it compiles fine,

[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660625#comment-16660625 ] Markus Jelsma commented on NUTCH-2665: -- I'll commit this one later to

[jira] [Updated] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2665: - Attachment: NUTCH-2665.patch > Upgrade to Apache Tika 1.1

[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660525#comment-16660525 ] Markus Jelsma commented on NUTCH-2665: -- Updated patch defining the propert

[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660455#comment-16660455 ] Markus Jelsma commented on NUTCH-2665: -- Patch for 2.x! > Upgrade to Apac

[jira] [Updated] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2665: - Attachment: NUTCH-2665.patch > Upgrade to Apache Tika 1.1

[jira] [Created] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2665: Summary: Upgrade to Apache Tika 1.19.1 Key: NUTCH-2665 URL: https://issues.apache.org/jira/browse/NUTCH-2665 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2651) Upgrade to Tika 1.19.1 (from 1.18)

2018-10-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658018#comment-16658018 ] Markus Jelsma commented on NUTCH-2651: -- [~wastl-nagel] i can feel the sorrow. I

[jira] [Commented] (NUTCH-2651) Upgrade to Tika 1.19.1 (from 1.18)

2018-10-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655133#comment-16655133 ] Markus Jelsma commented on NUTCH-2651: -- +1 also thanks for finding the java

[jira] [Commented] (NUTCH-2625) ProtocolFactory.getProtocol(url) may create multiple plugin instances

2018-10-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650282#comment-16650282 ] Markus Jelsma commented on NUTCH-2625: -- Seems reasonable

[jira] [Commented] (NUTCH-2186) -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob

2018-10-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646261#comment-16646261 ] Markus Jelsma commented on NUTCH-2186: -- [~asm123] please open a new ti

[jira] [Commented] (NUTCH-2192) Get rid of oro

2018-10-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643927#comment-16643927 ] Markus Jelsma commented on NUTCH-2192: -- Nice! I completely forgot these anc

[jira] [Commented] (NUTCH-2648) Make configurable whether TLS/SSL certificates are checked by protocol plugins

2018-10-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642971#comment-16642971 ] Markus Jelsma commented on NUTCH-2648: -- I misread the patch regarding the o

[jira] [Commented] (NUTCH-2648) Make configurable whether TLS/SSL certificates are checked by protocol plugins

2018-10-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642463#comment-16642463 ] Markus Jelsma commented on NUTCH-2648: -- +1! Although i would suggest to mentio

[jira] [Resolved] (NUTCH-2647) Skip TLS certificate checks in protocol-http plugin

2018-09-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2647. -- Resolution: Fixed Committed To https://gitbox.apache.org/repos/asf/nutch.git 9d59538c

[jira] [Commented] (NUTCH-2647) Skip TLS certificate checks in protocol-http plugin

2018-09-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631599#comment-16631599 ] Markus Jelsma commented on NUTCH-2647: -- To confirm, protocol-httpclient als

[jira] [Commented] (NUTCH-2647) Skip TLS certificate checks in protocol-http plugin

2018-09-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631038#comment-16631038 ] Markus Jelsma commented on NUTCH-2647: -- Hello Sebastian, My own implementatio

[jira] [Commented] (NUTCH-2623) Fetcher to guarantee delay for same host/domain/ip independent of http/https protocol

2018-09-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631040#comment-16631040 ] Markus Jelsma commented on NUTCH-2623: -- +1! Thanks Sebastian! > Fet

[jira] [Updated] (NUTCH-2647) Skip TLS certificate checks in protocol-http

2018-09-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2647: - Summary: Skip TLS certificate checks in protocol-http (was: Support for dummy X509 trust

[jira] [Updated] (NUTCH-2647) Skip TLS certificate checks in protocol-http plugin

2018-09-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2647: - Summary: Skip TLS certificate checks in protocol-http plugin (was: Skip TLS certificate checks

[jira] [Commented] (NUTCH-2647) Support for dummy X509 trust manager

2018-09-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620448#comment-16620448 ] Markus Jelsma commented on NUTCH-2647: -- patch for 1.15 source > Support fo

[jira] [Updated] (NUTCH-2647) Support for dummy X509 trust manager

2018-09-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2647: - Attachment: NUTCH-2647.patch > Support for dummy X509 trust mana

[jira] [Created] (NUTCH-2647) Support for dummy X509 trust manager

2018-09-19 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2647: Summary: Support for dummy X509 trust manager Key: NUTCH-2647 URL: https://issues.apache.org/jira/browse/NUTCH-2647 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2623) Fetcher to guarantee delay for same host/domain/ip independent of http/https protocol

2018-09-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613322#comment-16613322 ] Markus Jelsma commented on NUTCH-2623: -- +1, however, i would not have expect

[jira] [Created] (NUTCH-2630) Fetcher to log skipped records by robots.txt

2018-08-01 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2630: Summary: Fetcher to log skipped records by robots.txt Key: NUTCH-2630 URL: https://issues.apache.org/jira/browse/NUTCH-2630 Project: Nutch Issue Type

RE: [VOTE] Release Apache Nutch 1.15 RC#1

2018-08-01 Thread Markus Jelsma
However, the test crawl ran/runs fine, in the background, no errors. But just now, watching the fetcher, i noticed the crawl delay is not always respected. The only configuration change i have is the http.agent.* directives to run. 2018-08-01 11:47:41,256 INFO  fetcher.FetcherThread - FetcherThr

RE: [VOTE] Release Apache Nutch 1.15 RC#1

2018-08-01 Thread Markus Jelsma
All tests pass, crawler run fine so far, +1 for 1.15! Regards, Markus -Original message- > From:Sebastian Nagel > Sent: Thursday 26th July 2018 17:05 > To: u...@nutch.apache.org > Cc: dev@nutch.apache.org > Subject: [VOTE] Release Apache Nutch 1.15 RC#1 > > Hi Folks, > > A first ca

[jira] [Comment Edited] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554232#comment-16554232 ] Markus Jelsma edited comment on NUTCH-2612 at 7/24/18 1:2

[jira] [Updated] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2612: - Attachment: NUTCH-2612.patch > Support for sitemap processing by hostn

[jira] [Commented] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554232#comment-16554232 ] Markus Jelsma commented on NUTCH-2612: -- Updated patch: * logging when a hostnam

[jira] [Commented] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532712#comment-16532712 ] Markus Jelsma commented on NUTCH-2612: -- New patch! > Support for

[jira] [Updated] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2612: - Attachment: NUTCH-2612.patch > Support for sitemap processing by hostn

[jira] [Commented] (NUTCH-2614) NPE in CrawlDbReader

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532699#comment-16532699 ] Markus Jelsma commented on NUTCH-2614: -- Yes! > NPE in CrawlD

[jira] [Commented] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532691#comment-16532691 ] Markus Jelsma commented on NUTCH-2612: -- Yes of course! Will upload new p

[jira] [Comment Edited] (NUTCH-2614) NPE in CrawlDbReader

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532446#comment-16532446 ] Markus Jelsma edited comment on NUTCH-2614 at 7/4/18 9:2

[jira] [Commented] (NUTCH-2614) NPE in CrawlDbReader

2018-07-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532446#comment-16532446 ] Markus Jelsma commented on NUTCH-2614: -- Really? In that case my patch for N

[jira] [Created] (NUTCH-2614) NPE in CrawlDbReader

2018-07-03 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2614: Summary: NPE in CrawlDbReader Key: NUTCH-2614 URL: https://issues.apache.org/jira/browse/NUTCH-2614 Project: Nutch Issue Type: Bug Components

[jira] [Updated] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-03 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2612: - Attachment: NUTCH-2612.patch > Support for sitemap processing by hostn

[jira] [Commented] (NUTCH-2612) Support for sitemap processing by hostname

2018-07-03 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531253#comment-16531253 ] Markus Jelsma commented on NUTCH-2612: -- Patch for master! > Support for

[jira] [Created] (NUTCH-2612) Support for sitemap processing by hostname

2018-06-26 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2612: Summary: Support for sitemap processing by hostname Key: NUTCH-2612 URL: https://issues.apache.org/jira/browse/NUTCH-2612 Project: Nutch Issue Type

[jira] [Commented] (NUTCH-2606) MIME detection is wrong for plain-text documents send as Content-Type "application/msword"

2018-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517997#comment-16517997 ] Markus Jelsma commented on NUTCH-2606: -- Ah, this is interesting. Nutch in

[jira] [Updated] (NUTCH-2597) NPE in updatehostdb

2018-06-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2597: - Description: I get an NPE on updatehostdb. I start with a clean crawlDB & hostDB. Afte

RE: Nutch 1.14 issues

2018-06-13 Thread Markus Jelsma
Ah, wrong thread. But it seems some things are not entirely right for 1.15 release just yet. Markus -Original message- > From:Markus Jelsma > Sent: Wednesday 13th June 2018 12:44 > To: dev@nutch.apache.org > Subject: RE: Nutch 1.14 issues > > Hi, > > I've got some tests failing her

RE: Nutch 1.14 issues

2018-06-13 Thread Markus Jelsma
Hi, I've got some tests failing here on a vanilla master check out. [junit] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.314 sec [junit] Test org.apache.nutch.net.TestURLNormalizers FAILED Jurian had protocol-http's test failing just now, but running ant test on my

[jira] [Commented] (NUTCH-2416) Fetcher to log thread ID

2018-06-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503049#comment-16503049 ] Markus Jelsma commented on NUTCH-2416: -- Thanks! > Fetcher to log th

[jira] [Closed] (NUTCH-2416) Fetcher to log thread ID

2018-06-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2416. > Fetcher to log thread ID > > > Key

[jira] [Created] (NUTCH-2585) NPE in TrieStringMatcher

2018-05-25 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2585: Summary: NPE in TrieStringMatcher Key: NUTCH-2585 URL: https://issues.apache.org/jira/browse/NUTCH-2585 Project: Nutch Issue Type: Bug Affects Versions

[jira] [Commented] (NUTCH-2573) Suspend crawling if robots.txt fails to fetch with 5xx status

2018-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454287#comment-16454287 ] Markus Jelsma commented on NUTCH-2573: -- Sounds like a good idea! > Suspend c

[jira] [Commented] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher

2018-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453826#comment-16453826 ] Markus Jelsma commented on NUTCH-1228: -- Wow, this is ancient! Thanks! >

[jira] [Commented] (NUTCH-2572) HostDb: updatehostdb does not set values

2018-04-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449722#comment-16449722 ] Markus Jelsma commented on NUTCH-2572: -- +1 > HostDb: updatehostdb does

[jira] [Commented] (NUTCH-2547) urlnormalizer-basic fails on special characters in path/query

2018-03-29 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419027#comment-16419027 ] Markus Jelsma commented on NUTCH-2547: -- Hello Sebastian, option two sounds

[jira] [Comment Edited] (NUTCH-2541) Arabic characters in the URL path are not properly escaped by the protocol-httpclient plugin

2018-03-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407981#comment-16407981 ] Markus Jelsma edited comment on NUTCH-2541 at 3/21/18 3:1

[jira] [Commented] (NUTCH-2541) Arabic characters in the URL path are not properly escaped by the protocol-httpclient plugin

2018-03-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407981#comment-16407981 ] Markus Jelsma commented on NUTCH-2541: -- This is probably not a 1.14 problem

[jira] [Resolved] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2018-03-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2411. -- Resolution: Fixed Committed for 1.15 bd70d2fe..9a77f437 master -> master > Index-me

[jira] [Commented] (NUTCH-2411) Index-metadata to support indexing multiple values for a field

2018-03-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391141#comment-16391141 ] Markus Jelsma commented on NUTCH-2411: -- Forgot the last time i threatened to co

[jira] [Commented] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2018-03-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391139#comment-16391139 ] Markus Jelsma commented on NUTCH-2525: -- Any comments on this one? Julien did

[jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2018-03-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2525: - Attachment: NUTCH-2525.patch > Metadata indexer cannot handle uppercase parse metad

[jira] [Created] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2018-03-07 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2525: Summary: Metadata indexer cannot handle uppercase parse metadata Key: NUTCH-2525 URL: https://issues.apache.org/jira/browse/NUTCH-2525 Project: Nutch Issue

RE: Nutch fails to compile...

2018-02-21 Thread Markus Jelsma
, 2018 at 1:37 AM, BlackIce <mailto:blackice...@gmail.com>> wrote: > I commented out the date and now after a whole lot of warnings it says Build > Successful > > Im gonna take it for a short spin before I set up solr... > > > > On Wed, Feb 21, 2018 at 12:01

RE: Nutch fails to compile...

2018-02-20 Thread Markus Jelsma
Hello, Well, this is interesting! Have you tried Java 8 instead? I don't think 9 should cause these kinds of problems but i haven't tried it yet, but would like to know anyway. Regarding commenting out the date, try it anyway! Regards, Markus -Original message- > From:BlackIce > Sent

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347762#comment-16347762 ] Markus Jelsma commented on NUTCH-2466: -- Another note, curious to see bro

[jira] [Comment Edited] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347762#comment-16347762 ] Markus Jelsma edited comment on NUTCH-2466 at 1/31/18 11:1

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347749#comment-16347749 ] Markus Jelsma commented on NUTCH-2466: -- Glad to hear this will work for

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347735#comment-16347735 ] Markus Jelsma commented on NUTCH-2466: -- Hello Moreno, Well, we obviously c

[jira] [Resolved] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2466. -- Resolution: Fixed > Sitemap processor to follow redire

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346862#comment-16346862 ] Markus Jelsma commented on NUTCH-2466: -- Thanks! remote: Sending notification em

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346768#comment-16346768 ] Markus Jelsma commented on NUTCH-2466: -- New patch! > Sitemap processor to

[jira] [Updated] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2466: - Attachment: NUTCH-2466.patch > Sitemap processor to follow redire

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-31 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346730#comment-16346730 ] Markus Jelsma commented on NUTCH-2466: -- Will commit shortly unless object

[jira] [Commented] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2018-01-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338290#comment-16338290 ] Markus Jelsma commented on NUTCH-2369: -- How is this different from the cur

[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin

2018-01-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335984#comment-16335984 ] Markus Jelsma commented on NUTCH-2503: -- Hmm, in the past you could run ant -f

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335949#comment-16335949 ] Markus Jelsma commented on NUTCH-2466: -- First patch adding maxRedir configurable

[jira] [Updated] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2466: - Attachment: NUTCH-2466.patch > Sitemap processor to follow redire

[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects

2018-01-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328892#comment-16328892 ] Markus Jelsma commented on NUTCH-2466: -- Ah, crap yeah. Won't get back to t

[jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326999#comment-16326999 ] Markus Jelsma commented on NUTCH-2496: -- Yes it makes a lot of sense to disabl

[jira] [Comment Edited] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326999#comment-16326999 ] Markus Jelsma edited comment on NUTCH-2496 at 1/16/18 10:5

[jira] [Commented] (NUTCH-2496) Speed up link inversion step in crawling script

2018-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325053#comment-16325053 ] Markus Jelsma commented on NUTCH-2496: -- If you use the same filters/normali

[jira] [Commented] (NUTCH-2487) Fetcher thread stopped due to constraint violation

2017-12-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303688#comment-16303688 ] Markus Jelsma commented on NUTCH-2487: -- It seems Nutch and your plugin are usi

[jira] [Commented] (NUTCH-2487) Fetcher thread stopped due to constraint violation

2017-12-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303691#comment-16303691 ] Markus Jelsma commented on NUTCH-2487: -- Ah, thanks. You already close

RE: [VOTE] Release Apache Nutch 1.14 RC#1

2017-12-19 Thread Markus Jelsma
open issues in the release notes, it's normal > to have open/unresolved issues before a release and we should focus only on > mentioning what was added/fixed, for the remaining issues we already have > Jira (which is public). >   > As for the release, +1. >   > Thanks Sebas

RE: [VOTE] Release Apache Nutch 1.14 RC#1

2017-12-19 Thread Markus Jelsma
I do not agree on mentioning those issues as unresolved in the release notes. They are known open issues, just as many others are known and open issues. There is no reason to mention these specific issues and not mentioning all the other open issues. Otherwise +1; Thanks Sebastian! -O

[jira] [Updated] (NUTCH-2485) ParserFactory swallows exception

2017-12-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2485: - Attachment: NUTCH-2485.patch Patch! > ParserFactory swallows except

[jira] [Created] (NUTCH-2485) ParserFactory swallows exception

2017-12-18 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2485: Summary: ParserFactory swallows exception Key: NUTCH-2485 URL: https://issues.apache.org/jira/browse/NUTCH-2485 Project: Nutch Issue Type: Bug Affects

[jira] [Closed] (NUTCH-2322) URL not available for Jexl operations

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2322. Thanks! > URL not available for Jexl operati

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294150#comment-16294150 ] Markus Jelsma commented on NUTCH-2478: -- Thanks! > // is not a valid b

[jira] [Resolved] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2320. -- Resolution: Duplicate > URLFilterChecker to run as TCP Telnet serv

[jira] [Closed] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2338. > URLNormalizerChecker to run as TCP Telnet serv

[jira] [Closed] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2320. Yes! > URLFilterChecker to run as TCP Telnet serv

[jira] [Closed] (NUTCH-2478) // is not a valid base URL

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2478. > // is not a valid base URL > -- > > Key

[jira] [Resolved] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2338. -- Resolution: Duplicate > URLNormalizerChecker to run as TCP Telnet serv

[jira] [Commented] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2017-12-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294148#comment-16294148 ] Markus Jelsma commented on NUTCH-2338: -- Yes! > URLNormalizerChecker to run

[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.17

2017-12-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292469#comment-16292469 ] Markus Jelsma commented on NUTCH-2439: -- Weird, i only got : Dec 15, 2017 1:45:4

[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.17

2017-12-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292421#comment-16292421 ] Markus Jelsma commented on NUTCH-2439: -- Note, since 1.17, all but one of

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292419#comment-16292419 ] Markus Jelsma commented on NUTCH-2478: -- I prefer your patch, it also carries a

[jira] [Commented] (NUTCH-2354) Upgrade Hadoop dependencies to 2.7.3

2017-12-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290955#comment-16290955 ] Markus Jelsma commented on NUTCH-2354: -- Yes, i think we should include

[jira] [Commented] (NUTCH-2474) CrawlDbReader -stats fails with ClassCastException

2017-12-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290957#comment-16290957 ] Markus Jelsma commented on NUTCH-2474: -- +1 > CrawlDbReader -stats fai

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288300#comment-16288300 ] Markus Jelsma commented on NUTCH-2478: -- To clarify a bad sentence, i resolve

[jira] [Commented] (NUTCH-2478) // is not a valid base URL

2017-12-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288289#comment-16288289 ] Markus Jelsma commented on NUTCH-2478: -- Yes, this needs a change in the pa
