Hi Robert,
unfortunately, I'm not able to reproduce the problem.
Fetching works with the recent 1.x and Java 8. I've tried both:
bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-html' https://potomac.edu/
bin/nutch parsechecker
Hi,
> more control over what is being indexed?
It's possible to enable URL filters for the indexer:
bin/nutch index ... -filter
With a little extra effort you can use different URL filter rules
during the index step, e.g. in local mode by pointing NUTCH_CONF_DIR
to a different folder.
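A minimal sketch of the local-mode setup described above, assuming a Nutch 1.x runtime directory with the default conf/ folder and the regex URL filter enabled. The directory name conf-index, the NUTCH_HOME default, and the crawl/ paths are illustrative assumptions and are not taken from this thread.

```shell
#!/bin/sh
# Sketch only: NUTCH_HOME, conf-index and the crawl/ paths are assumed names.
NUTCH_HOME="${NUTCH_HOME:-.}"
INDEX_CONF="${INDEX_CONF:-$NUTCH_HOME/conf-index}"

# Start from a copy of the regular configuration so that only the
# URL filter rules differ between crawling and indexing.
if [ -d "$NUTCH_HOME/conf" ] && [ ! -d "$INDEX_CONF" ]; then
  cp -r "$NUTCH_HOME/conf" "$INDEX_CONF"
fi

# Edit $INDEX_CONF/regex-urlfilter.txt by hand so that it holds the
# stricter rules that should apply at index time only.

# Run the index step against the alternate configuration;
# -filter tells the indexer to apply the URL filters.
if [ -x "$NUTCH_HOME/bin/nutch" ]; then
  NUTCH_CONF_DIR="$INDEX_CONF" "$NUTCH_HOME/bin/nutch" index \
    crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments -filter
else
  echo "bin/nutch not found under $NUTCH_HOME; nothing indexed" >&2
fi
```

The crawl steps keep using the default conf/ folder, so the stricter rules only affect which documents are sent to the index.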
>> I
I think you will find that you need different rules for each website and that
some amount of maintenance will be needed as the websites change their
practices.
I found out that there is no direct way to do it. I solved the problem by
applying the regex transformation one more time in IndexerMapReduce,
before the indexer gets the doc for writing.
Something like (IndexerMapReduce.java, line 369):
doc.add("modifiedId",
Hello Shiva,
Yes, that is possible, but our approach is not a foolproof solution.
We got our first hub classifier years ago in the form of a simple ParseFilter
backed by an SVM. The model was built solely on the HTML of positive and
negative examples, with very few features, so it was extremely
Again I thank you Sebastian! I was able to resolve the issue by updating
the HTTPClient library. I also updated from Nutch 1.11 to 1.14 and had no
issue with the SSL.
Best,
...bob
On Tue, Mar 20, 2018 at 5:03 PM, Sebastian Nagel wrote:
> Hi Robert,
>
> although
Hi Robert,
although the error message differs, it somewhat resembles
https://issues.apache.org/jira/browse/NUTCH-2447
I've tried to reproduce it using Nutch 1.11, but it works
with Java 8 on Ubuntu 16.04. Sorry, I have no clue where even
to start searching for the reason.
Best,
Sebastian
On
Thank you Sebastian! I am still working on the issue. I tested the cert
using openssl and also got the same handshake failure. After further
checking I found that the openssl command works when I add the -servername
option. So apparently, my nutch server (Fedora 27) requires SNI. I added