+1 Nice work all!

On 11-06-18 23:44, BlackIce wrote:
+1

Stupid question, but I can't find any info on it... can we now parse Open
Graph meta tags?

Greetz

On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández <roan...@uci.cu>
wrote:

+1

Regards

----- Chris Mattmann <mattm...@apache.org> wrote:
++1!

Sounds great.

Cheers,
Chris

From: Sebastian Nagel <wastl.na...@googlemail.com>
Reply-To: "d...@nutch.apache.org" <d...@nutch.apache.org>
Date: Monday, June 11, 2018 at 7:35 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Cc: "d...@nutch.apache.org" <d...@nutch.apache.org>
Subject: Preparing to release Nutch 1.15 ?



Hi all,



almost 80 fixes and improvements are done now and include:



NUTCH-2375: upgrade to new MapReduce API

   It was a huge change affecting more than 10,000 lines of code. Thanks, Omkar!
   There were some regressions, but those are resolved now. Tests in
   pseudo-distributed mode [1] succeeded, as did a mid-size test crawl
   (180 million pages) on a Hadoop cluster.

   It would be great if anybody could test the Nutch master in combination
   with a non-HDFS file system (e.g. s3://)! Please let us know whether this
   works. Thanks!


NUTCH-1480: Multiple index writer instances with different configurations

   Thanks to Roannel it's now possible to index into multiple Solr or
   Elasticsearch instances. With NUTCH- (needs to be reviewed) the routing of
   documents to the index will also be configurable.



NUTCH-2583: Ralf contributed a huge upgrade of dependencies.

    Nutch now compiles and runs on Java 9 and 10. Only errors in unit tests
    still need to be addressed, in NUTCH-2596.



And two important issues are almost ready to be committed:



NUTCH-2549: a long list of fixes and improvements to protocol-http. Thanks to
    Gerard Bouchard!



NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation based
    on the okhttp library. Supports HTTP/2.





The full list of fixes and improvements is available at [2].



I plan to work through the remaining 70 open issues over the next few
days and hope to commit/resolve 15-25 of them and move the remaining
ones to Nutch 1.16.



Please vote for issues you want to get included. If there are open
pull requests, it will help if these can be merged, the unit tests
pass, and any review comments are addressed. Thanks!



If there are any objections or blockers, please also let us know!



I also plan to run a test crawl on Hadoop in the middle of this week,
but any help with testing is welcome.



Note that the tutorial needs to be updated (will be done after 1.15
is finally released) to reflect the changes related to NUTCH-1480.





Thanks,

Sebastian





[1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster

[2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302




