sweet thnx!

On Tue, Jun 12, 2018 at 1:29 PM Sebastian Nagel <[email protected]> wrote:
> > stoopid question, but I can't find any info on it... can we now parse
> > Open Graph metatags?
>
> parse-tika extracts og:* metatags
>
> % bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-tika' http://ogp.me/
> ...
> Parse Metadata: og:image=http://ogp.me/logo.png og:type=website
>   og:image:width=300 og:image:alt=The Open Graph logo
>   og:title=Open Graph protocol ...
>
> % bin/nutch indexchecker -Dindex.parse.md=og:image,og:title,og:description \
>     -Dplugin.includes='protocol-http|parse-tika|index-metadata' http://ogp.me/
> ...
> og:image : http://ogp.me/logo.png
> og:title : Open Graph protocol
> digest : f98d6d5e5894ef83561630ebef3bf060
> id : http://ogp.me/
> og:description : The Open Graph protocol enables any web page to
>   become a rich object in a social graph.
>
>
> On 06/11/2018 11:44 PM, BlackIce wrote:
> > +1
> >
> > stoopid question, but I can't find any info on it... can we now parse
> > Open Graph metatags?
> >
> > Greetz
> >
> > On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández
> > <[email protected]> wrote:
> >
> >> +1
> >>
> >> Regards
> >>
> >> ----- Chris Mattmann <[email protected]> wrote:
> >>> ++1!
> >>>
> >>> Sounds great.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> From: Sebastian Nagel <[email protected]>
> >>> Reply-To: "[email protected]" <[email protected]>
> >>> Date: Monday, June 11, 2018 at 7:35 AM
> >>> To: "[email protected]" <[email protected]>
> >>> Cc: "[email protected]" <[email protected]>
> >>> Subject: Preparing to release Nutch 1.15 ?
> >>>
> >>> Hi all,
> >>>
> >>> almost 80 fixes and improvements are done now, including:
> >>>
> >>> NUTCH-2375 upgrade to new mapreduce API
> >>> It was a huge change affecting more than 10,000 lines of code.
> >>> Thanks, Omkar!
> >>> Well, there have been some regressions but those are resolved now.
> >>> Tests in pseudo-distributed mode [1] succeeded, as did a mid-size
> >>> test crawl (180 million pages) on a Hadoop cluster.
> >>> It would be great if anybody is able to test the Nutch master in
> >>> combination with a non-HDFS file system (e.g. s3://)! Please let us
> >>> know whether this works. Thanks!
> >>>
> >>> NUTCH-1480: Multiple index writer instances with different configurations
> >>> Thanks to Roannel it's now possible to index into multiple Solr or
> >>> Elasticsearch instances. With NUTCH- (needs to be reviewed) the
> >>> routing of documents to the index will also be configurable.
> >>>
> >>> NUTCH-2583: Ralf contributed a huge upgrade of dependencies.
> >>> Nutch now compiles and runs on Java 9 and 10. Only errors in unit
> >>> tests remain; these need to be addressed in NUTCH-2596.
> >>>
> >>> And two important issues are almost ready to be committed:
> >>>
> >>> NUTCH-2549: a long list of fixes and improvements to protocol-http.
> >>> Thanks to Gerard Bouchard!
> >>>
> >>> NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation
> >>> based on the okhttp library. Supports HTTP/2.
> >>>
> >>> The full list of fixes and improvements is available at [2].
> >>>
> >>> I plan to work through the remaining 70 open issues during the next
> >>> few days and hope to commit/resolve 15-25 of them and move the
> >>> remaining ones to Nutch 1.16.
> >>>
> >>> Please vote for issues you want to get included. If there are open
> >>> pull requests, it will help if these can be merged, the unit tests
> >>> pass, and any review comments are addressed. Thanks!
> >>>
> >>> If there are any objections or blockers, please also let us know!
> >>>
> >>> I also plan to run a test crawl on Hadoop in the middle of this week.
> >>> But any help in testing is welcome.
> >>>
> >>> Note that the tutorial needs to be updated (will be done after 1.15
> >>> is finally released) to reflect the changes related to NUTCH-1480.
> >>>
> >>> Thanks,
> >>> Sebastian
> >>>
> >>> [1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster
> >>> [2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302
> >>>
> >>
> >> UCIENCIA 2018: III International Scientific Conference of the
> >> Universidad de las Ciencias Informáticas.
> >> September 24-26, 2018 http://uciencia.uci.cu http://eventos.uci.cu
> >>
> >
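
A possible follow-up to the indexchecker call quoted above, not something stated in the thread: the -D options are ordinary configuration properties, so the same og:* selection could presumably be made permanent in conf/nutch-site.xml instead of being passed on every run. A minimal sketch, assuming parse-tika and index-metadata are already enabled in plugin.includes (as in the indexchecker call) and that these three og:* fields are the ones wanted:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml (sketch; the choice of fields is an assumption
     taken from the indexchecker example in this thread) -->
<configuration>
  <property>
    <!-- Parse metadata keys that index-metadata copies into the index
         document; same value as -Dindex.parse.md=... above -->
    <name>index.parse.md</name>
    <value>og:image,og:title,og:description</value>
  </property>
</configuration>
```

Running the same indexchecker command without the -Dindex.parse.md option should then be a quick way to check that the og:* fields still show up. Whatever index backend is used would likely also need to accept these field names in its schema.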

