> stoopid question, but I can't find any info on it... can we now parse Open
> Graph metatags?

parse-tika extracts og:* metatags

% bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-tika' http://ogp.me/
...
Parse Metadata: og:image=http://ogp.me/logo.png og:type=website og:image:width=300 og:image:alt=The Open Graph logo og:title=Open Graph protocol ...

% bin/nutch indexchecker -Dindex.parse.md=og:image,og:title,og:description \
    -Dplugin.includes='protocol-http|parse-tika|index-metadata' http://ogp.me/
...
og:image :       http://ogp.me/logo.png
og:title :       Open Graph protocol
digest :         f98d6d5e5894ef83561630ebef3bf060
id :             http://ogp.me/
og:description : The Open Graph protocol enables any web page to become a rich object in a social graph.


On 06/11/2018 11:44 PM, BlackIce wrote:
> +1
>
> stoopid question, but I can't find any info on it... can we now parse Open
> Graph metatags?
>
> Greetz
>
> On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández <roan...@uci.cu>
> wrote:
>
>> +1
>>
>> Regards
>>
>> ----- Chris Mattmann <mattm...@apache.org> wrote:
>>> ++1!
>>>
>>> Sounds great.
>>>
>>> Cheers,
>>> Chris
>>>
>>> From: Sebastian Nagel <wastl.na...@googlemail.com>
>>> Reply-To: "d...@nutch.apache.org" <d...@nutch.apache.org>
>>> Date: Monday, June 11, 2018 at 7:35 AM
>>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>>> Cc: "d...@nutch.apache.org" <d...@nutch.apache.org>
>>> Subject: Preparing to release Nutch 1.15 ?
>>>
>>> Hi all,
>>>
>>> almost 80 fixes and improvements are done now and include:
>>>
>>> NUTCH-2375 upgrade to new mapreduce API
>>> It was a huge change affecting more than 10,000 lines of code. Thanks, Omkar!
>>> Well, there have been some regressions but those are resolved now. Tests in
>>> pseudo-distributed mode [1] succeeded and also a mid-size test crawl (180
>>> million pages) on a Hadoop cluster.
>>> Would be great if anybody is able to test the Nutch master in combination with
>>> a non-HDFS file system (e.g. s3://)! Please let us know whether this works. Thanks!
>>>
>>> NUTCH-1480: Multiple index writer instances with different configurations
>>> Thanks to Roannel it's now possible to index into multiple Solr or Elasticsearch
>>> instances. With NUTCH- (needs to be reviewed) also the routing of documents
>>> to the index will be configurable.
>>>
>>> NUTCH-2583: Ralf contributed a huge upgrade of dependencies.
>>> Nutch now runs and compiles on Java 9 + 10. Only errors in unit tests need
>>> to be addressed in NUTCH-2596.
>>>
>>> And two important issues are almost ready to be committed soon:
>>>
>>> NUTCH-2549: a long list of fixes and improvements to protocol-http. Thanks to
>>> Gerard Bouchard!
>>>
>>> NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation based
>>> on the okhttp library. Supports HTTP/2.
>>>
>>> The full list of fixes and improvements is available at [2].
>>>
>>> I'll plan to work through the remaining 70 open issues during the next
>>> days and hope to commit/resolve 15-25 of them and move the remaining
>>> ones to Nutch 1.16.
>>>
>>> Please vote for issues you want to get included. If there are open
>>> pull requests, it will help if these can be merged, the unit tests
>>> pass, and any review comments are addressed. Thanks!
>>>
>>> If there are any objections or blockers, please also let us know!
>>>
>>> I'll also plan to run a test crawl on Hadoop mid of this week.
>>> But any help in testing is welcome.
>>>
>>> Note that the tutorial needs to be updated (will be done after 1.15
>>> is finally released) to reflect the changes related to NUTCH-1480.
>>>
>>> Thanks,
>>> Sebastian
>>>
>>> [1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster
>>> [2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302
>>>
>>
>> UCIENCIA 2018: III International Scientific Conference of the Universidad
>> de las Ciencias Informáticas.
>> September 24-26, 2018 http://uciencia.uci.cu http://eventos.uci.cu
>>
>
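
The -D options passed to parsechecker and indexchecker at the top of this message can also be set permanently in conf/nutch-site.xml. What follows is only a minimal sketch, not a tested setup: protocol-http, parse-tika and index-metadata are the plugins named in the commands above, while every other plugin in plugin.includes is an assumption standing in for whatever your installation already uses.

```xml
<!-- conf/nutch-site.xml (sketch; adapt plugin.includes to your own setup) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-http, parse-tika and index-metadata come from the commands
         in this thread; the remaining plugins are assumed placeholders.
         parse-tika is the only parser listed, as in those commands. -->
    <value>protocol-http|urlfilter-regex|parse-tika|index-(basic|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>index.parse.md</name>
    <!-- parse metadata keys that index-metadata copies into index fields -->
    <value>og:image,og:title,og:description</value>
  </property>
</configuration>
```

Running bin/nutch indexchecker http://ogp.me/ without any -D options is then a quick way to check whether the og:* fields still show up.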