PS: Does this work when configured in site.xml like regular metadata?

On Tue, Jun 12, 2018 at 1:31 PM BlackIce <blackice...@gmail.com> wrote:
> sweet thnx!
>
> On Tue, Jun 12, 2018 at 1:29 PM Sebastian Nagel <wastl.na...@googlemail.com> wrote:
>
>> > stoopid question, but I can't find any info on it... can we now parse Open
>> > Graph metatags?
>>
>> parse-tika extracts og:* metatags
>>
>> % bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-tika' http://ogp.me/
>> ...
>> Parse Metadata: og:image=http://ogp.me/logo.png og:type=website og:image:width=300
>> og:image:alt=The Open Graph logo og:title=Open Graph protocol ...
>>
>> % bin/nutch indexchecker -Dindex.parse.md=og:image,og:title,og:description \
>>     -Dplugin.includes='protocol-http|parse-tika|index-metadata' http://ogp.me/
>> ...
>> og:image : http://ogp.me/logo.png
>> og:title : Open Graph protocol
>> digest : f98d6d5e5894ef83561630ebef3bf060
>> id : http://ogp.me/
>> og:description : The Open Graph protocol enables any web page to become a rich object in a social graph.
>>
>>
>> On 06/11/2018 11:44 PM, BlackIce wrote:
>> > +1
>> >
>> > stoopid question, but I can't find any info on it... can we now parse Open
>> > Graph metatags?
>> >
>> > Greetz
>> >
>> > On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández <roan...@uci.cu>
>> > wrote:
>> >
>> >> +1
>> >>
>> >> Regards
>> >>
>> >> ----- Chris Mattmann <mattm...@apache.org> wrote:
>> >>> ++1!
>> >>>
>> >>> Sounds great.
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> From: Sebastian Nagel <wastl.na...@googlemail.com>
>> >>> Reply-To: "d...@nutch.apache.org" <d...@nutch.apache.org>
>> >>> Date: Monday, June 11, 2018 at 7:35 AM
>> >>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>> >>> Cc: "d...@nutch.apache.org" <d...@nutch.apache.org>
>> >>> Subject: Preparing to release Nutch 1.15 ?
>> >>>
>> >>> Hi all,
>> >>>
>> >>> almost 80 fixes and improvements are done now and include:
>> >>>
>> >>> NUTCH-2375 upgrade to new mapreduce API
>> >>> It was a huge change affecting more than 10,000 lines of code. Thanks, Omkar!
>> >>> Well, there have been some regressions but those are resolved now. Tests in
>> >>> pseudo-distributed mode [1] succeeded and also a mid-size test crawl (180
>> >>> million pages) on a Hadoop cluster.
>> >>> Would be great if anybody is able to test the Nutch master in combination with
>> >>> a non-HDFS file system (e.g. s3://)! Please let us know whether this works. Thanks!
>> >>>
>> >>> NUTCH-1480: Multiple index writer instances with different configurations
>> >>> Thanks to Roannel it's now possible to index into multiple Solr or Elasticsearch
>> >>> instances. With NUTCH- (needs to be reviewed) also the routing of documents
>> >>> to the index will be configurable.
>> >>>
>> >>> NUTCH-2583: Ralf contributed a huge upgrade of dependencies.
>> >>> Nutch now runs and compiles on Java 9 + 10. Only errors in unit tests need
>> >>> to be addressed in NUTCH-2596.
>> >>>
>> >>> And two important issues are almost ready to be committed soon:
>> >>>
>> >>> NUTCH-2549: a long list of fixes and improvements to protocol-http. Thanks to
>> >>> Gerard Bouchard!
>> >>>
>> >>> NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation based
>> >>> on the okhttp library. Supports HTTP/2.
>> >>>
>> >>> The full list of fixes and improvements is available at [2].
>> >>>
>> >>> I'll plan to work through the remaining 70 open issues during the next
>> >>> days and hope to commit/resolve 15-25 of them and move the remaining
>> >>> ones to Nutch 1.16.
>> >>>
>> >>> Please vote for issues you want to get included. If there are open
>> >>> pull requests, it will help if these can be merged, the unit tests
>> >>> pass, and any review comments are addressed. Thanks!
>> >>>
>> >>> If there are any objections or blockers, please also let us know!
>> >>>
>> >>> I'll also plan to run a test crawl on Hadoop mid of this week.
>> >>> But any help in testing is welcome.
>> >>>
>> >>> Note that the tutorial needs to be updated (will be done after 1.15
>> >>> is finally released) to reflect the changes related to NUTCH-1480.
>> >>>
>> >>> Thanks,
>> >>> Sebastian
>> >>>
>> >>> [1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster
>> >>> [2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302
>> >>
>> >> UCIENCIA 2018: III International Scientific Conference of the
>> >> Universidad de las Ciencias Informáticas.
>> >> September 24-26, 2018 http://uciencia.uci.cu http://eventos.uci.cu
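
On the PS question about site.xml: as far as I know, the -D options in the quoted indexchecker call just set ordinary configuration properties, so the same keys can presumably go into conf/nutch-site.xml. A minimal sketch, not verified as part of this thread. The plugin.includes value is only the reduced list from the quoted command and is an assumption for illustration; a real crawl would keep the URL filter, scoring and index writer plugins it already uses and add parse-tika and index-metadata to them.

```xml
<!-- conf/nutch-site.xml (sketch): same properties as the -D options in the quoted commands -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- assumption: minimal list copied from the indexchecker example;
         extend it with the plugins your crawl already uses -->
    <value>protocol-http|parse-tika|index-metadata</value>
  </property>
  <property>
    <name>index.parse.md</name>
    <!-- parse metadata keys that index-metadata copies into index fields -->
    <value>og:image,og:title,og:description</value>
  </property>
</configuration>
```

With that in place, running `bin/nutch indexchecker http://ogp.me/` without the -D options should show the same og:* fields if the properties are picked up.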