PS: Does this work when configured in site.xml like regular metadata?

On Tue, Jun 12, 2018 at 1:31 PM BlackIce <blackice...@gmail.com> wrote:

> sweet thnx!
>
> On Tue, Jun 12, 2018 at 1:29 PM Sebastian Nagel <wastl.na...@googlemail.com> wrote:
>
>> > stoopid question, but I can't find any info on it... can we now parse
>> > Open Graph metatags?
>>
>> parse-tika extracts og:* metatags
>>
>> % bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-tika' http://ogp.me/
>> ...
>> Parse Metadata: og:image=http://ogp.me/logo.png og:type=website og:image:width=300
>>   og:image:alt=The Open Graph logo og:title=Open Graph protocol ...
>>
>> % bin/nutch indexchecker -Dindex.parse.md=og:image,og:title,og:description \
>>     -Dplugin.includes='protocol-http|parse-tika|index-metadata' http://ogp.me/
>> ...
>> og:image :      http://ogp.me/logo.png
>> og:title :      Open Graph protocol
>> digest :        f98d6d5e5894ef83561630ebef3bf060
>> id :    http://ogp.me/
>> og:description :        The Open Graph protocol enables any web page to become a rich object in a social graph.
>>
>>
>> On 06/11/2018 11:44 PM, BlackIce wrote:
>> > +1
>> >
>> > stoopid question, but I can't find any info on it... can we now parse
>> > Open Graph metatags?
>> >
>> > Greetz
>> >
>> > On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández <roan...@uci.cu> wrote:
>> >
>> >> +1
>> >>
>> >> Regards
>> >>
>> >> ----- Chris Mattmann <mattm...@apache.org> wrote:
>> >>> ++1!
>> >>>
>> >>> Sounds great.
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> From: Sebastian Nagel <wastl.na...@googlemail.com>
>> >>> Reply-To: "d...@nutch.apache.org" <d...@nutch.apache.org>
>> >>> Date: Monday, June 11, 2018 at 7:35 AM
>> >>> To: "user@nutch.apache.org" <user@nutch.apache.org>
>> >>> Cc: "d...@nutch.apache.org" <d...@nutch.apache.org>
>> >>> Subject: Preparing to release Nutch 1.15 ?
>> >>>
>> >>> Hi all,
>> >>>
>> >>> almost 80 fixes and improvements are done now, including:
>> >>>
>> >>> NUTCH-2375: upgrade to new mapreduce API
>> >>>   It was a huge change affecting more than 10,000 lines of code. Thanks, Omkar!
>> >>>   Well, there have been some regressions but those are resolved now. Tests in
>> >>>   pseudo-distributed mode [1] succeeded and also a mid-size test crawl (180
>> >>>   million pages) on a Hadoop cluster.
>> >>>   Would be great if anybody is able to test the Nutch master in combination with
>> >>>   a non-HDFS file system (e.g. s3://)! Please let us know whether this works. Thanks!
>> >>>
>> >>> NUTCH-1480: Multiple index writer instances with different configurations
>> >>>   Thanks to Roannel it's now possible to index into multiple Solr or Elasticsearch
>> >>>   instances. With NUTCH- (needs to be reviewed) the routing of documents
>> >>>   to the index will also be configurable.
>> >>>
>> >>> NUTCH-2583: Ralf contributed a huge upgrade of dependencies.
>> >>>    Nutch now runs and compiles on Java 9 + 10. Only errors in unit tests need
>> >>>    to be addressed in NUTCH-2596.
>> >>>
>> >>> And two important issues are almost ready to be committed:
>> >>>
>> >>> NUTCH-2549: a long list of fixes and improvements to protocol-http. Thanks to
>> >>>    Gerard Bouchard!
>> >>>
>> >>> NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation based
>> >>>    on the okhttp library. Supports HTTP/2.
>> >>>
>> >>> The full list of fixes and improvements is available at [2].
>> >>>
>> >>> I plan to work through the remaining 70 open issues over the next few
>> >>> days and hope to commit/resolve 15-25 of them and move the remaining
>> >>> ones to Nutch 1.16.
>> >>>
>> >>> Please vote for issues you want to get included. If there are open
>> >>> pull requests, it will help if these can be merged, the unit tests
>> >>> pass, and any review comments are addressed. Thanks!
>> >>>
>> >>> If there are any objections or blockers, please also let us know!
>> >>>
>> >>> I also plan to run a test crawl on Hadoop in the middle of this week.
>> >>> But any help in testing is welcome.
>> >>>
>> >>> Note that the tutorial needs to be updated (will be done after 1.15
>> >>> is finally released) to reflect the changes related to NUTCH-1480.
>> >>>
>> >>> Thanks,
>> >>> Sebastian
>> >>>
>> >>> [1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster
>> >>> [2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302
>> >>>
>> >>
>> >
>>
>>
