sweet thnx!

On Tue, Jun 12, 2018 at 1:29 PM Sebastian Nagel <[email protected]>
wrote:

> > stoopid question, but I can't find any info on it... can we now parse Open
> > Graph metatags?
>
> parse-tika extracts og:* metatags
>
> % bin/nutch parsechecker -Dplugin.includes='protocol-http|parse-tika' \
>     http://ogp.me/
> ...
> Parse Metadata: og:image=http://ogp.me/logo.png og:type=website og:image:width=300
>   og:image:alt=The Open Graph logo og:title=Open Graph protocol ...
>
> % bin/nutch indexchecker -Dindex.parse.md=og:image,og:title,og:description \
>     -Dplugin.includes='protocol-http|parse-tika|index-metadata' \
>     http://ogp.me/
> ...
> og:image :      http://ogp.me/logo.png
> og:title :      Open Graph protocol
> digest :        f98d6d5e5894ef83561630ebef3bf060
> id :    http://ogp.me/
> og:description :        The Open Graph protocol enables any web page to become a rich object in a social graph.
>
>
> On 06/11/2018 11:44 PM, BlackIce wrote:
> > +1
> >
> > stoopid question, but I can't find any info on it... can we now parse Open
> > Graph metatags?
> >
> > Greetz
> >
> > On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández <[email protected]> wrote:
> >
> >> +1
> >>
> >> Regards
> >>
> >> ----- Chris Mattmann <[email protected]> wrote:
> >>> ++1!
> >>>
> >>> Sounds great.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> From: Sebastian Nagel <[email protected]>
> >>> Reply-To: "[email protected]" <[email protected]>
> >>> Date: Monday, June 11, 2018 at 7:35 AM
> >>> To: "[email protected]" <[email protected]>
> >>> Cc: "[email protected]" <[email protected]>
> >>> Subject: Preparing to release Nutch 1.15 ?
> >>>
> >>> Hi all,
> >>>
> >>> Almost 80 fixes and improvements have been completed, including:
> >>>
> >>> NUTCH-2375: upgrade to the new MapReduce API
> >>>
> >>>   It was a huge change affecting more than 10,000 lines of code. Thanks,
> >>>   Omkar! There were some regressions, but those are resolved now. Tests in
> >>>   pseudo-distributed mode [1] succeeded, as did a mid-size test crawl
> >>>   (180 million pages) on a Hadoop cluster.
> >>>
> >>>   It would be great if anybody could test the Nutch master in combination
> >>>   with a non-HDFS file system (e.g. s3://)! Please let us know whether this
> >>>   works. Thanks!
> >>>
> >>> NUTCH-1480: Multiple index writer instances with different configurations
> >>>
> >>>   Thanks to Roannel, it's now possible to index into multiple Solr or
> >>>   Elasticsearch instances. With NUTCH- (needs to be reviewed), the routing
> >>>   of documents to the index will also be configurable.
> >>>
> >>> NUTCH-2583: Ralf contributed a huge upgrade of dependencies.
> >>>
> >>>    Nutch now compiles and runs on Java 9 and 10. Only errors in unit tests
> >>>    still need to be addressed, in NUTCH-2596.
> >>>
> >>> And two important issues are almost ready to be committed:
> >>>
> >>> NUTCH-2549: a long list of fixes and improvements to protocol-http.
> >>>    Thanks to Gerard Bouchard!
> >>>
> >>> NUTCH-2576: plugin protocol-okhttp, a new HTTP protocol implementation
> >>>    based on the okhttp library. Supports HTTP/2.
> >>>
> >>> The full list of fixes and improvements is available at [2].
> >>>
> >>> I plan to work through the remaining 70 open issues over the next few
> >>> days, and hope to commit/resolve 15-25 of them and move the remaining
> >>> ones to Nutch 1.16.
> >>>
> >>> Please vote for issues you want to get included. If there are open
> >>> pull requests, it will help if these can be merged, the unit tests
> >>> pass, and any review comments are addressed. Thanks!
> >>>
> >>> If there are any objections or blockers, please also let us know!
> >>>
> >>> I also plan to run a test crawl on Hadoop in the middle of this week,
> >>> but any help in testing is welcome.
> >>>
> >>> Note that the tutorial needs to be updated (will be done after 1.15
> >>> is finally released) to reflect the changes related to NUTCH-1480.
> >>>
> >>> Thanks,
> >>> Sebastian
> >>>
> >>> [1] https://github.com/sebastian-nagel/nutch-test-single-node-cluster
> >>> [2] https://issues.apache.org/jira/projects/NUTCH/versions/12342302
> >>>
> >>
> >> UCIENCIA 2018: 3rd International Scientific Conference of the
> >> Universidad de las Ciencias Informáticas.
> >> September 24-26, 2018. http://uciencia.uci.cu http://eventos.uci.cu
> >>
> >
>
>
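
The indexchecker output quoted above is a plain list of "name : value" lines, so the og:* fields can be picked out with a few lines of scripting. Below is a minimal Python sketch, assuming the output format shown in the thread. The function name parse_indexchecker_fields and the handling of wrapped continuation lines (in case a mail client or terminal has folded a long value) are illustrative assumptions, not part of Nutch.

```python
import re

# Field lines in the quoted output look like "name :<whitespace>value".
FIELD_LINE = re.compile(r"^(\S+) :\s*(.*)$")


def parse_indexchecker_fields(output):
    """Collect "name : value" lines into a dict.

    Hypothetical helper: a non-matching line that follows a field is
    treated as a wrapped continuation of that field's value.
    """
    fields = {}
    last = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the "..." elision marker
        match = FIELD_LINE.match(line)
        if match:
            last = match.group(1)
            fields[last] = match.group(2).strip()
        elif last is not None:
            fields[last] += " " + line
    return fields


# Sample text copied from the indexchecker output quoted in the thread.
sample = """\
og:image :      http://ogp.me/logo.png
og:title :      Open Graph protocol
digest :        f98d6d5e5894ef83561630ebef3bf060
id :    http://ogp.me/
og:description :        The Open Graph protocol enables any web page to
become a rich object in a
social graph.
"""

fields = parse_indexchecker_fields(sample)
# Keep only the Open Graph fields.
og_fields = {k: v for k, v in fields.items() if k.startswith("og:")}
for name, value in og_fields.items():
    print(name, "=", value)
```

A continuation line that itself happens to start with "word : " would be misread as a new field, so this is only suitable for quick checks, not as a robust parser.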
