Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-12 Thread lewis john mcgibbney
org> wrote: > From: David Ferrero <david.ferr...@zion.com> > To: user@nutch.apache.org > Cc: > Bcc: > Date: Sat, 10 Feb 2018 12:41:57 -0700 > Subject: Re: NUTCH-1129, Any23, microdata parsing, indexing, and > extraction? > Awesome on Any23 2.2 forthcoming release. I

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-10 Thread David Ferrero
nges to Any23-Triples >> microdata parsed. >> >> What might I be doing wrong? >> >>> On Feb 8, 2018, at 11:17 AM, lewis john mcgibbney <lewi...@apache.org> >>> wrote: >>> >>> Hi David, >>> Answers inline >>>

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-09 Thread Lewis John McGibbney
<user-digest-h...@nutch.apache.org> wrote: > > > >> > >> From: David Ferrero <david.ferr...@zion.com> > >> To: user@nutch.apache.org > >> Cc: > >> Bcc: > >> Date: Thu, 8 Feb 2018 10:19:52 -0700 > >> Subject: NUTCH-1129, Any23

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
he > supported extractors, I see Any23 mentions it supports JSON+LD input, so I > added this to nutch-site.xml to override the same property in > nutch-default.xml: > > > any23.extractors > html-microdata,html-embedded-jsonld,rdf-jsonld > Comma-separated list
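
For reference, the override described in this message would presumably look something like the sketch below. The property name and value are copied from the message; the description text is cut off in the archive, so it is left out rather than guessed. The enclosing configuration/property layout is the usual Hadoop-style format used by nutch-site.xml and nutch-default.xml, assumed here because the archive stripped the original tags.

```xml
<!-- nutch-site.xml: sketch only, reconstructed from the flattened snippet above -->
<configuration>
  <property>
    <name>any23.extractors</name>
    <value>html-microdata,html-embedded-jsonld,rdf-jsonld</value>
    <!-- description omitted: truncated in the archived message -->
  </property>
</configuration>
```

A property set in nutch-site.xml takes precedence over the same property in nutch-default.xml, which is why the message adds it there instead of editing the defaults.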

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
Thank you for this information. Since this is very much related to Any23 and microdata parsing, I’m going to ask what I believe is a related question but keep this same thread so it will be organized in one place: I noticed a lot of job boards such as dice.com, monster.com

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread lewis john mcgibbney
Hi David, Answers inline On Thu, Feb 8, 2018 at 9:19 AM, <user-digest-h...@nutch.apache.org> wrote: > > From: David Ferrero <david.ferr...@zion.com> > To: user@nutch.apache.org > Cc: > Bcc: > Date: Thu, 8 Feb 2018 10:19:52 -0700 > Subject: NUTCH-1129, A

NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
Pull request #205 was recently merged into the master branch for Nutch 1.x in fulfillment of NUTCH-1129 "microdata for Nutch 1.x". I am new to Nutch and Solr and have just started crawling and indexing a few select websites. Using the built-in HTML parsing/indexing, I am getting searchable fields