Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
Just to be clear, I'm using any23 in plugin.includes and I am getting Any23-Triples metadata. However, I was hoping to see more Any23-Triples after adding JSON-LD extractors to any23.extractors... ./bin/nutch parsechecker
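A JSON-LD extractor can only add triples if the fetched page actually embeds JSON-LD, i.e. `<script type="application/ld+json">` blocks. Schema.org types such as `JobPosting` are commonly published this way. Before tuning `any23.extractors`, a quick stand-alone check of a saved page can rule out the "no JSON-LD on the page" case. The sketch below is a hypothetical helper: `find_jsonld` is not part of Nutch or Any23. It uses only the Python standard library and does not reproduce Any23's extraction or emit triples. It only reports which JSON-LD blocks are present and parse as valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptFinder(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Attributes without a value come through as None, so default to "".
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # html.parser delivers <script> contents as raw text, so JSON is untouched.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_jsonld(html):
    """Return the parsed JSON-LD objects embedded in an HTML page.

    Blocks that are not valid JSON are skipped rather than raising.
    """
    finder = JsonLdScriptFinder()
    finder.feed(html)
    finder.close()
    results = []
    for block in finder.blocks:
        try:
            results.append(json.loads(block))
        except json.JSONDecodeError:
            pass  # malformed JSON-LD; a real extractor would likely skip it too
    return results


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "JobPosting", "title": "Java Developer"}'
        "</script></head><body></body></html>"
    )
    print(find_jsonld(page))
```

If this returns an empty list for a page, adding JSON-LD extractors would not be expected to yield extra Any23-Triples for it. If it does find blocks, the next place to look is the Nutch/Any23 configuration itself.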

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
Thank you for this information. Since this is closely related to Any23 and microdata parsing, I'm going to ask what I believe is a related question but keep it in this same thread so everything stays organized in one place: I noticed a lot of job boards such as dice.com, monster.com

Re: NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread lewis john mcgibbney
Hi David, answers inline. On Thu, Feb 8, 2018 at 9:19 AM, wrote: > From: David Ferrero > To: user@nutch.apache.org > Date: Thu, 8 Feb 2018 10:19:52 -0700 > Subject: NUTCH-1129, Any23, microdata parsing, indexing, and

NUTCH-1129, Any23, microdata parsing, indexing, and extraction?

2018-02-08 Thread David Ferrero
Pull request #205 was recently merged into the master branch for Nutch 1.x, fulfilling NUTCH-1129 "microdata for Nutch 1.x". I am new to Nutch and Solr and have just started crawling and indexing a few select websites. Using the built-in HTML parsing/indexing, I am getting searchable fields