Hi,
I am currently using Nutch to crawl some job pages from job boards. They are
in my Solr index now. I want to do faceted search on the job titles. How?
The job titles can be in any location on the page, e.g. title, header,
content... If I use indexfilter in Nutch to search the
-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com]
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title
Hi,
I am currently using nutch to crawl some job pages from job boards. They are
in my solr index now. I want to do faceted
Sent: Wed, July 21, 2010 8:42:55 AM
Subject: RE: faceted search with job title
You'd probably need to do some post-processing on the pages and set up rules
for each website to grab that specific bit of data. You could load the HTML
into an XML parser, then use XPath to grab content from
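
A rough sketch of the per-site rule idea, in Python using only the standard library. The host names and XPath expressions are made up for illustration, and it assumes the page has already been cleaned up into well-formed XHTML (ElementTree only supports a subset of XPath and will reject sloppy markup):

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site rules: host -> XPath (ElementTree subset) that
# locates the job title on that site's pages. These are placeholders.
SITE_RULES = {
    "jobs.example.com": ".//h1[@class='job-title']",
    "careers.example.org": ".//div[@id='posting']/h2",
}


def extract_job_title(host, xhtml):
    """Return the job title for a page from `host`, or None if no rule matches."""
    rule = SITE_RULES.get(host)
    if rule is None:
        return None  # no rule written for this site yet
    root = ET.fromstring(xhtml)  # raises ParseError on malformed markup
    node = root.find(rule)
    if node is None:
        return None
    # Join text from nested tags and collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())
```

The extracted value would then go into its own field on the document before it is sent to Solr, so the title can be faceted on separately from the page body.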
-Kallin Nagelberg
-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com]
Sent: Wednesday, July 21, 2010 12:20 PM
To: solr-user@lucene.apache.org
Cc: dave.sea...@magicalia.com
Subject: Re: faceted search with job title
mmm...there must be better way...each job
:P
-Kallin Nagelberg
21, 2010 10:39:32 AM
Subject: RE: faceted search with job title
Yeah, you should definitely just set up a custom parser for each site. It should
be easy to extract the title using Groovy's XML parsing along with TagSoup for
sloppy HTML. If you can't find the pattern for each site leading
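
For what it's worth, here is a minimal sketch of the same idea in Python rather than Groovy, with the standard library's tolerant `html.parser` standing in for TagSoup. The rule format (tag, attribute, value), the `job_title` field name and the Solr base URL are all assumptions for illustration, not something from this thread:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TitleGrabber(HTMLParser):
    """Collects the text of the first element matching (tag, attr, value)."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.rule = (tag, attr, value)
        self.depth = 0       # > 0 while inside the matching element
        self.done = False    # True once the matching element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.rule[0]:
                self.depth += 1  # nested element with the same tag name
        elif tag == self.rule[0] and dict(attrs).get(self.rule[1]) == self.rule[2]:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.rule[0]:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


def grab_title(html, tag, attr, value):
    """Return the matched element's text with whitespace collapsed, or None."""
    parser = TitleGrabber(tag, attr, value)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split()) or None


def facet_query_url(base, field):
    """Build a Solr query URL asking only for facet counts on `field`."""
    params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}
    return base + "?" + urlencode(params)
```

For the facet counts to come back as whole titles, the title field would normally need to be an untokenized string type in the schema; something like `facet_query_url("http://localhost:8983/solr/select", "job_title")` then gives the request to send.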