faceted search with job title

2010-07-21 Thread Savannah Beckett
Hi, I am currently using Nutch to crawl some job pages from job boards. They are in my Solr index now. I want to do faceted search on the job titles. How? The job titles can appear anywhere on the page, e.g. title, header, content... If I use indexfilter in Nutch to search the

RE: faceted search with job title

2010-07-21 Thread Dave Searle
Message- From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] Sent: 21 July 2010 16:38 To: solr-user@lucene.apache.org Subject: faceted search with job title Hi, I am currently using Nutch to crawl some job pages from job boards. They are in my Solr index now. I want to do faceted

Re: faceted search with job title

2010-07-21 Thread Savannah Beckett
Sent: Wed, July 21, 2010 8:42:55 AM Subject: RE: faceted search with job title You'd probably need to do some post-processing on the pages and set up rules for each website to grab that specific bit of data. You could load the HTML into an XML parser, then use XPath to grab content from
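
The per-site rule idea suggested above can be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the host names, the `TITLE_RULES` table, and the `extract_job_title` helper are all hypothetical, and Python's `xml.etree.ElementTree` supports only a limited XPath subset and requires well-formed markup, so real job pages would likely need a more tolerant parser first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site rules: host name -> path expression locating the job title.
TITLE_RULES = {
    "jobs.example.com": ".//h1[@class='job-title']",
    "careers.example.org": ".//div[@id='posting']/h2",
}


def extract_job_title(url, markup):
    """Return the job title for a page, or None if no rule matches."""
    rule = TITLE_RULES.get(urlparse(url).netloc)
    if rule is None:
        return None  # no rule configured for this site
    root = ET.fromstring(markup)  # assumes well-formed XML/XHTML
    text = root.findtext(rule)
    return text.strip() if text else None


page = (
    "<html><body><h1 class='job-title'> Senior Java Developer </h1>"
    "<p>Apply now.</p></body></html>"
)
title = extract_job_title("http://jobs.example.com/posting/42", page)
print(title)
```

The extracted value could then be stored in its own index field so that it can be faceted on separately from the page body.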

RE: faceted search with job title

2010-07-21 Thread Nagelberg, Kallin
-Kallin Nagelberg -Original Message- From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] Sent: Wednesday, July 21, 2010 12:20 PM To: solr-user@lucene.apache.org Cc: dave.sea...@magicalia.com Subject: Re: faceted search with job title Mmm... there must be a better way... each job

Re: faceted search with job title

2010-07-21 Thread Savannah Beckett
:P -Kallin Nagelberg -Original Message- From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] Sent: Wednesday, July 21, 2010 12:20 PM To: solr-user@lucene.apache.org Cc: dave.sea...@magicalia.com Subject: Re: faceted search with job title Mmm... there must be a better way... each job

Re: faceted search with job title

2010-07-21 Thread Dave Searle
21, 2010 10:39:32 AM Subject: RE: faceted search with job title Yeah, you should definitely just set up a custom parser for each site. It should be easy to extract the title using Groovy's XML parsing along with TagSoup for sloppy HTML. If you can't find the pattern for each site leading
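
As a rough illustration of tolerant parsing for sloppy HTML, here is a minimal sketch that swaps in Python's standard `html.parser` for the Groovy and TagSoup combination mentioned above. The `FirstTagText` class, the `first_tag_text` helper, and the choice of `h1` as the title tag are assumptions made for the example; each site would need its own rule.

```python
from html.parser import HTMLParser


class FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.capturing = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Start capturing at the first matching tag only.
        if tag == self.tag and not self.done:
            self.capturing = True

    def handle_endtag(self, tag):
        if tag == self.tag and self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)


def first_tag_text(markup, tag="h1"):
    """Return whitespace-normalized text of the first `tag`, or None."""
    parser = FirstTagText(tag)
    parser.feed(markup)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None


# Unclosed <p> and <b> tags do not stop the parser.
sloppy = "<html><body><p>Posted today<h1>Data <b>Analyst</h1><p>Unclosed tags everywhere"
print(first_tag_text(sloppy))
```

Because the parser is event-based, it never needs the document to be well formed; it simply reacts to whatever tags it encounters.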