Re: Ideas for solutions to Crawling and Solr

James Moore Wed, 04 Jun 2008 16:35:24 -0700

On Wed, Jun 4, 2008 at 1:35 PM,  <[EMAIL PROTECTED]> wrote:
> I think you might be doing a bit of extra work there.  There is no need to 
> create XML files for Solr.  When you read fetched/parsed data, use something 
> like solrj to post to Solr without creating intermediary XML files on disk.


I might be misunderstanding you, but it seems better for me to deal
with the XML files than with something like ruby-solr or solrj.
I don't want any of the Hadoop jobs to have Solr dependencies; they
just write text XML files in the normal Hadoop way, and something
else is responsible for getting the results into Solr.  In this case,
that's some fairly trivial shell scripts that run on each Solr machine
and do a dfs cat /whatever.xml | post_to_a_solr_instance at the end of
the run.  (We're using Solr clustering here, so each machine is
responsible for loading only its own XML files.)
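For illustration, here is a minimal sketch of the "jobs just write XML" side, assuming each fetched/parsed page has already been reduced to a dict of field names and values. The helper name `to_solr_add_xml` and the example field names are hypothetical. The `<add><doc><field name="...">` layout is Solr's XML update message format, which is the kind of file a dfs cat | post step would push to a Solr instance. The sketch uses only the Python standard library, so the job producing it needs no Solr client dependency.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping


def to_solr_add_xml(docs: Iterable[Mapping[str, object]]) -> str:
    """Build a Solr <add> update message from an iterable of field dicts.

    Hypothetical helper: each dict maps a Solr field name to a value,
    or to a list/tuple of values for a multi-valued field.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields become repeated <field> elements with the same name.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # ElementTree escapes &, <, > in the text when serializing.
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add_xml([
        {"id": "http://example.com/", "title": "Example", "tags": ["a", "b"]},
    ]))
```

Because the output is plain text, it can be written by any job's normal output path and partitioned per Solr machine, leaving the actual posting to a separate step as described above.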

But I'd be happy to skip a step - am I just missing something obvious?

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
