On Wed, Jun 4, 2008 at 1:35 PM, <[EMAIL PROTECTED]> wrote:
> I think you might be doing a bit of extra work there. There is no need to
> create XML files for Solr. When you read fetched/parsed data, use something
> like solrj to post to Solr without creating intermediary XML files on disk.
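For reference, here is a minimal sketch of what "posting without intermediary files" amounts to: the documents are serialized into Solr's XML update format (<add><doc><field name="...">) in memory and sent straight to the update handler. solrj is the Java client; this uses only Python's standard library instead and is not solrj's API. The URL http://localhost:8983/solr/update and the function names build_add_xml / make_update_request are hypothetical, and the request is only built, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of field dicts into a Solr <add> update message (bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # One <field name="..."> element per field; ElementTree escapes &, <, >.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    # encoding="unicode" yields a str with no XML declaration; encode it ourselves.
    return ET.tostring(add, encoding="unicode").encode("utf-8")


def make_update_request(solr_update_url, docs):
    """Build (but do not send) a POST of the documents to Solr's update handler."""
    return urllib.request.Request(
        solr_update_url,  # hypothetical URL, e.g. http://localhost:8983/solr/update
        data=build_add_xml(docs),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(req). Solr only makes the new documents searchable after a commit (for example, posting <commit/> to the same handler), which is the same final step whether the XML comes from memory or from a file on disk.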
I might be misunderstanding you, but it seems better for me to deal with the XML files rather than use something like ruby-solr or solrj. I don't want any of the Hadoop jobs to have Solr dependencies; they just write text XML files in the normal Hadoop way, and someone else is responsible for getting the results into Solr. In this case, that's a few fairly trivial shell scripts that run on each Solr machine and do a dfs cat /whatever.xml | post_to_a_solr_instance at the end of the run. (We're using Solr clustering here, so each machine is responsible for loading only its own XML files.)

But I'd be happy to skip a step. Am I just missing something obvious?

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com
