In the LucidWorks Big Data product, we handle this with a reducer that sends documents to a SolrCloud cluster; that way, the index files are never managed by Hadoop.
----- Original Message -----
| From: "Ted Dunning" <[email protected]>
| To: [email protected]
| Cc: "Hadoop User" <[email protected]>
| Sent: Wednesday, October 10, 2012 7:58:57 AM
| Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
|
| I prefer to create indexes in the reducer personally.
|
| Also you can avoid the copies if you use an advanced hadoop-derived
| distro. Email me off list for details.
|
| Sent from my iPhone
|
| On Oct 9, 2012, at 7:47 PM, Mark Kerzner <[email protected]> wrote:
|
| > Hi,
| >
| > if I create a Lucene index in each mapper, locally, then copy them
| > to under /jobid/mapid1, /jobid/mapid2, and then in the reducers
| > copy them to some Solr machine (perhaps even merging), does such
| > an architecture make sense, to create a searchable index with
| > Hadoop?
| >
| > Are there links for similar architectures and questions?
| >
| > Thank you. Sincerely,
| > Mark
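The layout Mark describes (a local index per mapper, copied into HDFS under /jobid/mapid, then combined in the reduce step) can be sketched structurally as below. This is only a toy model of the data flow, not a working job: MiniIndex and the merge helper are hypothetical stand-ins, and the jobId/mapId values are made up. A real implementation would build each per-mapper index with Lucene's IndexWriter, merge segments with IndexWriter.addIndexes, or, as in the SolrCloud approach, have the reducer push documents through a SolrJ client so Solr owns the index files.

```java
import java.util.ArrayList;
import java.util.List;

// Structural sketch of the per-mapper index / reducer-side merge layout.
// MiniIndex is a hypothetical stand-in for a local Lucene index directory
// whose files a mapper would copy into HDFS under /<jobId>/<mapId>.
public class IndexMergeSketch {

    static class MiniIndex {
        final String hdfsPath;                    // e.g. /job_201210/map_0001
        final List<String> docs = new ArrayList<>();

        MiniIndex(String jobId, String mapId) {
            this.hdfsPath = "/" + jobId + "/" + mapId;
        }

        void add(String doc) { docs.add(doc); }
    }

    // Reducer-side step: combine the per-mapper indexes into one.
    // With Lucene this would be IndexWriter.addIndexes(...); with the
    // SolrCloud variant, the reducer would instead send each document
    // to the cluster so the index files are never managed by Hadoop.
    static MiniIndex merge(String jobId, List<MiniIndex> parts) {
        MiniIndex merged = new MiniIndex(jobId, "merged");
        for (MiniIndex part : parts) merged.docs.addAll(part.docs);
        return merged;
    }

    public static void main(String[] args) {
        MiniIndex m1 = new MiniIndex("job_201210", "map_0001");
        m1.add("doc-a");
        m1.add("doc-b");
        MiniIndex m2 = new MiniIndex("job_201210", "map_0002");
        m2.add("doc-c");

        MiniIndex merged = merge("job_201210", List.of(m1, m2));
        System.out.println(merged.hdfsPath + " holds " + merged.docs.size() + " docs");
    }
}
```

The trade-off the thread is circling: merging in the reducer keeps index construction inside the Hadoop job, while sending documents to SolrCloud from the reducer avoids copying index files around at all.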
