You don't really need to store that in another HBase table; just dump it into HDFS (unless you want to do random access on that second table, which would act as a secondary index of documents by author).
It's a workable solution, it's just brute force.

J-D

On Mon, Nov 7, 2011 at 11:02 AM, Rohit Kelkar <[email protected]> wrote:
> I needed some feedback about the best way of implementing the following -
> In my document table I have documentid as the row id and content:author,
> content:text stored in each row. I want to process all documents
> pertaining to each author in a MapReduce job, i.e. my map will take
> key=author and values="all documentids sent by that author". But for
> this I would first have to find all distinct authors and store them in
> another table, then run a MapReduce job on that second table. Am I
> thinking in the right direction, or is there a better way to achieve
> this?
> - Rohit Kelkar
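
For concreteness, here is a minimal sketch of the single-scan approach described above: one TableMapper job over the document table that emits author -> documentid, with a reducer that writes one line per author to HDFS (no second HBase table needed). The table name "documents", the class names (AuthorDocIds, AuthorMapper, ConcatReducer), and the output path taken from args[0] are assumptions for illustration; only the content:author column comes from the original mail.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class AuthorDocIds {

  // Map: one full scan over the document table, emitting author -> documentid.
  static class AuthorMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      byte[] author = value.getValue(Bytes.toBytes("content"), Bytes.toBytes("author"));
      if (author != null) {
        ctx.write(new Text(author), new Text(value.getRow()));
      }
    }
  }

  // Reduce: all documentids for one author arrive together, so write one
  // line per author to HDFS instead of storing them in a second table.
  static class ConcatReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text author, Iterable<Text> docIds, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder sb = new StringBuilder();
      for (Text id : docIds) {
        if (sb.length() > 0) sb.append(',');
        sb.append(id.toString());
      }
      ctx.write(author, new Text(sb.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "author -> documentids");
    job.setJarByClass(AuthorDocIds.class);

    // Only fetch the column the mapper needs, and skip the block cache
    // since a full scan would just churn it.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("author"));
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob(
        "documents", scan, AuthorMapper.class, Text.class, Text.class, job);
    job.setReducerClass(ConcatReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The output under the given HDFS path is then a plain text file of "author <TAB> docid1,docid2,..." lines, which a follow-up job can consume directly.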
