You don't really need to store that in another HBase table; just
dump it to HDFS (unless you want to do random access on that second
table, which would act as a secondary index of documents by author).

It's a workable solution; it's just brute force.
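To make the brute-force approach concrete: you don't need the distinct-authors table at all, because the MapReduce shuffle already groups values by key. A full-table scan with the map emitting (author, documentid) pairs and the reduce writing each group to HDFS does the whole job. A minimal sketch, assuming a table named "documents" with the content:author qualifier from the thread (table name, output path, and class names are hypothetical; this needs a running HBase/Hadoop cluster and is untested):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DocsByAuthor {

  // Map over the document table: emit (author, documentid) per row.
  static class AuthorMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
        throws IOException, InterruptedException {
      byte[] author = row.getValue(Bytes.toBytes("content"), Bytes.toBytes("author"));
      if (author != null) {
        ctx.write(new Text(author), new Text(rowKey.get()));
      }
    }
  }

  // Reduce: the shuffle has already grouped document ids by author,
  // so just join them and write the result to HDFS.
  static class AuthorReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text author, Iterable<Text> docIds, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder ids = new StringBuilder();
      for (Text id : docIds) {
        if (ids.length() > 0) ids.append(',');
        ids.append(id.toString());
      }
      ctx.write(author, new Text(ids.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "docs-by-author");
    job.setJarByClass(DocsByAuthor.class);

    // Only fetch the author column; no need to ship content:text around.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("author"));

    TableMapReduceUtil.initTableMapperJob(
        "documents", scan, AuthorMapper.class, Text.class, Text.class, job);
    job.setReducerClass(AuthorReducer.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

If you later find you do need random access by author, the reducer output is also a reasonable input for bulk-loading a secondary-index table.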

J-D

On Mon, Nov 7, 2011 at 11:02 AM, Rohit Kelkar <[email protected]> wrote:
> I needed some feedback about the best way of implementing the following -
> In my document table I have documentid as the row-id, with content:author
> and content:text stored in each row. I want to process all documents
> pertaining to each author in a map reduce job, i.e. my map will take
> key=author and values="all documentids written by that author". But for
> this I would first have to find all distinct authors and store them in
> another table, then run the map-reduce job on that second table. Am I
> thinking in the right direction, or is there a better way to achieve
> this?
> - Rohit Kelkar
>
