You don't really need to store that in another HBase table; just dump it into HDFS (unless you want to do random access on that second table, which would act as a secondary index of documents by author).
It's a workable solution, it's just brute force.

J-D

On Mon, Nov 7, 2011 at 11:02 AM, Rohit Kelkar <[email protected]> wrote:
> I needed some feedback about the best way of implementing the following -
> In my document table I have documentid as the row id and content:author,
> content:text stored in each row. I want to process all documents
> pertaining to each author in a MapReduce job, i.e. my map will take
> key=author and values="all documentids sent by that author". But for
> this I would first have to find all distinct authors and store them in
> another table, then run a MapReduce job on that second table. Am I
> thinking in the right direction, or is there a better way to achieve
> this?
> - Rohit Kelkar
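
For concreteness, here is a minimal sketch of the single-scan approach described above: one TableMapper job over the document table that emits author -> documentid, with a reducer that writes one line per author to HDFS (no second HBase table needed). The table name "documents", the class names (AuthorDocIds, AuthorMapper, ConcatReducer), and the output path taken from args[0] are assumptions for illustration; only the content:author column comes from the original mail.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class AuthorDocIds {

  // Map: one full scan over the document table, emitting author -> documentid.
  static class AuthorMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      byte[] author = value.getValue(Bytes.toBytes("content"), Bytes.toBytes("author"));
      if (author != null) {
        ctx.write(new Text(author), new Text(value.getRow()));
      }
    }
  }

  // Reduce: all documentids for one author arrive together, so write one
  // line per author to HDFS instead of storing them in a second table.
  static class ConcatReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text author, Iterable<Text> docIds, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder sb = new StringBuilder();
      for (Text id : docIds) {
        if (sb.length() > 0) sb.append(',');
        sb.append(id.toString());
      }
      ctx.write(author, new Text(sb.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "author -> documentids");
    job.setJarByClass(AuthorDocIds.class);

    // Only fetch the column the mapper needs, and skip the block cache
    // since a full scan would just churn it.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("author"));
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob(
        "documents", scan, AuthorMapper.class, Text.class, Text.class, job);
    job.setReducerClass(ConcatReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The output under the given HDFS path is then a plain text file of "author <TAB> docid1,docid2,..." lines, which a follow-up job can consume directly.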
