But there's a lot of processing happening with the table data before it is
sent over to the reducer. Theoretically speaking, it should be possible.

Our supervisor strictly wants a MapReduce application to do this.

Do you want to see more code? I'm just baffled as to why it's throwing a
NullPointerException when the table clearly has data.
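
For reference, here is a rough sketch (untested; the table name and output
types are placeholders for what's in my actual job) of the restructuring I
have in mind: open the HTable once per task in setup() and reuse it across
map() calls, instead of constructing a new one for every lookup:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class AnalyzeMapper extends TableMapper<Text, Text> {

    private HTable contentidxTable;   // shared by all map() calls in this task

    @Override
    protected void setup(Context context) throws IOException {
        // "Contentidx" is a placeholder for whatever the ContentidxTable
        // constant holds in the real job.
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        contentidxTable = new HTable(conf, "Contentidx");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result columns, Context context)
            throws IOException, InterruptedException {
        // ... do the Get against contentidxTable here instead of
        // constructing a new HTable per record ...
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (contentidxTable != null) {
            contentidxTable.close();
        }
    }
}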

Regards,
Pavan
On Aug 19, 2013 7:41 PM, "Shahab Yunus" <[email protected]> wrote:

> I think you should not try to join the tables this way. It goes against
> the recommended design/patterns of both HBase (joins in HBase alone go
> against its design) and M/R. You should first pre-process the data, maybe
> through another M/R job or a Pig script, and massage it into a uniform or
> appropriate structure that conforms to the M/R architecture (maybe convert
> the tables into text files first?). Have you looked into the recommended
> M/R join strategies?
>
> Some links to start with:
>
> http://codingjunkie.net/mapreduce-reduce-joins/
> http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
>
> http://blog.matthewrathbone.com/2013/02/09/real-world-hadoop-implementing-a-left-outer-join-in-hadoop-map-reduce.html
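>
> If it helps, a minimal sketch of the reduce-side join idea (not your
> tables; the key/value formats are made up for illustration). Each mapper
> tags its records with the source, and the reducer joins per key:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
>
> public class ReduceSideJoin {
>
>     // Assumes input lines of the form "joinKey,payload" from a text export.
>     public static class LeftMapper extends Mapper<LongWritable, Text, Text, Text> {
>         @Override
>         protected void map(LongWritable key, Text value, Context ctx)
>                 throws IOException, InterruptedException {
>             String[] parts = value.toString().split(",", 2);
>             if (parts.length < 2) return;                    // skip malformed lines
>             ctx.write(new Text(parts[0]), new Text("L\t" + parts[1]));  // tag: left
>         }
>     }
>
>     public static class RightMapper extends Mapper<LongWritable, Text, Text, Text> {
>         @Override
>         protected void map(LongWritable key, Text value, Context ctx)
>                 throws IOException, InterruptedException {
>             String[] parts = value.toString().split(",", 2);
>             if (parts.length < 2) return;
>             ctx.write(new Text(parts[0]), new Text("R\t" + parts[1]));  // tag: right
>         }
>     }
>
>     public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
>         @Override
>         protected void reduce(Text key, Iterable<Text> values, Context ctx)
>                 throws IOException, InterruptedException {
>             List<String> left = new ArrayList<String>();
>             String right = null;
>             for (Text v : values) {
>                 String s = v.toString();
>                 if (s.startsWith("L\t")) left.add(s.substring(2));
>                 else right = s.substring(2);
>             }
>             if (right != null) {                   // inner join: matches only
>                 for (String l : left) {
>                     ctx.write(key, new Text(l + "\t" + right));
>                 }
>             }
>         }
>     }
> }
>
> The two mappers would be wired to their inputs with MultipleInputs in the
> driver.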
>
> Regards,
> Shahab
>
>
> On Mon, Aug 19, 2013 at 9:43 AM, Pavan Sudheendra <[email protected]> wrote:
>
> > I'm basically trying to do a join across 3 tables in the mapper. In the
> > reducer, I am doing a group-by and writing the output to another table.
> >
> > Although I agree that my code is pathetic, what I could actually do is
> > create an HTable object once and pass it as an extra argument to the map
> > function. But would that solve the problem?
> >
> > Roughly, these are my tables, and the code flows like this:
> > Mapper -> Table1 -> Contentidx -> Content -> Mapper aggregates the
> > values -> Reducer.
> >
> >
> > Table1: 19 million rows.
> > Contentidx table: 150k rows.
> > Content table: 93k rows.
> >
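> > Given those sizes, the Contentidx and Content tables are small enough to
> > fit in memory, so one option (just a sketch, untested, with placeholder
> > table/column names) is a map-side join: scan the small table once in
> > setup() into a HashMap and look values up there instead of doing a Get
> > per row:
> >
> > import java.io.IOException;
> > import java.util.HashMap;
> > import java.util.Map;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.TableMapper;
> > import org.apache.hadoop.hbase.util.Bytes;
> > import org.apache.hadoop.io.Text;
> >
> > public class MapSideJoinMapper extends TableMapper<Text, Text> {
> >
> >     private final Map<String, String> contentidx = new HashMap<String, String>();
> >
> >     @Override
> >     protected void setup(Context context) throws IOException {
> >         // Load the small lookup table once per task. "cf" and "q" are
> >         // placeholders for the real column family/qualifier.
> >         Configuration conf = HBaseConfiguration.create(context.getConfiguration());
> >         HTable table = new HTable(conf, "Contentidx");
> >         try {
> >             ResultScanner scanner = table.getScanner(new Scan());
> >             for (Result r : scanner) {
> >                 byte[] v = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
> >                 if (v != null) {
> >                     contentidx.put(Bytes.toString(r.getRow()), Bytes.toString(v));
> >                 }
> >             }
> >             scanner.close();
> >         } finally {
> >             table.close();
> >         }
> >     }
> >
> >     @Override
> >     public void map(ImmutableBytesWritable row, Result columns, Context context)
> >             throws IOException, InterruptedException {
> >         // Join against the in-memory map instead of issuing a Get per record.
> >         String contentId = Bytes.toString(row.get());   // placeholder key logic
> >         String joined = contentidx.get(contentId);
> >         if (joined != null) {
> >             context.write(new Text(contentId), new Text(joined));
> >         }
> >     }
> > }
> >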
> > Yes, I have looked at the MapReduce example on the HBase website, and
> > that is what I am following.
> >
> >
> >
> > On Mon, Aug 19, 2013 at 7:05 PM, Shahab Yunus <[email protected]> wrote:
> >
> > > Can you please explain or show the flow of the code a bit more? Why are
> > > you creating the HTable object again and again in the mapper? Where is
> > > ContentidxTable (the name of the table, I believe?) defined? What is
> > > your actual requirement?
> > >
> > > Also, have you looked into this, the API for wiring HBase tables into
> > > M/R jobs?
> > > http://hbase.apache.org/book/mapreduce.example.html
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Mon, Aug 19, 2013 at 9:05 AM, Pavan Sudheendra <[email protected]> wrote:
> > >
> > > > Also, the same code works perfectly fine when I run it on a
> > > > single-node cluster. I've added the HBase classpath to
> > > > HADOOP_CLASSPATH and have set all the other environment variables
> > > > too.
> > > >
> > > >
> > > > On Mon, Aug 19, 2013 at 6:33 PM, Pavan Sudheendra <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > > I'm getting the following error message every time I run the
> > > > > MapReduce job on the multi-node Hadoop cluster:
> > > > >
> > > > > java.lang.NullPointerException
> > > > >     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:414)
> > > > >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:170)
> > > > >     at com.company$AnalyzeMapper.contentidxjoin(MRjobt.java:153)
> > > > >
> > > > >
> > > > > Here's the code:
> > > > >
> > > > > public void map(ImmutableBytesWritable row, Result columns, Context context)
> > > > >         throws IOException {
> > > > >     ...
> > > > > }
> > > > > ...
> > > > >
> > > > > public static String contentidxjoin(String contentId) {
> > > > >     Configuration conf = HBaseConfiguration.create();
> > > > >     HTable table = null;
> > > > >     try {
> > > > >         // The stack trace points here: Bytes.toBytes(ContentidxTable)
> > > > >         // inside the HTable constructor throws the NPE, which happens
> > > > >         // when the table-name string itself is null in the task JVM.
> > > > >         table = new HTable(conf, ContentidxTable);
> > > > >         Get get1 = new Get(Bytes.toBytes(contentId));
> > > > >         get1.addColumn(Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > > >                 Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > > >         Result result1 = table.get(get1);
> > > > >         byte[] val1 = result1.getValue(
> > > > >                 Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > > >                 Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > > >         if (val1 != null) {
> > > > >             LOGGER.info("Fetched data from BARB-Content table");
> > > > >         } else {
> > > > >             LOGGER.error("Error fetching data from BARB-Content table");
> > > > >         }
> > > > >         return_value = contentjoin(Bytes.toString(val1), contentId);
> > > > >     } catch (Exception e) {
> > > > >         LOGGER.error("Error inside contentidxjoin method");
> > > > >         e.printStackTrace();
> > > > >     } finally {
> > > > >         if (table != null) {
> > > > >             try { table.close(); } catch (IOException ignored) { }
> > > > >         }
> > > > >     }
> > > > >     return return_value;
> > > > > }
> > > > >
> > > > > Assume all variables are defined.
> > > > >
> > > > > Can anyone please tell me why the HTable never gets instantiated?
> > > > > I set up breakpoints, and this function gets called many times while
> > > > > the mapper executes; every time it logs *Error inside contentidxjoin
> > > > > method*. I'm 100% sure there are rows in the ContentidxTable, so I'm
> > > > > not sure why it's not able to fetch the value from it.
> > > > >
> > > > > Please help!
> > > > >
> > > > >
> > > > > --
> > > > > Regards-
> > > > > Pavan
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards-
> > > > Pavan
> > > >
> > >
> >
> >
> >
> > --
> > Regards-
> > Pavan
> >
>