Hive Generic UDF invoking HBase

Yogesh Keshetty Tue, 29 Sep 2015 18:39:53 -0700

Hi,


I have a quick question about Hive GenericUDFs. We are
trying to do some CRUD operations on HBase tables from a Hive GenericUDF. The
issue is that up to Hive 0.13, the query would generate a MapReduce job whose
execution status we could track. Since we migrated to Hive 1.0, it doesn’t
show any status; it is probably streaming the data instead. How can we tell
whether it is using multiple mappers for the job?

 

I expected this process to be fairly fast, but it is taking far longer than
we estimated: for 11.2 million records, it has been running for more than 8
hours and is still in progress.
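For a sense of scale, here is a rough back-of-the-envelope calculation. Because the run has not finished 11.2 million rows after 8 hours, the average rate must be below the first figure, and the time spent per row must be above the second. Each row also writes to several HBase tables, so each individual write gets only a fraction of that per-row time:

```latex
\frac{11.2 \times 10^{6}\ \text{rows}}{8 \times 3600\ \text{s}}
  = \frac{11.2 \times 10^{6}}{28\,800\ \text{s}}
  \approx 389\ \text{rows/s}
\quad\Longrightarrow\quad
\text{rate} < 389\ \text{rows/s},
\qquad
\text{time per row} > \frac{28\,800\ \text{s}}{11.2 \times 10^{6}} \approx 2.57\ \text{ms}
```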

 

Use Case:

 

Let us say my table name is “MemberTable”. The generic UDF is named
“Membership” and accepts n columns as its parameters. Inside the UDF we run
some internal algorithm and insert the resulting values into multiple HBase
tables.

 

Sample Query:

 

CREATE TEMPORARY FUNCTION membership AS 'com.fishbowl.udf.membership';

 

SELECT membership(c1,c2,c3,c4,c5,c6,c7) FROM MemberTable;
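One way to check whether this query launches mappers at all is to look at its plan. This is a sketch that assumes the missing status comes from Hive's fetch-task conversion. In Hive 0.14 and later, `hive.fetch.task.conversion` defaults to `more`, which can run a simple SELECT with UDFs inside the Hive client instead of submitting a MapReduce job. Whether that applies here also depends on `hive.fetch.task.conversion.threshold` and on the MapR build, so the plan is the thing to confirm:

```sql
-- Print the execution plan without running the query.
-- If there is no "Map Reduce" stage and the Select Operator that calls
-- membership() sits under the Fetch Operator, the UDF runs single-threaded
-- inside the Hive client process, with no mappers.
EXPLAIN SELECT membership(c1, c2, c3, c4, c5, c6, c7) FROM MemberTable;

-- Restrict fetch-task conversion so this SELECT is submitted as a
-- MapReduce job again, with the usual job status and mapper tracking.
SET hive.fetch.task.conversion=minimal;
SELECT membership(c1, c2, c3, c4, c5, c6, c7) FROM MemberTable;
```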

 

Cluster info:

4-node cluster (32 GB each)

Hive version: 1.0

HBase version: 0.98.12

Distro: MapR

 
Thanks in advance!
PS: This is really urgent; I hope someone can help us ASAP.


Thank you,

Yogesh
