Re: Hive UDF for creating row key in HBASE

2017-12-18 Thread James Taylor
Hi Chethan,
As Ethan mentioned, take a look first at the Phoenix/Hive integration. If
that doesn't work for you, the best way to get the row key for a Phoenix
table is to execute an UPSERT VALUES statement against the primary key
columns without committing it. We have a utility function that returns the
Cells that would be submitted to the server, from which you can extract the
row key. You can do this through a "connectionless" JDBC connection, so you
don't need any RPCs (including for the CREATE TABLE call that gives Phoenix
the table metadata).

Take a look at ConnectionlessTest.testConnectionlessUpsert() for an example.
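In case it helps, here is a rough sketch of that approach. It assumes phoenix-client on the classpath; the connectionless JDBC URL ("jdbc:phoenix:none") and PhoenixRuntime.getUncommittedDataIterator(...) do exist, but their exact return types vary across Phoenix versions, and the table and columns below are made up for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class ConnectionlessRowKey {
  // Hypothetical table and primary key columns, for illustration only.
  public static byte[] rowKeyFor(String id, int code) throws Exception {
    // "none" is Phoenix's connectionless host, so no cluster RPCs happen.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:none")) {
      // Purely client-side DDL: gives Phoenix the metadata it needs.
      conn.createStatement().execute(
          "CREATE TABLE IF NOT EXISTS T (ID VARCHAR NOT NULL, CODE INTEGER NOT NULL"
          + " CONSTRAINT PK PRIMARY KEY (ID, CODE))");
      PreparedStatement ps =
          conn.prepareStatement("UPSERT INTO T (ID, CODE) VALUES (?, ?)");
      ps.setString(1, id);
      ps.setInt(2, code);
      ps.execute();
      // Do NOT commit: iterate the uncommitted Cells instead.
      Iterator<Pair<byte[], List<Cell>>> uncommitted =
          PhoenixRuntime.getUncommittedDataIterator(conn);
      byte[] rowKey = CellUtil.cloneRow(uncommitted.next().getSecond().get(0));
      conn.rollback(); // discard the pending upsert
      return rowKey;
    }
  }
}
```
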

Thanks,
James

On Sun, Dec 17, 2017 at 1:19 PM, Ethan  wrote:

>
> Hi Chethan,
>
> When you write data from HDFS, are you planning to use Hive to do the ETL?
> Could we do something like reading from HDFS and using Phoenix to write into
> HBase?
>
> There is https://phoenix.apache.org/hive_storage_handler.html, but I think it
> enables Hive to read from a Phoenix table, not the other way around.
>
> Thanks,
>


Re: Hive UDF for creating row key in HBASE

2017-12-17 Thread Ethan

Hi Chethan,

When you write data from HDFS, are you planning to use Hive to do the ETL? Can
we do something like reading from HDFS and using Phoenix to write into HBase?

There is https://phoenix.apache.org/hive_storage_handler.html, but I think it
enables Hive to read from a Phoenix table, not the other way around.

Thanks,


Hive UDF for creating row key in HBASE

2017-12-16 Thread Chethan Bhawarlal
Hi Dev,

Currently I am planning to write data from HDFS to HBase, and to read the data
I am using Phoenix.

Phoenix concatenates the primary key columns, separated by a zero byte
("\x00"), and stores the result in HBase as the row key.

I want to write a custom Hive UDF to build the HBase row key so that Phoenix
will be able to split it back into its primary key columns.

Following is the custom UDF code I am trying to write:


import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// The UDF is a pure function of its arguments, so mark it deterministic
// (stateful = true is only for UDFs that keep state across rows).
@UDFType(deterministic = true)
@Description(name = "hbasekeygenerator",
    value = "_FUNC_(col1, col2, ...) - Returns a Phoenix-style row key built "
        + "from the arguments, separated by zero bytes")
public class CIHbaseKeyGenerator extends UDF {

  public String evaluate(String[] args) {
    if (args == null || args.length == 0) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < args.length - 1; ++i) {
      sb.append(args[i]);
      // Byte.toString((byte) 0x00) would yield the string "0"; the
      // separator must be the actual NUL character '\u0000'.
      sb.append('\u0000');
    }
    sb.append(args[args.length - 1]);
    return sb.toString();
  }
}


Following are my questions:

1. Is it possible to emulate Phoenix's row-key encoding/decoding behavior with
a custom Hive UDF?

2. If it is, what is the best approach? It would be great if someone could
share some pointers.


Thanks,

Chethan.

-- 
Collective[i] dramatically improves sales and marketing performance using 
technology, applications and a revolutionary network designed to provide 
next generation analytics and decision-support directly to business users. 
Our goal is to maximize human potential and minimize mistakes. In most 
cases, the results are astounding. We cannot, however, stop emails from 
sometimes being sent to the wrong person. If you are not the intended 
recipient, please notify us by replying to this email's sender and deleting 
it (and any attachments) permanently from your system. If you are, please 
respect the confidentiality of this communication's contents.