You really don't want to do that. It becomes a nightmare: you now ship a derivative of Hive, and then you have to maintain it and keep it in lockstep with Hive from Apache.
There are other options and designs, but since this is for a commercial product, I'm not going to talk about them. Keep in mind that Hive isn't a relational database per se and works on immutable flat files, so that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <[email protected]> wrote:

> One idea is to write your own translation layer which sits in between query
> and actual job submission.
> You would most likely end up having your own version of hive jar which has
> your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they
> would do it as normal HIVE query, like
> select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork.
>
> Even otherwise, my initial guess is you might have to modify command line
> parser which does encrypt once instead of for every record
>
> 2012/10/17 Sam Mohamed <[email protected]>
> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who use
> HDFS. Hence, encryption is a requirement.
>
> Any other suggestions.
>
> Sam
> ________________________________________
> From: Michael Segel [[email protected]]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: [email protected]
> Subject: Re: Hive Query with UDF
>
> You don't need an UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the
> Select statement.
>
> That should make things a bit simpler.
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <[email protected]> wrote:
>
> > I have some encrypted data in an HDFS csv, that I've created a Hive table
> > for, and I want to run a Hive query that first encrypts the query param,
> > then does the lookup. I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >     public Text evaluate(String name) throws Exception {
> >
> >         String result = new String();
> >
> >         if (name == null) { return null; }
> >
> >         result = ParamData.encrypt(name);
> >
> >         return new Text(result);
> >     }
> > }
> >
> > Then I run the Hive query as:
> >
> > select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in
> > the table. I want it do the encryption once, then do the matchup. I've
> > tried:
> >
> > select * from cc_details where first_name in (select encrypt('Ann') from
> > cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> > select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying
> > **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> >
>
> --
> Have a Nice Day!
> Lohit
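
For what it's worth, the suggestion quoted in the thread (encrypt the string 'Ann' first, then use that encrypted value in the select statement) could be sketched roughly like this. It is only an illustration under some assumptions: `placeholderEncrypt` is a hypothetical stand-in for `ParamData.encrypt` (Base64 is not encryption, it just lets the sketch run), `buildQuery` and `quoteLiteral` are made-up helper names, and the whole approach assumes the real encryption is deterministic, so that the same plaintext always produces the value stored in the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.function.UnaryOperator;

class EncryptedLiteralQuery {

    // Hypothetical stand-in for ParamData.encrypt from the thread.
    // Base64 is NOT encryption; it only keeps this sketch self-contained.
    public static String placeholderEncrypt(String plain) {
        return Base64.getEncoder()
                .encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    // Wrap a value in single quotes, escaping backslashes and single quotes
    // (assumes backslash escaping inside HiveQL string literals).
    public static String quoteLiteral(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Encrypt the parameter exactly once on the client, then embed the
    // result as a constant so the query compares against a plain literal.
    public static String buildQuery(String plainName, UnaryOperator<String> encrypt) {
        String encrypted = encrypt.apply(plainName);
        return "select * from cc_details where first_name = "
                + quoteLiteral(encrypted);
    }

    public static void main(String[] args) {
        // Swap in the real encryption routine here in place of the placeholder.
        System.out.println(
                buildQuery("Ann", EncryptedLiteralQuery::placeholderEncrypt));
    }
}
```

With the placeholder, this prints `select * from cc_details where first_name = 'QW5u'`. The resulting string is an ordinary query with a constant in the where clause, so it can be submitted however queries are normally submitted, and no UDF call is evaluated per record.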
