Re: Review Request: HIVE-1634 - Allow access to Primitive types stored in binary format in HBase

2010-10-22 Thread bkm . hadoop


 On 2010-09-16 13:28:48, John Sichi wrote:
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
  line 499
  http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line499
 
  Doesn't this error message need to change?

Updated the message to ' should be mapped to Map<? extends LazyPrimitive<?, ?>, 
?>, that is the Key for the map should be of primitive type, but is ...'



 On 2010-09-16 13:28:48, John Sichi wrote:
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
  line 623
  http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line623
 
  I don't understand these TODO's.

Removed/updated comment.


 On 2010-09-16 13:28:48, John Sichi wrote:
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
  line 76
  http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line76
 
  We keep adding new List data members.  Probably time to move to a 
  single List<ColumnMapping>, with a new class ColumnMapping with fields for 
  familyName, familyNameBytes, qualifierName, qualifierNameBytes, 
  familyBinary, qualifierBinary.  That will be a lot cleaner and also allow 
  you to avoid the boolean[] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping> with the fields of interest 
as members of this data class.
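
For readers following the refactoring, here is a minimal sketch of what such a 
data class could look like. It is not the class from the patch: the name 
ColumnMappingSketch, the parse helper, and the default of false for both 
storage flags are assumptions for illustration. Only the six field names come 
from the review comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-column holder; field names follow the review comment. */
final class ColumnMappingSketch {
    final String familyName;
    final byte[] familyNameBytes;
    final String qualifierName;      // null when a whole column family is mapped
    final byte[] qualifierNameBytes; // null when qualifierName is null
    final boolean familyBinary;      // storage flag; exact semantics are assumed
    final boolean qualifierBinary;   // storage flag; exact semantics are assumed

    ColumnMappingSketch(String family, String qualifier,
                        boolean familyBinary, boolean qualifierBinary) {
        this.familyName = family;
        this.familyNameBytes = family.getBytes(StandardCharsets.UTF_8);
        this.qualifierName = qualifier;
        this.qualifierNameBytes =
            qualifier == null ? null : qualifier.getBytes(StandardCharsets.UTF_8);
        this.familyBinary = familyBinary;
        this.qualifierBinary = qualifierBinary;
    }

    /** Parses a list such as "cf:a,cf2:" into one entry per column. */
    static List<ColumnMappingSketch> parse(String spec) {
        List<ColumnMappingSketch> out = new ArrayList<>();
        for (String entry : spec.split(",")) {
            int colon = entry.indexOf(':');
            String family = colon < 0 ? entry : entry.substring(0, colon);
            // A trailing colon ("cf2:") means the whole family, so no qualifier.
            String qualifier = (colon < 0 || colon == entry.length() - 1)
                ? null : entry.substring(colon + 1);
            out.add(new ColumnMappingSketch(family, qualifier, false, false));
        }
        return out;
    }
}
```

A single list of such objects replaces several parallel lists that would 
otherwise have to be kept aligned by index.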


 On 2010-09-16 13:28:48, John Sichi wrote:
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java,
   line 480
  http://review.cloudera.org/r/826/diff/1/?file=11526#file11526line480
 
  Why is this assertion commented out?

I have removed this test. We do have coverage from the .q files for this case. 
This was failing due to small differences in the byte arrays from 
DataOutputStream/DataInputStream vs o.a.h.hbase.util.Bytes.
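
The thread does not say which values differed. One way such a mismatch can 
arise, shown here purely as an illustration, is string encoding: 
DataOutputStream.writeUTF prepends a two-byte length, while HBase's 
Bytes.toBytes(String) returns unframed UTF-8 bytes. The sketch below uses only 
the JDK, with String.getBytes standing in for the HBase call; the class and 
method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class EncodingDiffSketch {
    /** Bytes written by DataOutputStream.writeUTF: 2-byte length, then the text. */
    static byte[] viaDataOutput(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Unframed UTF-8 bytes; stands in for HBase's Bytes.toBytes(String). */
    static byte[] rawUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For the input "abc" the first form is five bytes long and the second is three, 
so a byte-for-byte comparison of the two arrays fails even though both encode 
the same text.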


- bkm


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
---


On 2010-10-21 20:11:06, bkm wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://review.cloudera.org/r/826/
 ---
 
 (Updated 2010-10-21 20:11:06)
 
 
 Review request for Hive Developers and John Sichi.
 
 
 Summary
 ---
 
 This addresses HIVE-1245 in part, for atomic or primitive types.
 
 The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a 
 specification of the storage option for the corresponding column in the serde 
 property hbase.columns.mapping. Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon-separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for HBase handler for 
 additional examples.
 
 There is also a table property hbase.table.default.storage.type = string 
 to specify a table level default storage type. The other valid specification 
 is binary. The table level default is overridden by a column level 
 specification.
 
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
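
 The resolution rule described in this summary (a column-level specifier wins, 
 and '-' falls back to the table-level default) can be sketched as follows. 
 This is an illustration only, not code from the patch: the class, enum, and 
 method names are hypothetical, and the handling of unknown specifiers is an 
 assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class StorageTypeSketch {
    enum Storage { STRING, BINARY }

    /** "-" defers to the table default, "s" is string, "b" is binary. */
    static Storage resolveOne(String spec, Storage tableDefault) {
        switch (spec) {
            case "-": return tableDefault;
            case "s": return Storage.STRING;
            case "b": return Storage.BINARY;
            default:
                // Assumed behaviour: reject anything else.
                throw new IllegalArgumentException("Unknown storage specifier: " + spec);
        }
    }

    /**
     * Resolves a comma-separated list. A plain column yields a one-element
     * array; a map column written as "key:value" (e.g. "s:b") yields two.
     */
    static List<Storage[]> resolve(String types, Storage tableDefault) {
        List<Storage[]> out = new ArrayList<>();
        for (String entry : types.split(",")) {
            String[] parts = entry.split(":");
            Storage[] resolved = new Storage[parts.length];
            for (int i = 0; i < parts.length; i++) {
                resolved[i] = resolveOne(parts[i].trim(), tableDefault);
            }
            out.add(resolved);
        }
        return out;
    }
}
```

 With a table default of string, the value -,b,s:b resolves to string for the 
 first column, binary for the second, and a string key with binary values for 
 the map column.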
 
 
 This addresses bug HIVE-1634.
 http://issues.apache.org/jira/browse/HIVE-1634
 
 
 Diffs
 -
 
   trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 
 1023967 
   
 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
  1023967 
   
 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
  1023967 
   
 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  1023967 
   
 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
  1023967 
   
 trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
  1023967 
   trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 
 1023967 
   
 trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
 1023967 
   
 trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 
 1023967 
   
 trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
  1023967 
   trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q 
 PRE-CREATION 
   trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q 
 PRE-CREATION 
   trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q 
 PRE-CREATION 
   
 trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
  PRE-CREATION 
   

[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-22 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923769#action_12923769
 ] 

HBase Review Board commented on HIVE-1634:
--

Message from: bkm.had...@gmail.com







 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
 Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a 
 specification of the storage option for the corresponding column in the serde 
 property hbase.columns.mapping. Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon-separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for HBase handler for 
 additional examples.
 There is also a table property hbase.table.default.storage.type = string 
 to specify a table level default storage type. The other valid specification 
 is binary. The table level default is overridden by a column level 
 specification.
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ("hbase.columns.mapping" = 
     ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1   NULL    NULL    NULL    NULL    NULL    Test-String     NULL    NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, 

[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-22 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923776#action_12923776
 ] 

Basab Maulik commented on HIVE-1634:


Re: Beyond the review comments I added, I do have some higher-level suggestions:

* For the column mapping, the reason I suggested a:b:string in the 
original JIRA description is that it's a pain to keep everything lined up by 
column position. It's already less than ideal that we do the column name 
mapping by position, so I don't think we should make it worse by having a 
separate property for type. Using the s/b shorthand is fine, and if you think 
that we shouldn't overload the colon, we can use a different separator, e.g. 
cf:cq#s. Since the existing property name is hbase.columns.mapping, I don't 
think it will be confusing to roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator to the storage 
information and use 'hbase.columns.mapping' to carry the additional storage 
information optionally. I have made a small change to allow any prefix of 
'string' or of 'binary' to be valid, i.e. s/b or str/bin or string/binary etc.
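
A minimal sketch of the prefix rule described above, assuming lower-case input 
and a single specifier after the '#'. The class and method names are 
hypothetical, and how the patch treats an empty or unrecognized suffix is not 
stated in this thread, so rejecting both is an assumption.

```java
final class MappingStorageSketch {
    enum Storage { DEFAULT, STRING, BINARY }

    /** Reads the optional "#..." suffix of an entry such as "cf:cq#s". */
    static Storage storageOf(String columnSpec) {
        int hash = columnSpec.indexOf('#');
        if (hash < 0) {
            return Storage.DEFAULT; // no suffix: fall back to the table default
        }
        String suffix = columnSpec.substring(hash + 1);
        if (suffix.isEmpty()) {
            throw new IllegalArgumentException("Empty storage type in: " + columnSpec);
        }
        // Any non-empty prefix of "string" or "binary" is accepted.
        if ("string".startsWith(suffix)) {
            return Storage.STRING;
        }
        if ("binary".startsWith(suffix)) {
            return Storage.BINARY;
        }
        throw new IllegalArgumentException("Unknown storage type in: " + columnSpec);
    }
}
```

Because "string" and "binary" start with different letters, no prefix is 
ambiguous, so cf:cq#s, cf:cq#str, and cf:cq#string all select string storage.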

* I'm wondering whether we can just use the existing classes like 
LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of 
creating new ones. Or are these not compatible with hbase.util.Bytes?

I think the incompatibility stems more from trying to stay within the 
serde2.lazy.Lazy family of objects, which HBaseSerDe, LazyHBaseRow, and 
LazyHBaseCellMap extend or depend on. It would be useful to make these two 
families of classes compatible (inheriting from a common base class). Small 
differences in the object inspector classes that type-parametrize these 
classes further complicate getting past the type system. This should be 
doable, but perhaps as a separate patch?

* For the tests, I noticed that you have attached 
TestHiveHBaseExternalTable. I think it would be a good idea if you can create 
and populate such a fixture table in HBaseTestSetup; that way it can be 
available (treated as read-only) to all of the HBase .q tests. Otherwise, it's 
hard to verify that we're compatible with a table created directly through 
HBase API's rather than Hive.

Done. Added tests to create a Hive external table associated with this HBase 
table and test queries.

* Also for the tests, it would be good if you can filter it down to only a 
small number of representative rows when pulling the initial test data set from 
the Hive src table. That way, we can keep the .q.out files smaller.

Done, the .out files are a lot smaller than in the initial patch.

* Once we get this one committed, be sure to update the wiki.

Will do once this is committed.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-10-22 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923818#action_12923818
 ] 

Alejandro Abdelnur commented on HIVE-1530:
--

+1 for this change.

The hive-default.xml can be provided in the distribution in a docs directory 
for documentation purposes for users.

But the defaults used by the runtime should always come from the JAR.

For log4j configuration, the JAR should include a default one, but the user 
should be able to provide an alternate one on the command line (like Pig). But 
this may be another issue.

 Include hive-default.xml and hive-log4j.properties in hive-common JAR
 -

 Key: HIVE-1530
 URL: https://issues.apache.org/jira/browse/HIVE-1530
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-1530.1.patch.txt


 hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
 and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
 hive-default.xml file that currently sits in the conf/ directory should be 
 removed.
 Motivations for this change:
 * We explicitly tell users that they should never modify hive-default.xml yet 
 give them the opportunity to do so by placing the file in the conf dir.
 * Many users are familiar with the Hadoop configuration mechanism that does 
 not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
 assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1575:


Assignee: Mike Lewis

 get_json_object does not support JSON array at the root level
 -

 Key: HIVE-1575
 URL: https://issues.apache.org/jira/browse/HIVE-1575
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Affects Versions: 0.7.0
Reporter: Steven Wong
Assignee: Mike Lewis
 Attachments: 
 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch


 Currently, get_json_object(json_txt, path) always returns null if json_txt is 
 not a JSON object (e.g. is a JSON array) at the root level.
 I have a table column of JSON arrays at the root level, but I can't parse it 
 because of that.
 get_json_object should accept any JSON value (string, number, object, array, 
 true, false, null), not just object, at the root level. In other words, it 
 should behave as if it were named get_json_value or simply get_json.




[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-10-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924034#action_12924034
 ] 

Todd Lipcon commented on HIVE-842:
--

Hey Pradeep. Those changes seem reasonable. I'm not personally a fan of the 
login user concept in Hadoop security: it's static state, which prevents 
servers that may want to use multiple principals from doing so easily (e.g., 
if running a Hive server with an embedded metastore, you may need a different 
principal for each of the two pieces). But given that there is no renewer 
thread for non-loginuser keytab logins, it may be the only choice for now.

 Authentication Infrastructure for Hive
 --

 Key: HIVE-842
 URL: https://issues.apache.org/jira/browse/HIVE-842
 Project: Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Edward Capriolo
Assignee: Todd Lipcon
 Attachments: hive-842.txt, HiveSecurityThoughts.pdf


 This issue deals with the authentication (user name, password) 
 infrastructure, not the authorization components that specify what a user 
 should be able to do.




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-10-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924036#action_12924036
 ] 

Todd Lipcon commented on HIVE-1526:
---

Hey John. I'm actually headed to Tokyo for the next two weeks so won't be at 
the contributors meeting. Perhaps Carl can look at this with you. Note that we 
should update the change to Thrift 0.5.0 release before committing, but the 
review can happen on current code.

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Todd Lipcon
 Fix For: 0.7.0

 Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
 libthrift.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363
