Re: Review Request: HIVE-1634 - Allow access to Primitive types stored in binary format in HBase
> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 499
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line499
>
> Doesn't this error message need to change?

Updated the message to ' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is ' + 'the Key for the map should be of primitive type, but is ...

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 623
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line623
>
> I don't understand these TODO's.

Removed/updated comment.

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 76
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line76
>
> We keep adding new List data members. Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary. That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping>, with the fields of interest as members of this data class.

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java, line 480
> http://review.cloudera.org/r/826/diff/1/?file=11526#file11526line480
>
> Why is this assertion commented out?

I have removed this test. We do have coverage from the .q files for this case. The test was failing due to small differences between the byte arrays produced by DataOutputStream/DataInputStream and those produced by o.a.h.hbase.util.Bytes.

- bkm

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
---

On 2010-10-21 20:11:06, bkm wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/
---

(Updated 2010-10-21 20:11:06)

Review request for Hive Developers and John Sichi.

Summary
-------

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a specification of the storage option for the corresponding column in the serde property hbase.columns.mapping. Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples.

There is also a table property hbase.table.default.storage.type = string to specify a table-level default storage type. The other valid specification is binary. The table-level default is overridden by a column-level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.

This addresses bug HIVE-1634.
http://issues.apache.org/jira/browse/HIVE-1634

Diffs
-----

  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 1023967
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 1023967
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 1023967
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 1023967
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 1023967
  trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q PRE-CREATION
  trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION
  trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q PRE-CREATION
  trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out PRE-CREATION
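The specifier rules in the summary (a comma-separated list, '-' for the table default, 's' or 'b' per column, and a colon-separated key:value pair for map columns) can be sketched in a few lines of Java. This is only an illustration under assumptions: the class StorageTypeSketch, the Storage enum, and both method names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT the patch's code: resolves per-column storage
// specifiers against a table-level default.
final class StorageTypeSketch {

    enum Storage { STRING, BINARY }

    // "-" falls back to the table default, "s" is string, "b" is binary.
    static Storage resolveOne(String spec, Storage tableDefault) {
        switch (spec.trim()) {
            case "-": return tableDefault;
            case "s": return Storage.STRING;
            case "b": return Storage.BINARY;
            default:
                throw new IllegalArgumentException("Unknown storage specifier: '" + spec + "'");
        }
    }

    // Resolves a list such as "-,b,s:b". Each result entry has one element
    // for a plain column, or two (key storage, value storage) for a map column.
    static List<Storage[]> resolveAll(String storageTypes, Storage tableDefault) {
        List<Storage[]> resolved = new ArrayList<>();
        for (String column : storageTypes.split(",", -1)) {
            String[] parts = column.split(":", -1);
            if (parts.length > 2) {
                throw new IllegalArgumentException("Too many ':' in '" + column + "'");
            }
            Storage[] entry = new Storage[parts.length];
            for (int i = 0; i < parts.length; i++) {
                entry[i] = resolveOne(parts[i], tableDefault);
            }
            resolved.add(entry);
        }
        return resolved;
    }

    public static void main(String[] args) {
        for (Storage[] entry : resolveAll("-,b,s:b", Storage.STRING)) {
            System.out.println(Arrays.toString(entry));
        }
    }
}
```

Resolving the table default at parse time, as done here, is one possible design; it keeps the later read path free of fallback logic.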
[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923769#action_12923769 ]

HBase Review Board commented on HIVE-1634:
------------------------------------------

Message from: bkm.had...@gmail.com

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 499
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line499
>
> Doesn't this error message need to change?

Updated the message to ' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is ' + 'the Key for the map should be of primitive type, but is ...

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 623
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line623
>
> I don't understand these TODO's.

Removed/updated comment.

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 76
> http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line76
>
> We keep adding new List data members. Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary. That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping>, with the fields of interest as members of this data class.

> On 2010-09-16 13:28:48, John Sichi wrote:
> trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java, line 480
> http://review.cloudera.org/r/826/diff/1/?file=11526#file11526line480
>
> Why is this assertion commented out?

I have removed this test. We do have coverage from the .q files for this case. The test was failing due to small differences between the byte arrays produced by DataOutputStream/DataInputStream and those produced by o.a.h.hbase.util.Bytes.

- bkm

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
---

Allow access to Primitive types stored in binary format in HBase
----------------------------------------------------------------

Key: HIVE-1634
URL: https://issues.apache.org/jira/browse/HIVE-1634
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a specification of the storage option for the corresponding column in the serde property hbase.columns.mapping. Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples.

There is also a table property hbase.table.default.storage.type = string to specify a table-level default storage type. The other valid specification is binary. The table-level default is overridden by a column-level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.

Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double)
      stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
      tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable (key string,
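In the first query of the session, every non-string column comes back as NULL. A plausible reading, offered here as an assumption rather than something the message states, is that the cells hold binary encodings while the table is declared with the default string storage. The sketch below uses java.nio.ByteBuffer to stand in for the 4-byte big-endian layout that o.a.h.hbase.util.Bytes uses for an int, so that it runs without HBase on the classpath; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of string vs. binary cell encodings for an int.
final class BinaryVsStringSketch {

    // 4-byte big-endian encoding (ByteBuffer is big-endian by default).
    static byte[] encodeBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decimal text encoding, as used by standard string storage.
    static byte[] encodeString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    static int decodeBinary(byte[] cell) {
        return ByteBuffer.wrap(cell).getInt();
    }

    // Returns null when the bytes are not decimal text, which is how a
    // binary cell read as a string-encoded int would surface as NULL here.
    static Integer decodeString(byte[] cell) {
        try {
            return Integer.valueOf(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] binaryCell = encodeBinary(42);
        System.out.println(decodeBinary(binaryCell)); // read with the matching decoder
        System.out.println(decodeString(binaryCell)); // read with the mismatched decoder
    }
}
```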
[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923776#action_12923776 ]

Basab Maulik commented on HIVE-1634:
------------------------------------

Re: Beyond the review comments I added, I do have some higher-level suggestions:

> * For the column mapping, the reason I suggested a:b:string in the original JIRA description is that it's a pain to keep everything lined up by column position. It's already less than ideal that we do the column name mapping by position, so I don't think we should make it worse by having a separate property for type. Using the s/b shorthand is fine, and if you think that we shouldn't overload the colon, we can use a different separator, e.g. cf:cq#s. Since the existing property name is hbase.columns.mapping, I don't think it will be confusing to roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator for the storage information, and 'hbase.columns.mapping' now optionally carries that information. I have also made a small change to accept any prefix of 'string' or of 'binary', i.e. s/b, str/bin, string/binary, etc.

> * I'm wondering whether we can just use the existing classes like LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones. Or are these not compatible with hbase.utils.Bytes?

I think the incompatibility stems more from trying to stay within the serde2.lazy.Lazy family of objects, which HBaseSerDe, LazyHBaseRow, and LazyHBaseCellMap extend or depend on. It would be useful to make these two families of classes compatible (by inheriting from a common base class). Small differences in the object inspector classes that type-parameterize these classes further complicate getting past the type system. It should be doable, but perhaps as a separate patch?

> * For the tests, I noticed that you have attached TestHiveHBaseExternalTable. I think it would be a good idea if you can create and populate such a fixture table in HBaseTestSetup; that way it can be available (treated as read-only) to all of the HBase .q tests. Otherwise, it's hard to verify that we're compatible with a table created directly through HBase APIs rather than Hive.

Done. Added tests that create a Hive external table associated with this HBase table and run test queries against it.

> * Also for the tests, it would be good if you can filter it down to only a small number of representative rows when pulling the initial test data set from the Hive src table. That way, we can keep the .q.out files smaller.

Done. The .out files are a lot smaller than in the initial patch.

> * Once we get this one committed, be sure to update the wiki.

Will do once this is committed.
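The 'cf:cq#s' form discussed in this comment, together with the rule that any prefix of 'string' or 'binary' is accepted, could be parsed roughly as follows. This is a sketch under assumptions, not the patch's code: the class ColumnMappingSketch and its parse methods are hypothetical, the field names loosely follow the ColumnMapping fields proposed in the review, and neither the ':key' entry nor key:value storage pairs for map columns are treated specially.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical value class for one hbase.columns.mapping entry.
final class ColumnMappingSketch {
    final String familyName;
    final String qualifierName;       // null when the entry names only a family
    final byte[] familyNameBytes;
    final byte[] qualifierNameBytes;  // null when qualifierName is null
    final Boolean binaryStorage;      // null = no '#' suffix, use the table default

    private ColumnMappingSketch(String family, String qualifier, Boolean binaryStorage) {
        this.familyName = family;
        this.qualifierName = qualifier;
        this.familyNameBytes = family.getBytes(StandardCharsets.UTF_8);
        this.qualifierNameBytes =
            qualifier == null ? null : qualifier.getBytes(StandardCharsets.UTF_8);
        this.binaryStorage = binaryStorage;
    }

    // Accepts any non-empty prefix of "string" or "binary" (s, str, bin, ...).
    static Boolean parseStorage(String spec) {
        String s = spec.trim().toLowerCase(Locale.ROOT);
        if (!s.isEmpty() && "string".startsWith(s)) return Boolean.FALSE;
        if (!s.isEmpty() && "binary".startsWith(s)) return Boolean.TRUE;
        throw new IllegalArgumentException("Unknown storage type: '" + spec + "'");
    }

    // Parses "family:qualifier" with an optional "#storage" suffix.
    static ColumnMappingSketch parse(String entry) {
        String column = entry;
        Boolean binary = null;
        int hash = entry.indexOf('#');
        if (hash >= 0) {
            column = entry.substring(0, hash);
            binary = parseStorage(entry.substring(hash + 1));
        }
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected family:qualifier in '" + entry + "'");
        }
        String qualifier = column.substring(colon + 1);
        return new ColumnMappingSketch(
            column.substring(0, colon), qualifier.isEmpty() ? null : qualifier, binary);
    }

    public static void main(String[] args) {
        ColumnMappingSketch m = parse("cf:cq#bin");
        System.out.println(m.familyName + " " + m.qualifierName + " " + m.binaryStorage);
    }
}
```

Keeping the storage flag nullable leaves the table-level default to be applied later, which mirrors the "optional type info" wording in the comment.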
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923818#action_12923818 ]

Alejandro Abdelnur commented on HIVE-1530:
------------------------------------------

+1 for this change.

The hive-default.xml file can be shipped in a docs directory of the distribution as documentation for users, but the defaults used by the runtime should always come from the JAR.

For the log4j configuration, the JAR should include a default, but the user should be able to supply an alternate one on the command line (as in Pig). That may be a separate issue, though.

Include hive-default.xml and hive-log4j.properties in hive-common JAR
---------------------------------------------------------------------

Key: HIVE-1530
URL: https://issues.apache.org/jira/browse/HIVE-1530
Project: Hive
Issue Type: Improvement
Components: Configuration
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Fix For: 0.7.0
Attachments: HIVE-1530.1.patch.txt

hive-common-*.jar should include hive-default.xml and hive-log4j.properties, and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The hive-default.xml file that currently sits in the conf/ directory should be removed.

Motivations for this change:
* We explicitly tell users that they should never modify hive-default.xml, yet we give them the opportunity to do so by placing the file in the conf dir.
* Many users are familiar with the Hadoop configuration mechanism, which does not require *-default.xml files to be present in HADOOP_CONF_DIR, and assume that the same is true for HIVE_CONF_DIR.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
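The layering proposed here (defaults bundled in the JAR, user-supplied values on top) can be illustrated with plain java.util.Properties. This is a simplified stand-in and an assumption on my part: Hive's real configuration files are XML and are not loaded this way, and the class LayeredConfigSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch: bundled defaults with user overrides layered on top.
final class LayeredConfigSketch {

    // Loads one layer; lookups that miss in this layer fall back to 'defaults'.
    // A null stream (resource not found) yields an empty layer.
    static Properties load(InputStream in, Properties defaults) throws IOException {
        Properties layer = new Properties(defaults);
        if (in != null) {
            try (InputStream stream = in) {
                layer.load(stream);
            }
        }
        return layer;
    }

    // Opens a resource bundled on the classpath (for example inside a JAR);
    // returns null when no such resource exists.
    static InputStream bundled(String resourceName) {
        return LayeredConfigSketch.class.getClassLoader().getResourceAsStream(resourceName);
    }

    // Convenience for the demo: treat a string as the contents of a file.
    static InputStream fromText(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) throws IOException {
        Properties defaults = load(fromText("a=1\nb=2\n"), null);
        Properties site = load(fromText("b=3\n"), defaults);
        System.out.println(site.getProperty("a") + " " + site.getProperty("b"));
    }
}
```

In a real deployment the first layer would come from something like bundled("some-default.properties"), where the resource name is whatever the JAR actually ships; the name used in this sentence is a placeholder.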
[jira] Assigned: (HIVE-1575) get_json_object does not support JSON array at the root level
[ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1575:
-------------------------------

Assignee: Mike Lewis

get_json_object does not support JSON array at the root level
-------------------------------------------------------------

Key: HIVE-1575
URL: https://issues.apache.org/jira/browse/HIVE-1575
Project: Hive
Issue Type: Improvement
Components: UDF
Affects Versions: 0.7.0
Reporter: Steven Wong
Assignee: Mike Lewis
Attachments: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch

Currently, get_json_object(json_txt, path) always returns null if json_txt is not a JSON object (e.g. is a JSON array) at the root level. I have a table column of JSON arrays at the root level, but I can't parse it because of that.

get_json_object should accept any JSON value (string, number, object, array, true, false, null), not just object, at the root level. In other words, it should behave as if it were named get_json_value or simply get_json.
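The requested behaviour, where the root may be any JSON value rather than only an object, can be sketched with a small path evaluator. This is not the UDF's implementation: it assumes the JSON text has already been parsed into Java values (Map for objects, List for arrays, anything else for scalars), supports only a tiny subset of path syntax ('$', '.field', '[index]'), and the class JsonRootPathSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical evaluator that places no restriction on the root value's type.
final class JsonRootPathSketch {

    // Returns the value addressed by 'path', or null when the path is
    // malformed or does not match the structure.
    static Object eval(Object root, String path) {
        if (path == null || !path.startsWith("$")) {
            return null;
        }
        Object current = root;
        int i = 1;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '.') {
                // Field access: only valid on an object (Map).
                int end = i + 1;
                while (end < path.length()
                        && path.charAt(end) != '.' && path.charAt(end) != '[') {
                    end++;
                }
                if (!(current instanceof Map)) return null;
                current = ((Map<?, ?>) current).get(path.substring(i + 1, end));
                i = end;
            } else if (c == '[') {
                // Index access: only valid on an array (List).
                int close = path.indexOf(']', i);
                if (close < 0 || !(current instanceof List)) return null;
                int index;
                try {
                    index = Integer.parseInt(path.substring(i + 1, close));
                } catch (NumberFormatException e) {
                    return null;
                }
                List<?> list = (List<?>) current;
                if (index < 0 || index >= list.size()) return null;
                current = list.get(index);
                i = close + 1;
            } else {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Stands in for the parsed form of a root-level JSON array.
        List<Object> root = Arrays.<Object>asList(Collections.singletonMap("name", "a"), 7);
        System.out.println(eval(root, "$[0].name"));
        System.out.println(eval(root, "$[1]"));
    }
}
```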
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924034#action_12924034 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. Those changes seem reasonable. I'm not personally a fan of the login user concept in Hadoop security: it's static state, which makes it hard for a server to use multiple principals (e.g. if running a Hive server with an embedded metastore, you may need a different principal for each of the two pieces). But given that there is no renewer thread for non-loginuser keytab logins, it may be the only choice for now.

Authentication Infrastructure for Hive
--------------------------------------

Key: HIVE-842
URL: https://issues.apache.org/jira/browse/HIVE-842
Project: Hive
Issue Type: New Feature
Components: Server Infrastructure
Reporter: Edward Capriolo
Assignee: Todd Lipcon
Attachments: hive-842.txt, HiveSecurityThoughts.pdf

This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924036#action_12924036 ]

Todd Lipcon commented on HIVE-1526:
-----------------------------------

Hey John. I'm actually headed to Tokyo for the next two weeks, so I won't be at the contributors meeting. Perhaps Carl can look at this with you. Note that we should update the change to the Thrift 0.5.0 release before committing, but the review can happen on the current code.

Hive should depend on a release version of Thrift
-------------------------------------------------

Key: HIVE-1526
URL: https://issues.apache.org/jira/browse/HIVE-1526
Project: Hive
Issue Type: Task
Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Todd Lipcon
Fix For: 0.7.0
Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, libthrift.jar

Hive should depend on a release version of Thrift, and ideally it should use Ivy to resolve this dependency. The Thrift folks are working on adding Thrift artifacts to a Maven repository here: https://issues.apache.org/jira/browse/THRIFT-363