[jira] Commented: (HIVE-1751) Optimize ColumnarStructObjectInspector.getStructFieldData()
[ https://issues.apache.org/jira/browse/HIVE-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928133#action_12928133 ]

Namit Jain commented on HIVE-1751:

+1, running tests.

Optimize ColumnarStructObjectInspector.getStructFieldData()

Key: HIVE-1751
URL: https://issues.apache.org/jira/browse/HIVE-1751
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-1751.1.patch

ColumnarStructObjectInspector.getStructFieldData() is a heavily used and expensive function. Most queries can benefit from optimizing it, together with the ColumnarStruct.uncheckedGetField() method it calls.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
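The issue does not say what the patch changes. As a hedged illustration of why a per-field accessor on a hot path matters, here is a generic "parse on first access, then cache" row. The class `LazyColumnarRow` and its UTF-8 stand-in parser are hypothetical; this is not Hive's ColumnarStruct and not the HIVE-1751 patch.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical row that keeps each column as raw bytes and parses a column
 * only the first time it is requested; later calls return the cached object.
 * Generic illustration only, not the HIVE-1751 patch.
 */
final class LazyColumnarRow {
  private final byte[][] rawColumns;   // one serialized value per column
  private final Object[] parsed;       // parsed values, filled on demand
  private final boolean[] initialized; // whether parsed[i] is valid
  private int parseCount;              // how many columns were actually parsed

  LazyColumnarRow(byte[][] rawColumns) {
    this.rawColumns = rawColumns;
    this.parsed = new Object[rawColumns.length];
    this.initialized = new boolean[rawColumns.length];
  }

  /** Hot-path accessor: once a column is parsed, each call is an array lookup. */
  Object getField(int column) {
    if (!initialized[column]) {
      parsed[column] = parse(rawColumns[column]);
      initialized[column] = true;
    }
    return parsed[column];
  }

  int parseCount() {
    return parseCount;
  }

  /** Stand-in for real deserialization: UTF-8 text, empty bytes mean NULL. */
  private Object parse(byte[] bytes) {
    parseCount++;
    return bytes.length == 0 ? null : new String(bytes, StandardCharsets.UTF_8);
  }
}
```

Columns a query never touches are never parsed, and a column read repeatedly is parsed once.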
[jira] Commented: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes
[ https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928140#action_12928140 ]

Russell Melick commented on HIVE-1497:

Should also update the Index wiki: http://wiki.apache.org/hadoop/Hive/IndexDev

support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

Key: HIVE-1497
URL: https://issues.apache.org/jira/browse/HIVE-1497
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
Fix For: 0.7.0
Attachments: HIVE-1497.4.patch, HIVE-1497.5.patch, hive-1497.p1.patch, hive-1497.p2.patch, hive-1497.p3.patch

We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into account.
[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marquis Wang updated HIVE-1746:

Attachment: HIVE-1746.2.patch

New patch. Includes Thrift-generated files and should work now.

Support for using ALTER to set IDXPROPERTIES

Key: HIVE-1746
URL: https://issues.apache.org/jira/browse/HIVE-1746
Project: Hive
Issue Type: Improvement
Reporter: Marquis Wang
Assignee: Marquis Wang
Attachments: 1746.prelim.patch, HIVE-1746.2.patch

HIVE-1498 adds support for IDXPROPERTIES on index creation, so now we want to support ALTERing those properties.
[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join
[ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928262#action_12928262 ]

Liyin Tang commented on HIVE-1754:

This patch has some potential bugs. I will fix them today and upload a new one.

Remove JDBM component from Map Join

Key: HIVE-1754
URL: https://issues.apache.org/jira/browse/HIVE-1754
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: Hive-1754.patch

Right now, JDBM is the major performance bottleneck. As the small table grows, its PUT and GET operations take most of the execution time. Map Join is designed to load the small table's data into memory; if the data is too large to hold in memory, there is no point in using the map join strategy at all.
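A minimal sketch of the purely in-memory strategy the description argues for, assuming the small table fits in a `java.util.HashMap`: build a hash table on the join key from the small table, then probe it once per big-table row. `InMemoryMapJoin` and its methods are hypothetical names; Hive's map-join operator and hash-table classes are different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory map join: no on-disk store such as JDBM is involved. */
final class InMemoryMapJoin {

  /** Build phase: key -> all small-table values with that key (rows are {key, value}). */
  static Map<String, List<String>> buildSmallTable(List<String[]> smallRows) {
    Map<String, List<String>> table = new HashMap<>();
    for (String[] row : smallRows) {
      table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
    }
    return table;
  }

  /** Probe phase (inner join): emits "key,bigValue,smallValue" for every match. */
  static List<String> probe(Map<String, List<String>> smallTable, List<String[]> bigRows) {
    List<String> out = new ArrayList<>();
    for (String[] row : bigRows) {
      List<String> matches = smallTable.get(row[0]);
      if (matches == null) {
        continue; // no match in the small table: an inner join drops the row
      }
      for (String smallValue : matches) {
        out.add(row[0] + "," + row[1] + "," + smallValue);
      }
    }
    return out;
  }
}
```

Each probe is an expected O(1) hash lookup, which is why the approach only makes sense while the small table actually fits in memory.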
[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible
[ https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928282#action_12928282 ]

Namit Jain commented on HIVE-1750:

Do you think it might be simpler to update the opToParts list in PartitionPruner.prune() itself? That would reduce the possibility of bugs. It can definitely be done in a follow-up.

Remove Partition Filtering Conditions when Possible

Key: HIVE-1750
URL: https://issues.apache.org/jira/browse/HIVE-1750
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-1750.1.patch, HIVE-1750.2.patch, HIVE-1750.3.patch, HIVE-1750.4.patch

For some simple queries, partition filtering constraints take 8% of CPU time (now 16%, since we filter twice) even when the result is always true. When possible, we should remove these constraints to save CPU time.
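A simplified model of the idea, under stated assumptions: a WHERE clause is a list of AND-ed conditions, and a condition that references only partition columns can be evaluated against each pruned partition's spec. If it is true for every partition that will be scanned, re-evaluating it per row cannot change the result, so it can be dropped. `Conjunct` and `PartitionFilterSimplifier` are hypothetical; Hive's pruner and filter operators work on expression trees instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of one AND-ed condition of a WHERE clause. */
final class Conjunct {
  final String text;                                // e.g. "ds = '2008-04-08'"
  final boolean partitionColumnsOnly;               // references only partition columns?
  final Predicate<Map<String, String>> onPartition; // null unless partitionColumnsOnly

  Conjunct(String text, boolean partitionColumnsOnly, Predicate<Map<String, String>> onPartition) {
    this.text = text;
    this.partitionColumnsOnly = partitionColumnsOnly;
    this.onPartition = onPartition;
  }
}

final class PartitionFilterSimplifier {
  /** Keeps only the conditions that still have to be checked row by row. */
  static List<Conjunct> simplify(List<Conjunct> conjuncts, List<Map<String, String>> prunedPartitions) {
    List<Conjunct> kept = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      // Only partition-column conditions can be decided from the partition spec.
      // allMatch is vacuously true for no partitions, which is safe: nothing is scanned.
      boolean alwaysTrue = c.partitionColumnsOnly
          && prunedPartitions.stream().allMatch(c.onPartition);
      if (!alwaysTrue) {
        kept.add(c);
      }
    }
    return kept;
  }
}
```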
[jira] Commented: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks
[ https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928295#action_12928295 ]

John Sichi commented on HIVE-1501:

+1. Will commit when tests pass.

when generating reentrant INSERT for index rebuild, quote identifiers using backticks

Key: HIVE-1501
URL: https://issues.apache.org/jira/browse/HIVE-1501
Project: Hive
Issue Type: Bug
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
Fix For: 0.7.0
Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, HIVE-1501.4.patch

Yongqiang, you mentioned that you weren't able to do this because SORT BY did not accept backtick-quoted identifiers. The SORT BY is gone now as of HIVE-1494 (and SORT BY needs to be fixed anyway).
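A hypothetical helper showing what quoting identifiers in a generated INSERT amounts to: wrap each name in backticks so that names colliding with keywords or containing special characters survive re-parsing. Assumption: an embedded backtick is escaped by doubling it, as later Hive releases do; whether Hive 0.7 accepted that is not established here. `HiveIdentifiers` is an illustrative name, not a Hive class.

```java
/** Hypothetical identifier-quoting helper for generated HiveQL. */
final class HiveIdentifiers {
  /** Wraps a name in backticks; doubling embedded backticks is an assumption. */
  static String quote(String identifier) {
    return "`" + identifier.replace("`", "``") + "`";
  }

  /** Quotes and comma-joins names, e.g. for the column list of a generated INSERT ... SELECT. */
  static String quoteAll(String... identifiers) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < identifiers.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(quote(identifiers[i]));
    }
    return sb.toString();
  }
}
```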
[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928334#action_12928334 ]

John Sichi commented on HIVE-1634:

OK, I finally got some time to look into the Lazy* classes. I see what you mean about the class hierarchy, and I agree that we can leave any refactoring of the existing classes for a follow-up patch. Also, I was wrong to think that we could reuse the existing binary classes, since they do things such as VInt zero-compression, which is incompatible with the HBase Bytes format.

However, for this patch, I want to at least get the new classes into their final destination with respect to package name and class name (so that we don't have to move them later, even if we adjust their inheritance). To this end, I suggest a new package serde2.lazydio, with classes named on the pattern LazyDioInteger. "Dio" indicates the DataInput/DataOutput format. (I was thinking of lazybytes and LazyByteInteger, to indicate the HBase Bytes format, but then I saw that Byte is also one of the datatypes, and LazyBytesByte would be puzzling.) Having both LazyIntegerBinary and LazyBinaryInteger, as in the current patch, would just be too confusing.

Also, regarding the implementation of the new classes, most of the init method code is duplicated from class to class; the only thing specific to each class is the actual read+set. Should we factor out a LazyDioObject (similar to the existing pattern for LazyObject and LazyBinaryObject)? Likewise for LazyDioPrimitive and LazyDioNonPrimitive.

I will ask some others to chime in on this as well.
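The refactoring proposed in this comment (shared init in a base class, only read+set per type) can be sketched as a template method. The class names follow the proposal, but the signatures here are simplified assumptions: real Hive lazy objects take a ByteArrayRef and work with ObjectInspectors, and treating short or empty input as NULL is a choice made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical base class: the init logic is written once. */
abstract class LazyDioObject<T> {
  private T value;
  private boolean isNull;

  /** Shared init: wrap the byte range in a DataInput and delegate the typed read. */
  final void init(byte[] bytes, int start, int length) {
    if (length == 0) {
      isNull = true;
      value = null;
      return;
    }
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes, start, length))) {
      value = read(in);
      isNull = false;
    } catch (IOException e) {
      // EOFException: too few bytes for this type, so treat the field as NULL
      isNull = true;
      value = null;
    }
  }

  /** The only type-specific step (big-endian, as DataInput defines). */
  protected abstract T read(DataInput in) throws IOException;

  T get() {
    return isNull ? null : value;
  }
}

final class LazyDioInteger extends LazyDioObject<Integer> {
  @Override
  protected Integer read(DataInput in) throws IOException {
    return in.readInt();
  }
}

final class LazyDioLong extends LazyDioObject<Long> {
  @Override
  protected Long read(DataInput in) throws IOException {
    return in.readLong();
  }
}

final class LazyDioDouble extends LazyDioObject<Double> {
  @Override
  protected Double read(DataInput in) throws IOException {
    return in.readDouble();
  }
}
```

Adding another primitive then costs one small subclass rather than another copy of the init code.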
Allow access to Primitive types stored in binary format in HBase

Key: HIVE-1634
URL: https://issues.apache.org/jira/browse/HIVE-1634
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for the table default, 's' for standard string storage, and 'b' for binary storage as produced by o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value specifiers respectively. See the test cases and queries for the HBase handler for additional examples.

There is also a table property "hbase.table.default.storage.type" = "string" to specify a table-level default storage type; the other valid value is "binary". A column-level specification overrides the table-level default.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types.

The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.

Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1	NULL	NULL	NULL	NULL	NULL	Test-String	NULL	NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ( "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" ) tblproperties ( "hbase.table.name" = "TestHiveHBaseExternalTable", "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11
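A hedged sketch of the two mechanisms described in HIVE-1634: resolving "hbase.columns.storage.types" against "hbase.table.default.storage.type", and decoding an int cell under each option. It assumes binary cells use the 4-byte big-endian layout that HBase's Bytes.toBytes(int) writes (the same layout as java.nio.ByteBuffer's default order and DataOutput.writeInt). `HBaseStorageSketch` and its methods are hypothetical, not code from the patch. The decoder returns NULL when binary bytes are read as decimal text, which is consistent with the NULLs in the first query of the session above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of per-column storage types; not the HIVE-1634 patch. */
final class HBaseStorageSketch {
  enum Storage { STRING, BINARY }

  /** Value of "hbase.table.default.storage.type": "string" or "binary". */
  static Storage parseTableDefault(String value) {
    switch (value) {
      case "string": return Storage.STRING;
      case "binary": return Storage.BINARY;
      default: throw new IllegalArgumentException("unknown default storage type: " + value);
    }
  }

  /** '-' falls back to the table default; 's' and 'b' override it. */
  static Storage resolve(String spec, Storage tableDefault) {
    switch (spec) {
      case "-": return tableDefault;
      case "s": return Storage.STRING;
      case "b": return Storage.BINARY;
      default: throw new IllegalArgumentException("unknown storage specifier: " + spec);
    }
  }

  /**
   * One entry per column; a map column such as "s:b" yields {key, value}.
   * Assumption: a bare '-' on a map column stays a single entry here.
   */
  static List<Storage[]> parse(String storageTypes, Storage tableDefault) {
    List<Storage[]> columns = new ArrayList<>();
    for (String entry : storageTypes.split(",")) {
      String[] parts = entry.trim().split(":");
      Storage[] resolved = new Storage[parts.length];
      for (int i = 0; i < parts.length; i++) {
        resolved[i] = resolve(parts[i], tableDefault);
      }
      columns.add(resolved);
    }
    return columns;
  }

  /** BINARY: exactly 4 big-endian bytes. STRING: decimal text. Otherwise NULL. */
  static Integer decodeInt(byte[] cell, Storage storage) {
    if (storage == Storage.BINARY) {
      return cell.length == Integer.BYTES ? ByteBuffer.wrap(cell).getInt() : null;
    }
    try {
      return Integer.valueOf(new String(cell, StandardCharsets.UTF_8).trim());
    } catch (NumberFormatException e) {
      return null;
    }
  }
}
```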
[jira] Commented: (HIVE-1767) Merge files does not work with dynamic partition
[ https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928382#action_12928382 ]

Namit Jain commented on HIVE-1767:

load data local inpath '/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket20.txt' INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath '/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket21.txt' INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath '/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket22.txt' INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath '/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket23.txt' INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);

The test needs to be updated.

Merge files does not work with dynamic partition

Key: HIVE-1767
URL: https://issues.apache.org/jira/browse/HIVE-1767
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-1767.1.patch
[jira] Updated: (HIVE-1767) Merge files does not work with dynamic partition
[ https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1767:

Attachment: HIVE-1767.1.patch

Merge files does not work with dynamic partition

Key: HIVE-1767
URL: https://issues.apache.org/jira/browse/HIVE-1767
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HIVE-1767.1.patch
[jira] Updated: (HIVE-1767) Merge files does not work with dynamic partition
[ https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1767:

Attachment: (was: HIVE-1767.1.patch)

Merge files does not work with dynamic partition

Key: HIVE-1767
URL: https://issues.apache.org/jira/browse/HIVE-1767
Project: Hive
Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join
[ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HIVE-1754:

Attachment: Hive-1754_2.patch

Remove JDBM from Hive completely.

Remove JDBM component from Map Join

Key: HIVE-1754
URL: https://issues.apache.org/jira/browse/HIVE-1754
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: Hive-1754.patch, Hive-1754_2.patch

Right now, JDBM is the major performance bottleneck. As the small table grows, its PUT and GET operations take most of the execution time. Map Join is designed to load the small table's data into memory; if the data is too large to hold in memory, there is no point in using the map join strategy at all.
[jira] Updated: (HIVE-1768) Update transident_lastDdlTime only if not specified
[ https://issues.apache.org/jira/browse/HIVE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1768:

Attachment: HIVE-1768.1.patch

Update transident_lastDdlTime only if not specified

Key: HIVE-1768
URL: https://issues.apache.org/jira/browse/HIVE-1768
Project: Hive
Issue Type: Improvement
Components: Metastore
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang
Attachments: HIVE-1768.1.patch

Currently, whenever a table or partition is created or altered, the 'transient_lastDdlTime' field is updated with the current timestamp. For normal operations, this is the desired behavior. However, for some housekeeping tasks, it may be helpful if the user could keep the existing value (or set it to something different).

One example where this is useful is when a partition is copied between clusters. If the last-modified time were kept the same after the initial copy, it would be easy to tell whether a partition had been overwritten or updated by comparing timestamps.

This patch alters the behavior of the create/alter methods in the metastore API so that they update the timestamp only if it is not specified in the object.
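The behavior change described above reduces to "stamp only if unset". A minimal sketch over a table or partition parameter map, assuming (not confirmed by the issue) that the value is a string holding seconds since the epoch; `DdlTimeSketch` is a hypothetical name, not metastore code.

```java
import java.util.Map;

/** Hypothetical sketch of "update transient_lastDdlTime only if not specified". */
final class DdlTimeSketch {
  /** Parameter key named in the issue description. */
  static final String LAST_DDL_TIME = "transient_lastDdlTime";

  /**
   * Stamps the current time only when the caller left the parameter unset,
   * so a value carried over from another cluster survives create/alter.
   * Assumption: stored as a string of seconds since the epoch.
   */
  static void stampIfUnset(Map<String, String> params, long nowMillis) {
    String existing = params.get(LAST_DDL_TIME);
    if (existing == null || existing.isEmpty()) {
      params.put(LAST_DDL_TIME, Long.toString(nowMillis / 1000L));
    }
  }
}
```

For the cross-cluster copy in the description, the copy job would set the parameter explicitly and the create call would leave it alone; comparing the two clusters' values then shows whether a partition changed after the copy.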
[jira] Updated: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Butler updated HIVE-1648:

Attachment: HIVE-1648.patch

Automatically gathering stats when reading a table/partition

Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Attachments: HIVE-1648.patch

HIVE-1361 introduces a new command, 'ANALYZE TABLE T COMPUTE STATISTICS', to gather stats. This requires an additional scan of the data. Stats gathering can instead be piggy-backed on TableScanOperator whenever a table or partition is scanned (provided there is no LIMIT operator).
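A minimal sketch of piggy-backing a row count on a scan, including the LIMIT caveat: if a limit stops the scan before the input is exhausted, the partial count would understate the table, so no stat is reported. `StatsGatheringScan` is hypothetical and models rows as an `Iterator`; Hive's TableScanOperator pushes rows through an operator tree instead. Pass `Long.MAX_VALUE` as the limit when there is none.

```java
import java.util.Iterator;
import java.util.OptionalLong;
import java.util.function.Consumer;

/** Hypothetical scan that forwards rows downstream and counts them on the way. */
final class StatsGatheringScan {
  /**
   * Returns the row count only if the whole input was read; if the limit stops
   * the scan early, the partial count is not a valid statistic.
   */
  static <R> OptionalLong scan(Iterator<R> rows, Consumer<? super R> downstream, long limit) {
    long count = 0;
    while (rows.hasNext()) {
      if (count == limit) {
        return OptionalLong.empty(); // rows remain unread: do not publish a stat
      }
      downstream.accept(rows.next());
      count++;
    }
    return OptionalLong.of(count);
  }
}
```

The count comes for free from a scan the query performs anyway, which is the saving over a separate ANALYZE pass.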