[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan updated HIVE-4435:
    Attachment: chart_1(1).png

Column stats: Distinct value estimator should use hash functions that are pairwise independent

    Key: HIVE-4435
    URL: https://issues.apache.org/jira/browse/HIVE-4435
    Project: Hive
    Issue Type: Bug
    Components: Statistics
    Affects Versions: 0.10.0
    Reporter: Shreepadma Venugopalan
    Assignee: Shreepadma Venugopalan
    Attachments: chart_1(1).png, HIVE-4435.1.patch

The current implementation of the Flajolet-Martin estimator for the number of distinct values doesn't use hash functions that are pairwise independent. This is problematic because the input values are not uniformly distributed. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns, which are typically monotonically increasing sequences.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
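To make the term concrete: a family of hash functions is pairwise independent when, for any two distinct keys, the pair of hash values is uniformly distributed over all possible pairs as the function is drawn at random from the family. The sketch below is only an illustration, not the code in HIVE-4435.1.patch. The prime P = 7, the helper names, and the shift-only family used for contrast are all assumptions, chosen so that the whole family can be enumerated.

```python
from collections import Counter
from itertools import product

P = 7  # small prime (assumption) so every (a, b) pair can be enumerated


def linear_hash(a, b, x):
    """One member of the family h_{a,b}(x) = (a*x + b) mod P."""
    return (a * x + b) % P


def joint_distribution(x1, x2):
    """Count how many (a, b) pairs map (x1, x2) to each (y1, y2)."""
    return Counter(
        (linear_hash(a, b, x1), linear_hash(a, b, x2))
        for a, b in product(range(P), repeat=2)
    )


def shift_only_distribution(x1, x2):
    """Contrast case: only the offset b varies, the multiplier is fixed at 1."""
    return Counter(((x1 + b) % P, (x2 + b) % P) for b in range(P))


# Two consecutive keys, as in a monotonically increasing primary key column.
full = joint_distribution(3, 4)
shift = shift_only_distribution(3, 4)

# Pairwise independent: all P*P output pairs occur, each exactly once.
print(len(full), set(full.values()))   # prints: 49 {1}
# Not pairwise independent: only P of the P*P output pairs can occur.
print(len(shift))                      # prints: 7
```

For consecutive keys such as 3 and 4, the shift-only family always produces hash values that differ by exactly 1, which is the kind of structure a monotonically increasing key column can expose.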
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644850#comment-13644850 ]

Shreepadma Venugopalan commented on HIVE-4435:

Attached plot of relative error vs. number of distinct values after the fix.
    Dataset: TPC-H of varying sizes up to 10TB
    hive.stats.ndv.error = 5% (standard error for the estimator)
    Column types: String, Long, Double
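For readers unfamiliar with the metric in the comment: relative error is |estimate - true| / true. The sketch below measures it for a Flajolet-Martin style estimator with stochastic averaging (PCSA) on a monotonically increasing key sequence. It is an illustration under stated assumptions, not Hive's implementation: it swaps in blake2b from Python's hashlib as the hash function rather than a pairwise-independent family, and the names hash64, rho, pcsa_estimate and the choice of 256 bitmaps are hypothetical. The constant 0.77351 and the approximate standard error 0.78 / sqrt(m) for m bitmaps come from Flajolet and Martin's analysis.

```python
import hashlib

PHI = 0.77351        # Flajolet-Martin correction constant
HASH_BITS = 64       # width of the hash used in this sketch (assumption)


def hash64(value):
    """64-bit hash of a value; blake2b is a stand-in, not Hive's hash."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def rho(v, width):
    """Index of the lowest set bit of v, or `width` if v is zero."""
    if v == 0:
        return width
    return (v & -v).bit_length() - 1


def pcsa_estimate(values, num_bitmaps=256):
    """Estimate the number of distinct values with num_bitmaps FM bitmaps."""
    bitmaps = [0] * num_bitmaps
    for value in values:
        h = hash64(value)
        bucket = h % num_bitmaps      # which bitmap this value updates
        rest = h // num_bitmaps       # remaining bits choose the bit position
        bitmaps[bucket] |= 1 << rho(rest, HASH_BITS)
    total = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):      # position of the lowest zero bit
            r += 1
        total += r
    return num_bitmaps / PHI * 2 ** (total / num_bitmaps)


true_ndv = 100_000
estimate = pcsa_estimate(range(1, true_ndv + 1))   # monotonically increasing keys
relative_error = abs(estimate - true_ndv) / true_ndv
print(f"estimate={estimate:.0f} relative_error={relative_error:.3f}")
```

With 256 bitmaps the expected standard error is roughly 0.78 / 16, or about 4.9%, so a single run landing within a few percent of the true count is what this sketch should show. How Hive maps hive.stats.ndv.error to a number of bit vectors is not stated in this thread.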
[jira] [Created] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
Shreepadma Venugopalan created HIVE-4435:

    Summary: Column stats: Distinct value estimator should use hash functions that are pairwise independent
    Key: HIVE-4435
    URL: https://issues.apache.org/jira/browse/HIVE-4435
    Project: Hive
    Issue Type: Bug
    Components: Statistics
    Affects Versions: 0.10.0
    Reporter: Shreepadma Venugopalan
    Assignee: Shreepadma Venugopalan
[jira] [Created] (HIVE-4426) Support statistics collection for partitioning key
Shreepadma Venugopalan created HIVE-4426:

    Summary: Support statistics collection for partitioning key
    Key: HIVE-4426
    URL: https://issues.apache.org/jira/browse/HIVE-4426
    Project: Hive
    Issue Type: Bug
    Reporter: Shreepadma Venugopalan
    Assignee: Shreepadma Venugopalan

We should support the ability to collect statistics on the partitioning key column.
[jira] [Commented] (HIVE-4321) Add Compile/Execute support to Hive Server
[ https://issues.apache.org/jira/browse/HIVE-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627252#comment-13627252 ]

Shreepadma Venugopalan commented on HIVE-4321:

[~sarahparra]: Can you post a review request on Phabricator or Review Board? Please remove the files that are auto-generated by the Thrift compiler from the review request. Thanks.

Add Compile/Execute support to Hive Server

    Key: HIVE-4321
    URL: https://issues.apache.org/jira/browse/HIVE-4321
    Project: Hive
    Issue Type: Bug
    Components: HiveServer2, Thrift API
    Reporter: Sarah Parra
    Attachments: CompileExecute.patch

Adds support for query compilation in Hive Server 2 and adds Thrift support for compile/execute APIs. This enables scenarios that need to compile a query before it is executed, e.g. an ODBC driver that implements SQLPrepare/SQLExecute. This is commonly used by a client that needs metadata for the query before it is executed.
[jira] [Created] (HIVE-4301) Bulk retrieval API for column stats
Shreepadma Venugopalan created HIVE-4301:

    Summary: Bulk retrieval API for column stats
    Key: HIVE-4301
    URL: https://issues.apache.org/jira/browse/HIVE-4301
    Project: Hive
    Issue Type: Bug
    Reporter: Shreepadma Venugopalan

Provide APIs to bulk fetch column stats, i.e., stats for all columns in a table and stats for all columns in all partitions in a table.
[jira] [Assigned] (HIVE-4301) Bulk retrieval API for column stats
[ https://issues.apache.org/jira/browse/HIVE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan reassigned HIVE-4301:
    Assignee: Shreepadma Venugopalan
[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615847#comment-13615847 ]

Shreepadma Venugopalan commented on HIVE-4119:

[~cwsteinbach]: Would it be possible to take a look at the new patch? Thanks.

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

    Key: HIVE-4119
    URL: https://issues.apache.org/jira/browse/HIVE-4119
    Project: Hive
    Issue Type: Bug
    Components: Statistics
    Affects Versions: 0.10.0
    Reporter: Lenni Kuff
    Assignee: Shreepadma Venugopalan
    Priority: Critical
    Attachments: HIVE-4119.1.patch, HIVE-4119.2.patch

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
{code}
hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"
java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    ... 15 more
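The trace shows the NPE raised inside iterate(), reached from GroupByOperator.closeOp() rather than from normal row processing. One plausible reading, stated here as an assumption and not confirmed by this thread, is that on empty input the operator feeds the aggregation a single all-null parameter row, and iterate() dereferences it. The sketch below shows that defensive pattern in isolation; the class LongStatsAggregator and its fields are hypothetical and this is not the content of HIVE-4119.1.patch or HIVE-4119.2.patch.

```python
class LongStatsAggregator:
    """Hypothetical stand-in for a column-stats aggregation buffer."""

    def __init__(self):
        self.num_bit_vectors = None
        self.count = 0
        self.nulls = 0
        self.min = None
        self.max = None

    def iterate(self, value, num_bit_vectors):
        # Assumption: an all-null parameter row is a synthetic row produced
        # for empty input, so it must be ignored instead of dereferenced.
        if value is None and num_bit_vectors is None:
            return
        if self.num_bit_vectors is None:
            self.num_bit_vectors = num_bit_vectors
        if value is None:
            # A genuine NULL in the column is counted, not dereferenced.
            self.nulls += 1
            return
        self.count += 1
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    def terminate(self):
        return {"count": self.count, "nulls": self.nulls,
                "min": self.min, "max": self.max}


# Empty table: only the synthetic all-null row arrives.
empty = LongStatsAggregator()
empty.iterate(None, None)
print(empty.terminate())

# Non-empty table with one NULL value in the column.
agg = LongStatsAggregator()
for v in (5, None, 2):
    agg.iterate(v, 16)
print(agg.terminate())
```

The design choice is to treat "no usable arguments at all" separately from "a NULL column value", so an empty table yields zeroed statistics instead of an exception.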
[jira] [Commented] (HIVE-4226) Cleanup non-threadsafe code in Hive
[ https://issues.apache.org/jira/browse/HIVE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612865#comment-13612865 ]

Shreepadma Venugopalan commented on HIVE-4226:

[~snarayanan]: Thank you very much for contributing this patch to the project. I have a question regarding QHS. Does this build on the existing HiveServer, or is this something you guys have built from scratch?

Cleanup non-threadsafe code in Hive

    Key: HIVE-4226
    URL: https://issues.apache.org/jira/browse/HIVE-4226
    Project: Hive
    Issue Type: Improvement
    Reporter: Sivaramakrishnan Narayanan

There is some code in Hive that is not threadsafe. These issues usually bubble up as problems in Hive Server. This JIRA tracks fixing (hopefully, all) of these issues.

Some context: we've implemented a multi-tenant (multiple dbs), multi-threaded hive server at Qubole (QHS) which has been running in production for a couple of months now. As part of this effort, we've fixed a number of instances of non-threadsafe code. I'm looking to contribute this back to the community. Note that there is no new functionality here, just some better hygiene.

If there are any stress tests that have revealed hive server bugs in the past, it would be great if they could be added to the jira. Also, this is my first attempt at contributing to Apache, so please forgive any mistakes.
[jira] [Commented] (HIVE-4226) Cleanup non-threadsafe code in Hive
[ https://issues.apache.org/jira/browse/HIVE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612870#comment-13612870 ]

Shreepadma Venugopalan commented on HIVE-4226:

HIVE-4141 and HIVE-4075 are relevant and were recently fixed.
[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan updated HIVE-4119:
    Attachment: HIVE-4119.2.patch
[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611515#comment-13611515 ]

Shreepadma Venugopalan commented on HIVE-4119:

New patch addresses the review comments. Thanks.
[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan updated HIVE-4119:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan updated HIVE-4119:
    Attachment: HIVE-4119.1.patch
[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602713#comment-13602713 ]

Shreepadma Venugopalan commented on HIVE-4119:
Review request: https://reviews.apache.org/r/9929/

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
Key: HIVE-4119
URL: https://issues.apache.org/jira/browse/HIVE-4119
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical
Attachments: HIVE-4119.1.patch

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
{code}
hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"
java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    ... 15 more
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
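The trace for HIVE-4119 shows the NPE being raised while `iterate` reads an int argument through `PrimitiveObjectInspectorUtils.getInt`, on a call that originates from `GroupByOperator.closeOp`. One plausible reading, which is an assumption and not taken from the attached patch, is that the close path invokes the aggregator with null arguments when no rows were seen. The sketch below illustrates a null guard under that assumption; the class `LongStatsSketch` and its fields are hypothetical and are not Hive's `GenericUDAFComputeStats`.

```python
class LongStatsSketch:
    """Toy column-stats aggregator (hypothetical, not Hive code)."""

    def __init__(self):
        self.num_bit_vectors = None
        self.count = 0
        self.count_nulls = 0
        self.min_value = None
        self.max_value = None

    def iterate(self, value, num_bit_vectors):
        # Assumed empty-table case: the operator's close path passes
        # all-null arguments, so there is nothing to aggregate.
        if num_bit_vectors is None:
            return
        if self.num_bit_vectors is None:
            self.num_bit_vectors = int(num_bit_vectors)
        # A NULL column value in a real row is counted, not dereferenced.
        if value is None:
            self.count_nulls += 1
            return
        self.count += 1
        self.min_value = value if self.min_value is None else min(self.min_value, value)
        self.max_value = value if self.max_value is None else max(self.max_value, value)
```

With this guard, the all-null call leaves the aggregator in its initial state instead of failing, while NULL values in ordinary rows are still counted.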
[jira] [Created] (HIVE-4153) Use number of distinct values to decide whether to perform map side aggregation
Shreepadma Venugopalan created HIVE-4153: Summary: Use number of distinct values to decide whether to perform map side aggregation Key: HIVE-4153 URL: https://issues.apache.org/jira/browse/HIVE-4153 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0, 0.9.0, 0.8.1, 0.8.0 Reporter: Shreepadma Venugopalan Today, Hive performs map-side aggregation by default. If the number of unique keys in the aggregation is small, map-side aggregation is beneficial. However, if the number of keys is sufficiently large, it can lead to out-of-memory errors (OOMEs). After hitting an OOME, the user has to set hive.map.aggr to false to turn map-side aggregation off. Instead, we can use the number of distinct values in the GROUP BY column, along with the number of rows in the table, to decide whether map-side aggregation should be used.
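The decision proposed in HIVE-4153 can be sketched as a ratio test. This is an illustration only, not Hive's implementation: the function name, the `max_distinct_ratio` parameter, and its default of 0.5 are assumptions made for the example. The idea is that when most rows carry a distinct key, the map-side hash table barely reduces the data and mostly costs memory.

```python
def should_use_map_side_aggregation(num_distinct_keys, num_rows, max_distinct_ratio=0.5):
    """Return True when map-side aggregation is likely to pay off.

    Hypothetical heuristic: compare the estimated number of distinct
    GROUP BY keys with the number of rows in the table.
    """
    if num_distinct_keys < 0 or num_rows < 0:
        raise ValueError("statistics must be non-negative")
    if num_rows == 0:
        # Nothing to aggregate, so there is nothing to gain.
        return False
    # An estimate can overshoot the row count; clamp it.
    ratio = min(num_distinct_keys, num_rows) / num_rows
    return ratio <= max_distinct_ratio
```

For example, 10 distinct keys over a million rows would keep map-side aggregation on, while 900,000 distinct keys over the same table would turn it off.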
[jira] [Assigned] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
[ https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan reassigned HIVE-4119:
Assignee: Shreepadma Venugopalan

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
Key: HIVE-4119
URL: https://issues.apache.org/jira/browse/HIVE-4119
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty
{code}
hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"
java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
    ... 15 more
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
    at
[jira] [Assigned] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name
[ https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreepadma Venugopalan reassigned HIVE-4118:
Assignee: Shreepadma Venugopalan

ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name
Key: HIVE-4118
URL: https://issues.apache.org/jira/browse/HIVE-4118
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan

Computing column stats fails when using a fully qualified table name. Issuing a USE db and using only the table name succeeds.
{code}
hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS int_col"
org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Table somedb.some_table for which stats is gathered doesn't exist.)
    at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
    at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
    at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
    at $Proxy9.updateTableColumnStatistics(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
    at $Proxy10.update_table_column_statistics(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
    at $Proxy11.updateTableColumnStatistics(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
    ... 18 more
{code}
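The exception text in HIVE-4118 ("Table somedb.some_table ... doesn't exist") suggests that the whole string was looked up as a table name. A minimal sketch of the name handling such a lookup would need, assuming a simple `db.table` form; the function `split_qualified_name` and its `current_db` parameter are hypothetical and do not correspond to a Hive API.

```python
def split_qualified_name(name, current_db="default"):
    """Split 'db.table' into (db, table).

    An unqualified name resolves against current_db, which stands in
    for the database selected with USE.
    """
    parts = name.split(".")
    if len(parts) == 1:
        db, table = current_db, parts[0]
    elif len(parts) == 2:
        db, table = parts
    else:
        raise ValueError("expected 'table' or 'db.table', got %r" % name)
    if not db or not table:
        raise ValueError("empty database or table name in %r" % name)
    return db, table
```

Under this sketch, `somedb.some_table` and `some_table` after `USE somedb` resolve to the same (database, table) pair, which is the behavior the report expects.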
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588639#comment-13588639 ] Shreepadma Venugopalan commented on HIVE-4064: -- I believe there is a problem with a number of DDL statements, including ALTER TABLE and CREATE INDEX. Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support db qualified names, others such as CREATE INDEX don't.
[jira] [Created] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
Shreepadma Venugopalan created HIVE-4064: Summary: Handle db qualified names consistently across all HiveQL statements Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support db qualified names, others such as CREATE INDEX don't.
[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name
[ https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578705#comment-13578705 ] Shreepadma Venugopalan commented on HIVE-4021: -- Looks good, +1. PostgreSQL upgrade scripts are creating column with incorrect name -- Key: HIVE-4021 URL: https://issues.apache.org/jira/browse/HIVE-4021 Project: Hive Issue Type: Bug Reporter: Jarek Jarcec Cecho Priority: Trivial Attachments: bugHIVE-4021.patch I've noticed that the PostgreSQL upgrade scripts create the tables {{PART_COL_STATS}} and {{TAB_COL_STATS}} with a column named {{DOUBLE_HIGH_VALUES}}, whereas Hive (and all other scripts) expect the column name {{DOUBLE_HIGH_VALUE}} (without the S at the end).
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573991#comment-13573991 ]

Shreepadma Venugopalan commented on HIVE-3179:
+1.

HBase Handler doesn't handle NULLs properly
Key: HIVE-3179
URL: https://issues.apache.org/jira/browse/HIVE-3179
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.9.0, 0.10.0
Reporter: Lars Francke
Priority: Critical
Attachments: HIVE-3179.1.patch

We found a quite severe issue in the HBase Handler which actually means that Hive potentially returns incorrect data if a column has NULL values in HBase (which means the cell doesn't even exist).

In HBase Shell:
{noformat}
create 'hive_hbase_test', 'test'
put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
{noformat}

In Hive:
{noformat}
DROP TABLE IF EXISTS hive_hbase_test;
CREATE EXTERNAL TABLE hive_hbase_test (
  id int,
  c1 string,
  c2 string,
  c3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#s,test:c1#s,test:c2#s,test:c3#s")
TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");

hive> select * from hive_hbase_test;
OK
1    c1-1    c2-1    c3-1
2    c1-2    NULL    NULL

hive> select c1 from hive_hbase_test;
c1-1
c1-2

hive> select c1, c2 from hive_hbase_test;
c1-1    c2-1
c1-2    NULL
{noformat}

So far everything is correct but now:
{noformat}
hive> select c1, c2, c2 from hive_hbase_test;
c1-1    c2-1    c2-1
c1-2    NULL    c2-1
{noformat}

Selecting c2 twice works the first time but the second time we actually get the value from the previous row.
{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
c1-1    c3-1    c2-1    c2-1    c3-1    c3-1    c1-1
c1-2    NULL    NULL    c2-1    c3-1    c3-1    c1-2
{noformat}

We've narrowed this down to an early initialization of {{fieldsInited\[fieldID] = true}} in {{LazyHBaseRow#uncheckedGetField}} and we'll try to provide a patch which surely needs review.
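The symptom reported in HIVE-3179 (NULL on the first read of a missing cell, the previous row's value on the second) is what a reused, lazily decoded row object produces when its "initialized" flag is set before the cell is checked. The toy class below reproduces that pattern under this assumption. It is not the code of `LazyHBaseRow`; `LazyRow`, `load`, and the `early_init` switch are names invented for the illustration.

```python
class LazyRow:
    """Toy lazily decoded row that is reused across input rows."""

    def __init__(self, num_fields):
        self._cells = [None] * num_fields    # raw cells of the current row
        self._fields = [None] * num_fields   # decoded values, kept across rows
        self._inited = [False] * num_fields

    def load(self, cells):
        """Point the reused object at the next row's raw cells."""
        self._cells = list(cells)
        self._inited = [False] * len(cells)

    def get_field(self, i, early_init=False):
        if not self._inited[i]:
            raw = self._cells[i]
            if early_init:
                # Assumed bug: the flag is set before the cell is checked.
                self._inited[i] = True
                if raw is None:
                    # Missing cell: NULL now, but the cached value from
                    # the previous row is never cleared.
                    return None
            self._fields[i] = raw
            self._inited[i] = True
        return self._fields[i]
```

Reading a missing cell twice with `early_init=True` yields `None` and then the stale value, mirroring the `NULL c2-1` output in the report; with `early_init=False` the cached field is overwritten before the flag is set and both reads return `None`.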
[jira] [Commented] (HIVE-3994) Hive metastore is not working on PostgreSQL 9.2 (most likely on anything 9.0+)
[ https://issues.apache.org/jira/browse/HIVE-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574113#comment-13574113 ]

Shreepadma Venugopalan commented on HIVE-3994:
This problem appears on PostgreSQL 9.x because standard_conforming_strings is on by default starting with 9.1. More here: http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1#Backward_compatibility_issues. One fix for this issue is to set standard_conforming_strings to off when setting up the Hive metastore on PostgreSQL.

Hive metastore is not working on PostgreSQL 9.2 (most likely on anything 9.0+)
Key: HIVE-3994
URL: https://issues.apache.org/jira/browse/HIVE-3994
Project: Hive
Issue Type: Improvement
Reporter: Jarek Jarcec Cecho

I'm getting the following exception when running the metastore on PostgreSQL 9.2:
{code}
Caused by: javax.jdo.JDODataStoreException: Error executing JDOQL query SELECT THIS.TBL_NAME AS NUCORDER0 FROM TBLS THIS LEFT OUTER JOIN DBS THIS_DATABASE_NAME ON THIS.DB_ID = THIS_DATABASE_NAME.DB_ID WHERE THIS_DATABASE_NAME.NAME = ? AND (LOWER(THIS.TBL_NAME) LIKE ? ESCAPE '\\' ) ORDER BY NUCORDER0 : ERROR: invalid escape string
  Hint: Escape string must be empty or one character..
NestedThrowables:
org.postgresql.util.PSQLException: ERROR: invalid escape string
  Hint: Escape string must be empty or one character.
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
    at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:252)
    at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:759)
    ... 28 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid escape string
  Hint: Escape string must be empty or one character.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:271)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:457)
    at org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:123)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
    at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
    at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
    at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
    ... 29 more
{code}
I've googled a bit about this and found a lot of similar issues in different projects, so I'm assuming that this might be a backward compatibility issue on the PostgreSQL side.
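The "Escape string must be empty or one character" hint follows from how the literal `'\\'` in the generated query is read. With standard_conforming_strings off, a backslash inside an ordinary string literal starts an escape, so the two characters collapse to one backslash; with the setting on, backslashes are literal and the escape string is two characters long. The sketch below models only that difference. `literal_value` is a hypothetical helper and a deliberate simplification: it does not handle `\n`, `\t`, octal escapes, or `E'...'` literals.

```python
def literal_value(body, standard_conforming_strings):
    """Decode the text between the quotes of a SQL string literal.

    Simplified model: a doubled quote is always one quote; a backslash
    is an escape character only when standard_conforming_strings is off.
    """
    if standard_conforming_strings:
        return body.replace("''", "'")
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # backslash escapes the next character
            i += 2
        elif body.startswith("''", i):
            out.append("'")
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Applied to the two backslashes from the query, the helper returns a one-character string with the setting off and a two-character string with it on, which is why the same `ESCAPE` clause is accepted in one mode and rejected in the other.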
[jira] [Commented] (HIVE-4001) Add o.a.h.h.serde.Constants for backward compatibility
[ https://issues.apache.org/jira/browse/HIVE-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574175#comment-13574175 ] Shreepadma Venugopalan commented on HIVE-4001: -- Looks good. +1. Add o.a.h.h.serde.Constants for backward compatibility -- Key: HIVE-4001 URL: https://issues.apache.org/jira/browse/HIVE-4001 Project: Hive Issue Type: Improvement Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4001.D8457.1.patch It's renamed to 'serdeConstants' in hive-0.10.0. But the class can be referenced by all of the custom implementations including UDFs, Serdes, StorageHandlers, etc.
[jira] [Commented] (HIVE-3917) Support fast operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564856#comment-13564856 ] Shreepadma Venugopalan commented on HIVE-3917: -- I assume this will allow gathering some statistics, namely the number of files and the size in bytes, when the data is stored in HDFS. Is there a plan to support 'noscan' for other statistics, such as the number of rows or column stats such as top k? If not, is there a plan to deal with the stats that can't be gathered through noscan becoming stale? Support fast operation for analyze command -- Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1 Hive supports the analyze command to gather statistics from existing tables/partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If the table/partition is big, the operation takes time since it opens all files and scans all data. It would be nice to support a fast operation that gathers only the statistics which don't require opening all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics that can be gathered without a scan can be retrieved via this optional parameter.
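The two statistics that HIVE-3917 proposes to gather without a scan, number of files and size in bytes, can both be read from file-system metadata. The sketch below shows the idea on a local directory using `os.walk` and `os.stat`; the local file system is a stand-in for HDFS here, and `noscan_stats` is a hypothetical name, not part of Hive.

```python
import os


def noscan_stats(directory):
    """Return (num_files, total_bytes) using file metadata only.

    No file is opened or read, so the cost depends on the number of
    files rather than on the amount of data they hold.
    """
    num_files = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total_bytes += os.stat(os.path.join(root, name)).st_size
            num_files += 1
    return num_files, total_bytes
```

The number of rows has no such shortcut in this sketch, since it depends on file contents, which is why it stays with the full scan in the proposal.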
[jira] [Commented] (HIVE-3917) Support fast operation for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564888#comment-13564888 ] Shreepadma Venugopalan commented on HIVE-3917: -- [~gangtimliu]: Thanks for the clarification. If we add a flag to indicate that the stats are stale, how will we distinguish the case where the stats are really stale from the case where some stats have been updated by a noscan operation? Support fast operation for analyze command -- Key: HIVE-3917 URL: https://issues.apache.org/jira/browse/HIVE-3917 Project: Hive Issue Type: Improvement Components: Statistics Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Gang Tim Liu Attachments: HIVE-3917.patch.1 Hive supports the analyze command to gather statistics from existing tables/partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables It collects: 1. Number of Rows 2. Number of files 3. Size in Bytes If the table/partition is big, the operation takes time since it opens all files and scans all data. It would be nice to support a fast operation that gathers only the statistics which don't require opening all files: 1. Number of files 2. Size in Bytes Potential syntax is ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan]; In the future, all statistics that can be gathered without a scan can be retrieved via this optional parameter.
[jira] [Commented] (HIVE-3931) Add Oracle metastore upgrade script for 0.9 to 10.0
[ https://issues.apache.org/jira/browse/HIVE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561068#comment-13561068 ] Shreepadma Venugopalan commented on HIVE-3931: -- Looks good. Non-committer +1. Add Oracle metastore upgrade script for 0.9 to 10.0 --- Key: HIVE-3931 URL: https://issues.apache.org/jira/browse/HIVE-3931 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.11.0 Attachments: HIVE-3931-1.patch The top level Oracle metastore upgrade script for 0.9 to 0.10 is missing.
[jira] [Updated] (HIVE-1362) Optimizer statistics on columns in tables and partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Summary: Optimizer statistics on columns in tables and partitions (was: Column level scalar valued statistics on Tables and Partitions) Optimizer statistics on columns in tables and partitions Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Updated] (HIVE-33) [Hive]: Add optimizer statistics in Hive
[ https://issues.apache.org/jira/browse/HIVE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-33: --- Summary: [Hive]: Add optimizer statistics in Hive (was: [Hive]: Add ability to compute statistics on hive tables) [Hive]: Add optimizer statistics in Hive Key: HIVE-33 URL: https://issues.apache.org/jira/browse/HIVE-33 Project: Hive Issue Type: New Feature Components: Query Processor, Statistics Reporter: Ashish Thusoo Labels: statistics Add commands to collect partition and column level statistics in hive.
[jira] [Updated] (HIVE-1940) Query Optimization Using Column Statistics and Histograms
[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1940: - Summary: Query Optimization Using Column Statistics and Histograms (was: Query Optimization Using Column Metadata and Histograms) Query Optimization Using Column Statistics and Histograms - Key: HIVE-1940 URL: https://issues.apache.org/jira/browse/HIVE-1940 Project: Hive Issue Type: New Feature Components: Metastore, Query Processor, Statistics Reporter: Anja Gruenheid Attachments: Agruenheid_ideas11.pdf, HiveMetaStore.pdf The current basis for cost-based query optimization in Hive is information gathered on tables and partitions. To make further improvements in query optimization possible, the next step is to develop and implement possibilities to gather information on columns as discussed in issue HIVE-33. After that, an implementation of histograms is a possible option to use and collect run-time statistics. Next to the actual implementation of these features, it is also necessary to develop a consistent storage model for the MetaStore.
[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554243#comment-13554243 ] Shreepadma Venugopalan commented on HIVE-3004: -- Thanks Ashutosh! RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Fix For: 0.11.0 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt, HIVE-3004.4.patch
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Status: Open (was: Patch Available) RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Status: Patch Available (was: Open) RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt, HIVE-3004.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Attachment: HIVE-3004.4.patch RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt, HIVE-3004.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
[ https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3886: - Status: Patch Available (was: Open) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated - Key: HIVE-3886 URL: https://issues.apache.org/jira/browse/HIVE-3886 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Minor Attachments: HIVE-3886.1.patch WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
[ https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3886: - Attachment: HIVE-3886.1.patch WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated - Key: HIVE-3886 URL: https://issues.apache.org/jira/browse/HIVE-3886 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Minor Attachments: HIVE-3886.1.patch WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Status: Patch Available (was: Open) RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Attachment: HIVE-3004.3.patch.txt RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13551697#comment-13551697 ] Shreepadma Venugopalan commented on HIVE-3004: -- Thanks Ashutosh. I've attached the new patch to the JIRA. RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13551698#comment-13551698 ] Shreepadma Venugopalan commented on HIVE-3004: -- Review board : https://reviews.apache.org/r/8931/ RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, HIVE-3004.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3426) union with same source should be optimized
[ https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13551701#comment-13551701 ] Shreepadma Venugopalan commented on HIVE-3426: -- Yup, let's try to optimize the simple case first. Optimizing subqueries with GBY can be the next step. union with same source should be optimized -- Key: HIVE-3426 URL: https://issues.apache.org/jira/browse/HIVE-3426 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Zhenxiao Luo -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3653) Failure in a counter poller run should not be considered as a job failure
[ https://issues.apache.org/jira/browse/HIVE-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-3653: Assignee: Shreepadma Venugopalan Failure in a counter poller run should not be considered as a job failure - Key: HIVE-3653 URL: https://issues.apache.org/jira/browse/HIVE-3653 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.7.1 Reporter: Harsh J Assignee: Shreepadma Venugopalan
A client had a simple transient failure in polling the JT for job status (which it does once per HIVECOUNTERSPULLINTERVAL for each currently running job).
{code}
java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{code}
This led Hive to think that the running job itself had failed, so it failed the query run, although the job progressed to completion in the background. We should not let transient IOExceptions in counter polling cause query termination, and should instead just retry.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
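The "just retry" behavior requested in HIVE-3653 might be sketched as follows. This is only an illustration and not the actual Hive change: the class name CounterPollRetry, the method pollWithRetry, and the maxAttempts and backoffMillis parameters are all hypothetical, and the poll itself is abstracted as a Callable rather than a real JobClient call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hive: retries a status poll on IOException.
final class CounterPollRetry {

    /**
     * Calls poll up to maxAttempts times. An IOException is treated as
     * transient and retried after backoffMillis; if every attempt fails,
     * the last IOException is rethrown. Other exceptions propagate at once.
     */
    static <T> T pollWithRetry(Callable<T> poll, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return poll.call();
            } catch (IOException e) {
                last = e; // remember the failure, then try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }
}
```

With a helper of this shape, a single "Connection reset by peer" during polling would only fail the query if it repeated on every attempt.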
[jira] [Commented] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
[ https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550693#comment-13550693 ] Shreepadma Venugopalan commented on HIVE-3875: -- Thanks Carl for committing. negative value for hive.stats.ndv.error should be disallowed - Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.11.0 Attachments: HIVE-3875.1.patch.txt Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
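The check described in HIVE-3875 (reject a negative hive.stats.ndv.error instead of silently treating it as 0) could look roughly like the sketch below. The attached patch is not shown in this thread, so the class name NdvErrorCheck, the method name, and the choice of IllegalArgumentException are assumptions made for illustration.

```java
// Hypothetical validator, not the code from HIVE-3875.1.patch.txt.
final class NdvErrorCheck {

    /**
     * Returns percentError unchanged when it is usable, and throws when it is
     * negative (or NaN) rather than silently mapping it to 0.
     */
    static double validateNdvError(double percentError) {
        if (Double.isNaN(percentError) || percentError < 0) {
            throw new IllegalArgumentException(
                "hive.stats.ndv.error must be non-negative, got " + percentError);
        }
        return percentError;
    }
}
```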
[jira] [Created] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
Shreepadma Venugopalan created HIVE-3886: Summary: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated Key: HIVE-3886 URL: https://issues.apache.org/jira/browse/HIVE-3886 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Minor WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3887) Upgrade Hive's Avro dependency to version 1.7.3
Shreepadma Venugopalan created HIVE-3887: Summary: Upgrade Hive's Avro dependency to version 1.7.3 Key: HIVE-3887 URL: https://issues.apache.org/jira/browse/HIVE-3887 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
Shreepadma Venugopalan created HIVE-3875: Summary: negative value for hive.stats.ndv.error should be disallowed Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: CDH-9733.1.patch.txt Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
[ https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3875: - Status: Patch Available (was: Open) negative value for hive.stats.ndv.error should be disallowed - Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: CDH-9733.1.patch.txt Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
[ https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3875: - Attachment: CDH-9733.1.patch.txt negative value for hive.stats.ndv.error should be disallowed - Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: CDH-9733.1.patch.txt Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
[ https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3875: - Attachment: (was: CDH-9733.1.patch.txt) negative value for hive.stats.ndv.error should be disallowed - Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed
[ https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3875: - Attachment: HIVE-3875.1.patch.txt negative value for hive.stats.ndv.error should be disallowed - Key: HIVE-3875 URL: https://issues.apache.org/jira/browse/HIVE-3875 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-3875.1.patch.txt Currently, if a negative value is specified for hive.stats.ndv.error in hive-site.xml, it is treated as 0. We should instead throw an exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING
[ https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3004: - Status: Open (was: Patch Available) I'm working on rebasing the patch off of the trunk. RegexSerDe should support other column types in addition to STRING -- Key: HIVE-3004 URL: https://issues.apache.org/jira/browse/HIVE-3004 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Carl Steinbach Assignee: Shreepadma Venugopalan Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3877) Implement equi-depth histograms as a UDAF
Shreepadma Venugopalan created HIVE-3877: Summary: Implement equi-depth histograms as a UDAF Key: HIVE-3877 URL: https://issues.apache.org/jira/browse/HIVE-3877 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Implement a space and time efficient algorithm to bin numeric column data such that all bins approximately contain the same number of elements. Implement such an algorithm as a generic UDAF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
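To make the goal of HIVE-3877 concrete, here is a minimal sketch of equi-depth binning. It is deliberately the naive exact version (sort everything, then cut at equal counts), so it is not the space- and time-efficient algorithm the issue asks for and it is not a UDAF; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Naive illustration of equi-depth bins; assumes the data fits in memory.
final class EquiDepthSketch {

    /**
     * Returns the upper bound of each of numBins bins, chosen so that each bin
     * holds roughly values.length / numBins elements of the sorted input.
     */
    static double[] upperBounds(double[] values, int numBins) {
        if (values.length == 0 || numBins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[numBins];
        for (int i = 1; i <= numBins; i++) {
            // Index of the last element of bin i when elements are split evenly.
            int idx = (int) ((long) i * sorted.length / numBins) - 1;
            bounds[i - 1] = sorted[Math.max(idx, 0)];
        }
        return bounds;
    }
}
```

Unlike an equi-width histogram, the bin boundaries here follow the data, so heavily populated value ranges get narrower bins.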
[jira] [Created] (HIVE-3878) Enhance the existing thrift APIs to persist the histogram
Shreepadma Venugopalan created HIVE-3878: Summary: Enhance the existing thrift APIs to persist the histogram Key: HIVE-3878 URL: https://issues.apache.org/jira/browse/HIVE-3878 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Enhance the existing thrift APIs added for column statistics to persist histograms in addition to the scalar stats value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3881) Extend the analyze table syntax to allow the user to request computing histogram
Shreepadma Venugopalan created HIVE-3881: Summary: Extend the analyze table syntax to allow the user to request computing histogram Key: HIVE-3881 URL: https://issues.apache.org/jira/browse/HIVE-3881 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Since computing histograms can be expensive, only scalar statistics on columns will be gathered by default when an analyze table .. compute statistics for columns ... is executed. This JIRA covers the task of extending the analyze table syntax so that the user can request a histogram in addition to the other statistics on columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3879) Enhance the existing thrift APIs to retrieve the histogram corresponding to a column
[ https://issues.apache.org/jira/browse/HIVE-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-3879: Assignee: Shreepadma Venugopalan Enhance the existing thrift APIs to retrieve the histogram corresponding to a column Key: HIVE-3879 URL: https://issues.apache.org/jira/browse/HIVE-3879 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Enhance the existing thrift API to retrieve the histogram, if it exists, corresponding to a column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3878) Enhance the existing thrift APIs to persist the histogram
[ https://issues.apache.org/jira/browse/HIVE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-3878: Assignee: Shreepadma Venugopalan Enhance the existing thrift APIs to persist the histogram -- Key: HIVE-3878 URL: https://issues.apache.org/jira/browse/HIVE-3878 Project: Hive Issue Type: Sub-task Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Enhance the existing thrift APIs added for column statistics to persist histograms in addition to the scalar stats value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549151#comment-13549151 ] Shreepadma Venugopalan commented on HIVE-3286: -- HIVE-3526 covers the task of computing and persisting histograms on numeric columns in Hive tables and partitions. Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch
A join operation on a table with skewed data spends most of its execution time handling the skewed keys. But mostly we already know about that, and even know what the skewed keys look like. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar to something like this.
{code}
select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}
This means that if the above query is executed by 20 reducers, there is one reducer for a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= a.key < 150, and 17 reducers for the others (this could be extended to assign more than one reducer later).
This can only be used with common-inner-equi joins, and the skew condition should be composed of join keys only. The work done so far will be uploaded shortly after code cleanup.
Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first 'true' one decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime by a simple percentage calculation. If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group will reserve 4 partition slots, etc.
DISTRIBUTE BY decides how the rows in a group are dispersed within the range of reserved slots (if there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression.
1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the non-driver alias are duplicated for all the slots (default if not specified)
2. KEYS : determined by the hash value of the keys (same as before)
3. expression : determined by the hash of the object evaluated by a user-provided expression
Only possible with inner, equi, common-joins. Does not yet support join tree merging. Might be used by other RS users like SORT BY or GROUP BY. If column statistics exist for the key, it could be possible to apply this automatically.
For example, if 20 reducers are used for the query below,
{code}
select count(*) from src a join src b on a.key=b.key skew on (
  a.key = '0' CLUSTER BY 10 PERCENT,
  b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
  cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
{code}
group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will reserve slots 0~5.
For a row with key='0' from alias a, the row is randomly assigned in the range of 6~7 (driver alias) : 6 or 7
For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7
For a row with key='50', the row is assigned in the range of 8~11 by the hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4)
For a row with key='500', the row is assigned in the range of 12~19 by the hashcode of the join key : 12 + (hash(key) % 8)
For a row with key='200', which does not belong to any skew group : hash(key) % 6
*expressions in skew condition :
1. all expressions should be made of expressions in the join condition, which means that if the join condition is a.key=b.key, the user can make any expression with a.key or b.key. But if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it should be an expression with a.key+1).
2. all expressions should reference one and only one side of the aliases. For example, simple constant expressions or expressions referencing both sides of the join condition (a.key+b.key<100) are not allowed.
3. all functions in an expression should be deterministic and stateless.
4. if a DISTRIBUTE BY expression is used, the distribution expression should also have the same alias as the skew expression.
**driver alias :
1. driver alias means the sole referenced alias from skew expression, which is important for RANDOM distribution. rows of driver alias are
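The slot arithmetic in the HIVE-3286 description (20 reducers with groups of 10, 20 and 40 percent giving slots 6~7, 8~11, 12~19 and 0~5 for the rest) can be reproduced with the following sketch. It is not taken from the attached patches: the class SkewSlots, both method names, the use of integer division for the percentage, and the rejection of empty groups are assumptions.

```java
// Hypothetical illustration of reserving reducer slots per skew group.
final class SkewSlots {

    /**
     * Returns one {start, length} pair per skew group, followed by a final
     * pair for rows that match no group. Non-skew rows take the lowest slots
     * and the skew groups follow in declaration order.
     */
    static int[][] reserve(int totalSlots, int[] percents) {
        int[] sizes = new int[percents.length];
        int reserved = 0;
        for (int i = 0; i < percents.length; i++) {
            sizes[i] = totalSlots * percents[i] / 100; // assumed: rounds down
            if (sizes[i] < 1) {
                throw new IllegalArgumentException("group " + i + " would get no slot");
            }
            reserved += sizes[i];
        }
        int others = totalSlots - reserved;
        if (others < 1) {
            throw new IllegalArgumentException("no slot left for non-skewed rows");
        }
        int[][] ranges = new int[percents.length + 1][];
        int start = others;
        for (int i = 0; i < percents.length; i++) {
            ranges[i] = new int[] {start, sizes[i]};
            start += sizes[i];
        }
        ranges[percents.length] = new int[] {0, others};
        return ranges;
    }

    /** Picks a slot inside a {start, length} range from a hash code. */
    static int slotFor(int[] range, int hash) {
        return range[0] + Math.floorMod(hash, range[1]);
    }
}
```

Math.floorMod is used instead of % so that a negative hash code still maps into the reserved range.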
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509300#comment-13509300 ] Shreepadma Venugopalan commented on HIVE-3764: -- I think adding the consistency check is a good idea too. I've not looked into all the details of the code, but I noticed that the metastore version number is the hive release version. While this makes the version numbers easily readable, we would need to provide scripts and perform a metastore upgrade on every Hive release even if there are no other patches in the release that require a metastore schema upgrade. The other option would be to use version numbers from a monotonically increasing sequence instead and bump up the version number only if there are changes in a release that require a metastore upgrade. Wondering if you have considered the latter option. Thanks. Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.10.0 Attachments: HIVE-3764-1.patch Today there's no version/compatibility information stored in hive metastore. Also the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that for production systems the schema is not automatically modified by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3764) Support metastore version consistency check
[ https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509305#comment-13509305 ] Shreepadma Venugopalan commented on HIVE-3764: -- Irrespective of which option we choose to generate version numbers, we should not execute the insert/update version number statement in the schema creation/upgrade script until all other statements in the schema creation/upgrade script have completed without errors. Thanks. Support metastore version consistency check --- Key: HIVE-3764 URL: https://issues.apache.org/jira/browse/HIVE-3764 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.10.0 Attachments: HIVE-3764-1.patch Today there's no version/compatibility information stored in hive metastore. Also the datanucleus configuration property to automatically create missing tables is enabled by default. If you happen to start an older or newer hive or don't run the correct upgrade scripts during migration, the metastore would end up corrupted. The autoCreate schema is not always sufficient to upgrade metastore when migrating to a newer release. It's not supported with all databases. Besides, the migration often involves altering existing tables, changing or moving data etc. Hence it's very useful to have some consistency check to make sure that hive is using the correct metastore and that for production systems the schema is not automatically modified by running hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
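The ordering suggested in the comment above (record the version only after every other schema statement has succeeded) can be illustrated with a small sketch. The real upgrade is driven by SQL scripts, so this Java form is only an analogy: UpgradeRunner, its upgrade method, and the executor callback are hypothetical names, and a failure is modeled as a runtime exception from the executor.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical runner showing "version statement goes last".
final class UpgradeRunner {

    /**
     * Executes every schema statement in order, and only then the statement
     * that records the schema version. If any schema statement throws, the
     * exception propagates and the version statement is never executed.
     */
    static void upgrade(List<String> schemaStatements,
                        String versionStatement,
                        Consumer<String> executor) {
        for (String sql : schemaStatements) {
            executor.accept(sql);
        }
        executor.accept(versionStatement);
    }
}
```

With this ordering, a metastore whose upgrade failed halfway still reports the old version, so a later consistency check can detect that the upgrade did not complete.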
[jira] [Commented] (HIVE-3747) Provide hive operation name for hookContext
[ https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508481#comment-13508481 ] Shreepadma Venugopalan commented on HIVE-3747: -- Thanks Namit for creating a review request. Will do so in the future for other reviews. Provide hive operation name for hookContext --- Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora Assignee: Shreepadma Venugopalan Attachments: HIVE-3747.1.patch.txt The hookContext exposed through ExecuteWithHookContext, does not provide the name of the Hive operation. The following public API should be added in HookContext. public String getOperationName() { return SessionState.get().getHiveOperation().name(); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3747) Provide hive operation name for hookContext
[ https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan reassigned HIVE-3747: Assignee: Shreepadma Venugopalan Provide hive operation name for hookContext --- Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora Assignee: Shreepadma Venugopalan The hookContext exposed through ExecuteWithHookContext, does not provide the name of the Hive operation. The following public API should be added in HookContext. public String getOperationName() { return SessionState.get().getHiveOperation().name(); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3747) Provide hive operation name for hookContext
[ https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3747: - Status: Patch Available (was: Open) Provide hive operation name for hookContext --- Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora Assignee: Shreepadma Venugopalan Attachments: HIVE-3747.1.patch.txt The hookContext exposed through ExecuteWithHookContext, does not provide the name of the Hive operation. The following public API should be added in HookContext. public String getOperationName() { return SessionState.get().getHiveOperation().name(); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3747) Provide hive operation name for hookContext
[ https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3747: - Attachment: HIVE-3747.1.patch.txt Provide hive operation name for hookContext --- Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora Assignee: Shreepadma Venugopalan Attachments: HIVE-3747.1.patch.txt The hookContext exposed through ExecuteWithHookContext, does not provide the name of the Hive operation. The following public API should be added in HookContext. public String getOperationName() { return SessionState.get().getHiveOperation().name(); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507831#comment-13507831 ] Shreepadma Venugopalan commented on HIVE-3720: -- @Namit: The authorization model in this proposal mirrors that of MySQL as closely as possible. The proposal also documents each place where it deviates from MySQL's authorization model. Since Hive's data model is based on that of MySQL, it makes sense to base the authorization model on MySQL's as well. The proposed functionality is not necessarily a superset of the existing authorization functionality, but it subsumes some of it. While the existing implementation supports authorization on some HiveQL operations, it doesn't secure all operations, provide a way to bootstrap the system, etc. This proposal expands authorization to all HiveQL operations and to direct metadata operations that can be performed by invoking the metastore Thrift API. As discussed earlier, since the proposed model standardizes the authorization model to mirror that of MySQL, it deviates from the existing model wherever the existing implementation deviates from the authorization model of MySQL or other RDBMSs. The proposed model is also more fine-grained and supports hierarchical privileges, much like an RDBMS. For instance, the proposed model supports separate CREATE, ALTER, and DROP privileges on objects, whereas the current model supports a single ALTER_METADATA privilege that includes the privileges needed to perform CREATE, ALTER, DROP, etc. Note that one of the goals is to propose an authorization model to which finer-grained privileges can be added later as necessary. Since the existing implementation is not complete, it is unclear at this point which parts of the functionality have been completely implemented. Perhaps we can mark the existing functionality in the wiki once we start implementing the proposed model. Thanks.
Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.4.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504852#comment-13504852 ] Shreepadma Venugopalan commented on HIVE-3678: -- Uploaded patch rebased off tip of trunk. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504337#comment-13504337 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: I've uploaded a new patch which adds 2 varchar columns for storing BigDecimal low and high values. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.3.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504338#comment-13504338 ] Shreepadma Venugopalan commented on HIVE-3678: -- Updated patch is available on both JIRA and RB. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503531#comment-13503531 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: If we store long/double values as a varchar instead of as a numeric type, we can avoid evolving the schema when we add a BigDecimal type. That's the only benefit I see in storing long/double as a varchar. However, I agree with you that we should avoid untyping data when possible. Let me know your thoughts. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503425#comment-13503425 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: Thanks for your comments. Do you think it makes sense to store numeric long/double/bigdecimal values in a varchar column? I don't see consistent BLOB/CLOB support across DB vendors and versions. If you agree, I'll make the change to store these numeric values in a varchar column and post a new patch. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502440#comment-13502440 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: My answers are inline.
> We can add two more columns of type BigDecimal to the M*ColumnStatistics table: BigDecimalLowValue and BigDecimalHighValue. But is the BigDecimal type supported consistently across different DBs?
Agreed, BigDecimal is not consistently supported across DBs. Hence we can't easily add a BigDecimal column that works across DB vendors and versions.
> We can have these two columns of type Double, but then we lose precision.
Yes, we can store BigDecimal and Long as Double, but we will lose precision.
> We can store as plain strings in a column of type varchar.
The maximum number of digits after the decimal point in a BigDecimal number is unlimited for all practical purposes. If we stored it in a varchar, some digits following the decimal point could be truncated in some cases, but this seems to be the only practical solution.
> We can store in JSON format in a column of type varchar.
As noted above, the number of digits after the decimal point in a BigDecimal number is unlimited for all practical purposes (Java allows nearly 2 billion digits after the decimal point). At this time, we collect MIN and MAX column values for numeric columns. If we stored BigDecimal values in a JSON blob, we could exceed the varchar size limit and truncate the blob, which would leave a malformed JSON object. We would also lose some of the column statistics.
Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
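The trade-offs discussed in the comment above (precision loss through Double, digit truncation in a size-limited varchar) can be reproduced with a small sketch. The class and method names and the 10-character limit are hypothetical; only `java.math.BigDecimal` from the standard library is used.

```java
import java.math.BigDecimal;

/** Hypothetical sketch of the storage trade-offs for BigDecimal min/max statistics. */
final class BigDecimalStorageSketch {
    /** Simulates storing a value in a varchar(maxLen) column by cutting the string off. */
    static String toVarchar(BigDecimal value, int maxLen) {
        String text = value.toPlainString();
        return text.length() <= maxLen ? text : text.substring(0, maxLen);
    }

    /** True if converting to double and back yields exactly the same number. */
    static boolean survivesDouble(BigDecimal value) {
        return new BigDecimal(value.doubleValue()).compareTo(value) == 0;
    }

    /** True if a long survives a round trip through double unchanged. */
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        BigDecimal precise = new BigDecimal("0.12345678901234567890123456789");
        System.out.println(toVarchar(precise, 10));           // trailing digits are dropped
        System.out.println(survivesDouble(precise));          // precision is lost through double
        System.out.println(survivesDouble((1L << 53) + 1));   // large longs also lose precision
    }
}
```

Note that the naive truncation here only drops fractional digits because the example value is short before the decimal point; a long integer part would be cut too, which changes the magnitude. A real schema would need a limit large enough for the integer part, which is an additional assumption not covered in the thread.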
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500526#comment-13500526 ] Shreepadma Venugopalan commented on HIVE-3678: -- With the changes from HIVE-3712, the column schema has *no* dependency on any specific DB. It uses simple data types, which are supported across DBs. The primary motivation for the schema change in HIVE-3712 was to avoid storing column statistics fields as a BLOB. There are two problems with using a BLOB: a) BLOBs are designed to store large volumes of data, on the order of GBs, and are hence stored outside the row. A consequence of this design is that BLOBs don't perform well for storing small amounts of data. While some DBs, such as Oracle, inline small BLOBs, not all DBs do. BLOBs are the only practical choice for storing data whose size is not known in advance, but they are overkill for storing around 100 bytes of data. b) There is no uniform support across DB vendors and versions. Hence I don't really see the value in storing this as a JSON BLOB. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Status: Patch Available (was: Open) Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.1.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema
[ https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499924#comment-13499924 ] Shreepadma Venugopalan commented on HIVE-3712: -- It looks like VARBINARY is not supported consistently across different DBs and DB versions. Storing 8 bytes in a LONGVARBINARY is overkill because LONGVARBINARY is mapped to the BLOB type in some DBs. It appears the best solution at this point is to store LONG and DOUBLE min and max values in two separate columns. Use varbinary instead of longvarbinary to store min and max column values in column stats schema Key: HIVE-3712 URL: https://issues.apache.org/jira/browse/HIVE-3712 Project: Hive Issue Type: Bug Components: Metastore, Statistics Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan The JDBC type LONGVARBINARY maps to the BLOB SQL type in some databases. Storing a min or max column value for a numeric type takes only 8 bytes and hence doesn't require a BLOB. Storing these values in a BLOB will hurt performance without providing much benefit.
[jira] [Updated] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema
[ https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3712: - Status: Patch Available (was: Open) Use varbinary instead of longvarbinary to store min and max column values in column stats schema Key: HIVE-3712 URL: https://issues.apache.org/jira/browse/HIVE-3712 Project: Hive Issue Type: Bug Components: Metastore, Statistics Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan The JDBC type LONGVARBINARY maps to the BLOB SQL type in some databases. Storing a min or max column value for a numeric type takes only 8 bytes and hence doesn't require a BLOB. Storing these values in a BLOB will hurt performance without providing much benefit.
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499930#comment-13499930 ] Shreepadma Venugopalan commented on HIVE-3678: -- Review board link: https://reviews.apache.org/r/8119/ Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema
[ https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499931#comment-13499931 ] Shreepadma Venugopalan commented on HIVE-3712: -- Review board link: https://reviews.apache.org/r/8119/ Use varbinary instead of longvarbinary to store min and max column values in column stats schema Key: HIVE-3712 URL: https://issues.apache.org/jira/browse/HIVE-3712 Project: Hive Issue Type: Bug Components: Metastore, Statistics Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan The JDBC type LONGVARBINARY maps to the BLOB SQL type in some databases. Storing a min or max column value for a numeric type takes only 8 bytes and hence doesn't require a BLOB. Storing these values in a BLOB will hurt performance without providing much benefit.
[jira] [Created] (HIVE-3720) Expand and standardize authorization in Hive
Shreepadma Venugopalan created HIVE-3720: Summary: Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3720: - Attachment: Hive_Authorization_Functionality.pdf Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1350#comment-1350 ] Shreepadma Venugopalan commented on HIVE-3720: -- Attached document outlines the authorization model and its semantics. Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema
Shreepadma Venugopalan created HIVE-3712: Summary: Use varbinary instead of longvarbinary to store min and max column values in column stats schema Key: HIVE-3712 URL: https://issues.apache.org/jira/browse/HIVE-3712 Project: Hive Issue Type: Bug Components: Metastore, Statistics Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan The JDBC type LONGVARBINARY maps to the BLOB SQL type in some databases. Storing a min or max column value for a numeric type takes only 8 bytes and hence doesn't require a BLOB. Storing these values in a BLOB will hurt performance without providing much benefit.
[jira] [Commented] (HIVE-3705) Adding authorization capability to the metastore
[ https://issues.apache.org/jira/browse/HIVE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498520#comment-13498520 ] Shreepadma Venugopalan commented on HIVE-3705: -- @Sushanth: Thanks for posting the document and the patch. Securing the metastore is necessary to provide reliable authorization in Hive. I looked at the document and the code and have the following high-level questions: a) The document contains an example of how the current pluggable authorization provider can be exploited to circumvent security. This patch seems to introduce a new config param, hive.security.metastore.authorization.manager, that allows a pluggable authorization provider. Perhaps I'm missing something here, but how would we prevent a user from plugging in their own authorization provider? b) The current Hive authorization model exposes semantics that are confusing and at times inconsistent. While this patch moves the auth checks to the metastore (IMO, this is the right thing to do), it seems to implement the existing semantics. Is there a plan to fix the semantics at some point? c) How do we obtain the user ID for performing authorization? Are we using the authentication ID from the Thrift context? If so, how do we handle the case where the authentication ID is different from the authorization ID, e.g., HS2 authenticates to the metastore as HS2 but is executing a statement on behalf of user 'u1'? Thanks.
Adding authorization capability to the metastore Key: HIVE-3705 URL: https://issues.apache.org/jira/browse/HIVE-3705 Project: Hive Issue Type: New Feature Components: Authorization, Metastore Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-3705.D6681.1.patch, HIVE-3705.D6681.2.patch, hivesec_investigation.pdf In an environment where multiple clients access a single metastore, and we want to evolve hive security to a point where it's no longer simply preventing users from shooting their own foot, we need to be able to authorize metastore calls as well, instead of simply performing every metastore api call that's made. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3706) getBoolVar in FileSinkOperator can be optimized
[ https://issues.apache.org/jira/browse/HIVE-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13496447#comment-13496447 ] Shreepadma Venugopalan commented on HIVE-3706: -- Looks good. Non committer +1. getBoolVar in FileSinkOperator can be optimized --- Key: HIVE-3706 URL: https://issues.apache.org/jira/browse/HIVE-3706 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3706.1.patch.txt There's a call to HiveConf.getBoolVar in FileSinkOperator's processOp method. In benchmarks we found this call to be using ~2% of the CPU time on simple queries, e.g. INSERT OVERWRITE TABLE t1 SELECT * FROM t2; This boolean value, a flag to collect the RawDataSize stat, won't change during the processing of a query, so we can determine it at initialization and store that value, saving that CPU. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3706) getBoolVar in FileSinkOperator can be optimized
[ https://issues.apache.org/jira/browse/HIVE-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496451#comment-13496451 ] Shreepadma Venugopalan commented on HIVE-3706: -- @Kevin: We should see if there are other opportunities to move such checks from execution to operator initialization. getBoolVar in FileSinkOperator can be optimized --- Key: HIVE-3706 URL: https://issues.apache.org/jira/browse/HIVE-3706 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-3706.1.patch.txt There's a call to HiveConf.getBoolVar in FileSinkOperator's processOp method. In benchmarks we found this call to be using ~2% of the CPU time on simple queries, e.g. INSERT OVERWRITE TABLE t1 SELECT * FROM t2; This boolean value, a flag to collect the RawDataSize stat, won't change during the processing of a query, so we can determine it at initialization and store that value, saving that CPU.
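The idea discussed in HIVE-3706, reading a flag once at operator initialization instead of on every row, can be sketched roughly as follows. This is a minimal illustration, not the attached patch: Conf, SinkOperator, and the key "sketch.collect.rawdatasize" are hypothetical stand-ins and are not Hive's HiveConf, FileSinkOperator, or any real Hive property.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Conf and SinkOperator are hypothetical stand-ins,
// not Hive's HiveConf or FileSinkOperator.
class CachedFlagSketch {

    static class Conf {
        private final Map<String, String> values = new HashMap<>();
        int lookups = 0; // counts lookups so the saving is visible

        void set(String key, String value) {
            values.put(key, value);
        }

        boolean getBool(String key, boolean defaultValue) {
            lookups++;
            String v = values.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    static class SinkOperator {
        private final Conf conf;
        private boolean collectRawDataSize; // read once, reused per row
        long rawDataSize = 0;

        SinkOperator(Conf conf) {
            this.conf = conf;
        }

        // Called once before any rows arrive.
        void initialize() {
            collectRawDataSize = conf.getBool("sketch.collect.rawdatasize", false);
        }

        // Called once per row; no configuration lookup on this path.
        void process(String row) {
            if (collectRawDataSize) {
                rawDataSize += row.length();
            }
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("sketch.collect.rawdatasize", "true");
        SinkOperator op = new SinkOperator(conf);
        op.initialize();
        for (String row : new String[] {"abc", "de"}) {
            op.process(row);
        }
        System.out.println("rawDataSize=" + op.rawDataSize + " lookups=" + conf.lookups);
    }
}
```

The lookup counter exists only to show that the per-row path no longer touches the configuration. The sketch relies on the same assumption the issue states: the flag does not change while a query is being processed.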
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493587#comment-13493587 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Tim: I'm currently working on providing the upgrade scripts for different databases. Since there is a plan to release Hive 0.10 soon, we have to provide upgrade scripts for all of them. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby
[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492495#comment-13492495 ] Shreepadma Venugopalan commented on HIVE-1362: -- HIVE-3524 changed the signature of endFunction in HiveMetaStore.java. HIVE-3524 was committed hours before this patch. The compile errors are due to the signature change. I'm working on a fix. Thanks. Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492498#comment-13492498 ] Shreepadma Venugopalan commented on HIVE-1362: -- @Namit: Not sure what the protocol is, but I've attached the new patch to this JIRA. Thanks. Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Updated] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Attachment: HIVE-1362.11.patch.txt Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492505#comment-13492505 ] Shreepadma Venugopalan commented on HIVE-1362: -- Please look at HIVE-1362.11.patch.txt to fix the compile errors introduced earlier. Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Created] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
Shreepadma Venugopalan created HIVE-3686: Summary: Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524 Key: HIVE-3686 URL: https://issues.apache.org/jira/browse/HIVE-3686 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Blocker HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was committed some hours before HIVE-1362. The change in signature broke the build after HIVE-1362, which still contained the old signature, was committed.
[jira] [Updated] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
[ https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3686: - Status: Patch Available (was: Open) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524 --- Key: HIVE-3686 URL: https://issues.apache.org/jira/browse/HIVE-3686 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Blocker Attachments: HIVE-1362.11.patch.txt HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was committed some hours before HIVE-1362. The change in signature broke the build after HIVE-1362, which still contained the old signature, was committed.
[jira] [Updated] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
[ https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3686: - Attachment: HIVE-1362.11.patch.txt Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524 --- Key: HIVE-3686 URL: https://issues.apache.org/jira/browse/HIVE-3686 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Blocker Attachments: HIVE-1362.11.patch.txt HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was committed some hours before HIVE-1362. The change in signature broke the build after HIVE-1362, which still contained the old signature, was committed.
[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492524#comment-13492524 ] Shreepadma Venugopalan commented on HIVE-1362: -- Filed a new JIRA - HIVE-3686 to fix the compile errors. Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Commented] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
[ https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492711#comment-13492711 ] Shreepadma Venugopalan commented on HIVE-3686: -- Thanks Kevin. Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524 --- Key: HIVE-3686 URL: https://issues.apache.org/jira/browse/HIVE-3686 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Blocker Fix For: 0.10.0 Attachments: HIVE-1362.11.patch.txt HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was committed some hours before HIVE-1362. The change in signature broke the build after HIVE-1362, which still contained the old signature, was committed.
[jira] [Commented] (HIVE-3689) Update website with info on how to report security bugs
[ https://issues.apache.org/jira/browse/HIVE-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492929#comment-13492929 ] Shreepadma Venugopalan commented on HIVE-3689: -- @Eli: In Hadoop land, who are the people with read access to the list, i.e., the ones who can view the security vulnerabilities? Currently, all Hive security issues seem to be in the public domain on JIRA. Update website with info on how to report security bugs Key: HIVE-3689 URL: https://issues.apache.org/jira/browse/HIVE-3689 Project: Hive Issue Type: Task Components: Documentation Reporter: Eli Collins The Hive website should be updated with information on how to report potential security vulnerabilities. In Hadoop land we have a private security list that anyone can post to that we point to on our list page: Hadoop example http://hadoop.apache.org/general_lists.html#Security.
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491764#comment-13491764 ] Shreepadma Venugopalan commented on HIVE-1362: -- @Carl: Please take the latest patch from JIRA. If you have trouble applying it, let me know. Thanks. column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Commented] (HIVE-1362) column level statistics
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491792#comment-13491792 ] Shreepadma Venugopalan commented on HIVE-1362: -- @Carl: You will see 6 failures in testParse (groupby1.q .. groupby6.q) when you run the tests. They are in the process of being fixed by HIVE-3674. column level statistics --- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
[jira] [Updated] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions
[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-1362: - Summary: Column level scalar valued statistics on Tables and Partitions (was: Column level scalar valued statistics) Column level scalar valued statistics on Tables and Partitions -- Key: HIVE-1362 URL: https://issues.apache.org/jira/browse/HIVE-1362 Project: Hive Issue Type: Sub-task Components: Statistics Reporter: Ning Zhang Assignee: Shreepadma Venugopalan Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt