[jira] [Updated] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4435:
-

Attachment: chart_1(1).png

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator, which estimates 
 the number of distinct values, doesn't use hash functions that are pairwise 
 independent. This is problematic because the input values aren't uniformly 
 distributed. When run on large TPC-H data sets, this leads to a huge 
 discrepancy in the estimate for primary key columns, which are typically a 
 monotonically increasing sequence.
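
For readers unfamiliar with the term, a pairwise independent family can be built as h(x) = (a*x + b) mod p, with p prime and a, b drawn uniformly from [0, p). The sketch below is illustrative only: the class name, the choice of p = 2^31 - 1, and the rho helper are assumptions, and it is not the code in HIVE-4435.1.patch.

```java
import java.util.Random;

// Illustrative sketch of a pairwise independent hash family; not Hive's code.
final class PairwiseHash {
    // Mersenne prime 2^31 - 1 (assumed modulus); keys are reduced into [0, P).
    static final long P = 2147483647L;

    private final long a; // multiplier, uniform in [0, P)
    private final long b; // offset, uniform in [0, P)

    PairwiseHash(Random rng) {
        this(rng.nextInt((int) P), rng.nextInt((int) P));
    }

    // Explicit coefficients, useful for reproducing a result.
    PairwiseHash(long a, long b) {
        this.a = Math.floorMod(a, P);
        this.b = Math.floorMod(b, P);
    }

    /** h(x) = (a*x + b) mod P. The product stays below 2^62, so it fits in a long. */
    long hash(long x) {
        long key = Math.floorMod(x, P);
        return (a * key + b) % P;
    }

    /** Position of the lowest set bit, the quantity a Flajolet-Martin bitmap records. */
    static int rho(long h) {
        return h == 0 ? 31 : Long.numberOfTrailingZeros(h);
    }
}
```

A sequence such as 1, 2, 3, ... passed through hash() is spread by the random a and b rather than by the structure of the keys themselves, which is the property the issue asks for.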

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-29 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644850#comment-13644850
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Attached plot of relative error vs. number of distinct values after the fix. 
Dataset: TPC-H of varying sizes up to 10TB
hive.stats.ndv.error = 5% (standard error for the estimator)
Column types: String, Long, Double
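
As a rough illustration of the two quantities mentioned here: relative error is |estimate - actual| / actual, and the textbook Flajolet-Martin analysis gives a standard error of roughly 0.78 / sqrt(m) for m bit vectors. The sketch below is an assumption-laden aid for reading the plot, not how Hive maps hive.stats.ndv.error to a bit-vector count; the class and method names are hypothetical.

```java
// Hypothetical helper for reading the plot; not part of Hive.
final class NdvErrorSketch {
    // Approximate constant from the Flajolet-Martin analysis (assumed here).
    static final double FM_CONSTANT = 0.78;

    /** Relative error of an NDV estimate against the true distinct count. */
    static double relativeError(long estimated, long actual) {
        return Math.abs((double) estimated - actual) / actual;
    }

    /** Smallest power-of-two bit-vector count whose 0.78/sqrt(m) error meets the target. */
    static int bitVectorsFor(double targetStdError) {
        double needed = Math.pow(FM_CONSTANT / targetStdError, 2);
        int m = 1;
        while (m < needed) {
            m <<= 1;
        }
        return m;
    }
}
```

Under these assumptions a 5% target works out to 256 bit vectors, since (0.78 / 0.05)^2 is about 243.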



[jira] [Created] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-04-27 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-4435:


 Summary: Column stats: Distinct value estimator should use hash 
functions that are pairwise independent
 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan






[jira] [Created] (HIVE-4426) Support statistics collection for partitioning key

2013-04-26 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-4426:


 Summary: Support statistics collection for partitioning key
 Key: HIVE-4426
 URL: https://issues.apache.org/jira/browse/HIVE-4426
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


We should support the ability to collect statistics on the partitioning key 
column.



[jira] [Commented] (HIVE-4321) Add Compile/Execute support to Hive Server

2013-04-09 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627252#comment-13627252
 ] 

Shreepadma Venugopalan commented on HIVE-4321:
--

[~sarahparra]: Can you post a review request on phabricator or review board? 
Please remove the files that are auto generated by the thrift compiler in the 
review request. Thanks.

 Add Compile/Execute support to Hive Server
 --

 Key: HIVE-4321
 URL: https://issues.apache.org/jira/browse/HIVE-4321
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Thrift API
Reporter: Sarah Parra
 Attachments: CompileExecute.patch


 Adds support for query compilation in Hive Server 2 and adds Thrift support 
 for compile/execute APIs.
 This enables scenarios that need to compile a query before it is executed, 
 e.g. an ODBC driver that implements SQLPrepare/SQLExecute. This is commonly 
 used for a client that needs metadata for the query before it is executed.



[jira] [Created] (HIVE-4301) Bulk retrieval API for column stats

2013-04-05 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-4301:


 Summary: Bulk retrieval API for column stats
 Key: HIVE-4301
 URL: https://issues.apache.org/jira/browse/HIVE-4301
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan


Provide APIs to bulk fetch column stats, i.e., stats for all columns in a table 
and stats for all columns in all partitions of a table.



[jira] [Assigned] (HIVE-4301) Bulk retrieval API for column stats

2013-04-05 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4301:


Assignee: Shreepadma Venugopalan

 Bulk retrieval API for column stats
 ---

 Key: HIVE-4301
 URL: https://issues.apache.org/jira/browse/HIVE-4301
 Project: Hive
  Issue Type: Bug
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Provide APIs to bulk fetch column stats, i.e., stats for all columns in a 
 table and stats for all columns in all partitions of a table.



[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-27 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615847#comment-13615847
 ] 

Shreepadma Venugopalan commented on HIVE-4119:
--

[~cwsteinbach]: Would it be possible to take a look at the new patch? Thanks.

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical
 Attachments: HIVE-4119.1.patch, HIVE-4119.2.patch


 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 {code}
 hive -e "create table empty_table (i int); select compute_stats(i, 16) from empty_table"
 java.lang.NullPointerException
   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NullPointerException
   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   ... 15 more
 {code}
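
The trace shows iterate() dereferencing its input while GroupByOperator.closeOp() flushes an aggregation that never saw a row. A minimal sketch of the kind of guard that avoids this is below, assuming a null input should simply be skipped. LongStatsBuffer is a hypothetical stand-in rather than GenericUDAFComputeStats, and the code is not taken from the attached patches.

```java
import java.util.OptionalLong;

// Hypothetical stand-in for a long-column stats evaluator; not Hive's class.
final class LongStatsBuffer {
    private long count;                 // non-null values folded in so far
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Folds one value into the buffer; a null input is skipped, never unboxed. */
    void iterate(Long value) {
        if (value == null) {
            return; // empty input or SQL NULL: nothing to aggregate
        }
        long v = value;
        count++;
        min = Math.min(min, v);
        max = Math.max(max, v);
    }

    long count() {
        return count;
    }

    /** Empty when no non-null value was seen, e.g. for an empty table. */
    OptionalLong min() {
        return count == 0 ? OptionalLong.empty() : OptionalLong.of(min);
    }

    /** Empty when no non-null value was seen, e.g. for an empty table. */
    OptionalLong max() {
        return count == 0 ? OptionalLong.empty() : OptionalLong.of(max);
    }
}
```

Whether a null should also be counted as a null value in the column stats is a separate decision that this sketch does not make.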
 

[jira] [Commented] (HIVE-4226) Cleanup non-threadsafe code in Hive

2013-03-25 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612865#comment-13612865
 ] 

Shreepadma Venugopalan commented on HIVE-4226:
--

[~snarayanan]: Thank you very much for contributing this patch to the project. 
I have a question regarding QHS: does it build on the existing HiveServer, or 
is it something you built from scratch? 

 Cleanup non-threadsafe code in Hive
 ---

 Key: HIVE-4226
 URL: https://issues.apache.org/jira/browse/HIVE-4226
 Project: Hive
  Issue Type: Improvement
Reporter: Sivaramakrishnan Narayanan

 There is some code in Hive that is not threadsafe. These usually bubble up as 
 problems in Hive Server. This JIRA tracks fixing (hopefully, all) of these 
 issues.
 Some context: we've implemented a multi-tenant (multiple dbs), multi-threaded 
 hive server at Qubole (QHS) which has been running in production for a couple 
 of months now. As part of this effort, we've fixed a number of instances of 
 non-threadsafe code. I'm looking to contribute this back to the community.
 Note that there is no new functionality here - just some better hygiene. If 
 there are any stress tests that have revealed hive server bugs in the past, 
 it would be great if they could be added to the jira.
 Also, this is my first attempt at contributing to Apache, so please forgive 
 any mistakes.



[jira] [Commented] (HIVE-4226) Cleanup non-threadsafe code in Hive

2013-03-25 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612870#comment-13612870
 ] 

Shreepadma Venugopalan commented on HIVE-4226:
--

HIVE-4141, HIVE-4075 are relevant and were recently fixed.



[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-22 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4119:
-

Attachment: HIVE-4119.2.patch


[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-22 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611515#comment-13611515
 ] 

Shreepadma Venugopalan commented on HIVE-4119:
--

New patch addresses the review comments. Thanks.


[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4119:
-

Status: Patch Available  (was: Open)

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical


[jira] [Updated] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-4119:
-

Attachment: HIVE-4119.1.patch

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical
 Attachments: HIVE-4119.1.patch


   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)

[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-14 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602713#comment-13602713
 ] 

Shreepadma Venugopalan commented on HIVE-4119:
--

Review request: https://reviews.apache.org/r/9929/

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical
 Attachments: HIVE-4119.1.patch



[jira] [Created] (HIVE-4153) Use number of distinct values to decide whether to perform map side aggregation

2013-03-12 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-4153:


 Summary: Use number of distinct values to decide whether to 
perform map side aggregation
 Key: HIVE-4153
 URL: https://issues.apache.org/jira/browse/HIVE-4153
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.0, 0.8.1, 0.8.0
Reporter: Shreepadma Venugopalan


By default, Hive always performs map-side aggregation. When the number of 
unique keys in the aggregation is small, this is beneficial. However, when the 
number of keys is sufficiently large, it can lead to OOMEs, and hive.map.aggr 
then has to be set to false to turn it off. Instead, we can use the number of 
distinct values in the GROUP BY column, together with the number of rows in 
the table, to decide whether map-side aggregation should be used.
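
A minimal sketch of the proposed decision, in Python for brevity. It is not the Hive implementation: the function name, its parameters, and the fallback behaviour are assumptions made for illustration. The 0.5 threshold is borrowed from the default of hive.map.aggr.hash.min.reduction, which expresses a similar idea at run time.

```python
def should_use_map_side_aggregation(num_rows, num_distinct_keys, min_reduction=0.5):
    """Guess whether map-side hash aggregation is likely to pay off.

    num_rows          -- row count of the table (table-level statistic)
    num_distinct_keys -- NDV estimate for the GROUP BY column, or None if unknown
    min_reduction     -- assumed threshold: if the hash table would hold more
                         than this fraction of the input rows, skip it
    """
    if num_rows <= 0 or num_distinct_keys is None:
        # No usable statistics: keep today's default (aggregate on the map side).
        return True
    # An NDV estimate can overshoot the row count, so clamp it.
    ratio = min(num_distinct_keys, num_rows) / num_rows
    return ratio <= min_reduction
```

For example, a table with 1,000,000 rows and 100 distinct keys would keep map-side aggregation, while the same table with 900,000 distinct keys would not.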



[jira] [Assigned] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-05 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4119:


Assignee: Shreepadma Venugopalan

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical


[jira] [Assigned] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2013-03-05 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4118:


Assignee: Shreepadma Venugopalan

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully 
 qualified table name
 

 Key: HIVE-4118
 URL: https://issues.apache.org/jira/browse/HIVE-4118
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan

 Computing column stats fails when using a fully qualified table name. Issuing 
 a USE db and using only the table name succeeds.
 {code}
 hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS 
 int_col"
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 NoSuchObjectException(message:Table somedb.some_table for which stats is 
 gathered doesn't exist.)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
   at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
   at 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
   at $Proxy9.updateTableColumnStatistics(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
   at $Proxy10.update_table_column_statistics(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
   at $Proxy11.updateTableColumnStatistics(Unknown Source)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
   ... 18 more
 {code}
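
The exception message suggests that the metastore was asked for a table literally named somedb.some_table, i.e. the qualified name was not split into database and table. A minimal Python sketch of the kind of splitting that is needed follows. The helper name and its arguments are hypothetical and are not taken from the Hive code base.

```python
def split_table_name(name, current_db="default"):
    """Split 'db.table' or 'table' into a (database, table) pair.

    current_db stands in for the database selected with USE; it is an
    assumption of this sketch, as is the error handling.
    """
    parts = name.split(".")
    if len(parts) > 2 or not all(parts):
        raise ValueError("invalid table name: %r" % name)
    if len(parts) == 2:
        return parts[0], parts[1]
    return current_db, parts[0]
```

With such a helper, somedb.some_table and some_table (after USE somedb) resolve to the same pair.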



[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements

2013-02-27 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588639#comment-13588639
 ] 

Shreepadma Venugopalan commented on HIVE-4064:
--

I believe there is a problem with a number of DDL statements, including ALTER 
TABLE and CREATE INDEX. 

 Handle db qualified names consistently across all HiveQL statements
 ---

 Key: HIVE-4064
 URL: https://issues.apache.org/jira/browse/HIVE-4064
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan

 Hive doesn't consistently handle db qualified names across all HiveQL 
 statements. While some HiveQL statements such as SELECT support db qualified 
 names, others such as CREATE INDEX don't. 



[jira] [Created] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements

2013-02-22 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-4064:


 Summary: Handle db qualified names consistently across all HiveQL 
statements
 Key: HIVE-4064
 URL: https://issues.apache.org/jira/browse/HIVE-4064
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan


Hive doesn't consistently handle db qualified names across all HiveQL 
statements. While some HiveQL statements such as SELECT support db qualified 
names, others such as CREATE INDEX don't. 



[jira] [Commented] (HIVE-4021) PostgreSQL upgrade scripts are creating column with incorrect name

2013-02-14 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578705#comment-13578705
 ] 

Shreepadma Venugopalan commented on HIVE-4021:
--

Looks good, +1.

 PostgreSQL upgrade scripts are creating column with incorrect name
 --

 Key: HIVE-4021
 URL: https://issues.apache.org/jira/browse/HIVE-4021
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Priority: Trivial
 Attachments: bugHIVE-4021.patch


 I've noticed that the PostgreSQL upgrade scripts create the tables 
 {{PART_COL_STATS}} and {{TAB_COL_STATS}} with a column {{DOUBLE_HIGH_VALUES}}, 
 whereas Hive (and all other scripts) expect the column name 
 {{DOUBLE_HIGH_VALUE}} (without the S at the end).



[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly

2013-02-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573991#comment-13573991
 ] 

Shreepadma Venugopalan commented on HIVE-3179:
--

+1.

 HBase Handler doesn't handle NULLs properly
 ---

 Key: HIVE-3179
 URL: https://issues.apache.org/jira/browse/HIVE-3179
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0, 0.10.0
Reporter: Lars Francke
Priority: Critical
 Attachments: HIVE-3179.1.patch


 We found a quite severe issue in the HBase Handler: Hive can return 
 incorrect data if a column has NULL values in HBase (which means the cell 
 doesn't even exist).
 In HBase Shell:
 {noformat}
 create 'hive_hbase_test', 'test'
 put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
 put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
 put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
 put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
 {noformat}
 In Hive:
 {noformat}
 DROP TABLE IF EXISTS hive_hbase_test;
 CREATE EXTERNAL TABLE hive_hbase_test (
   id int,
   c1 string,
   c2 string,
   c3 string
 )
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" =
 ":key#s,test:c1#s,test:c2#s,test:c3#s")
 TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");
 hive> select * from hive_hbase_test;
 OK
 1  c1-1  c2-1  c3-1
 2  c1-2  NULL  NULL
 hive> select c1 from hive_hbase_test;
 c1-1
 c1-2
 hive> select c1, c2 from hive_hbase_test;
 c1-1  c2-1
 c1-2  NULL
 {noformat}
 So far everything is correct but now:
 {noformat}
 hive> select c1, c2, c2 from hive_hbase_test;
 c1-1  c2-1  c2-1
 c1-2  NULL  c2-1
 {noformat}
 Selecting c2 twice works the first time but the second time we
 actually get the value from the previous row.
 {noformat}
 hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
 c1-1  c3-1  c2-1  c2-1  c3-1  c3-1  c1-1
 c1-2  NULL  NULL  c2-1  c3-1  c3-1  c1-2
 {noformat}
 We've narrowed this down to an early initialization of 
 {{fieldsInited\[fieldID] = true}} in {{LazyHBaseRow#uncheckedGetField}} and 
 we'll try to provide a patch which surely needs review.
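
A minimal Python sketch of how setting an "initialized" flag too early can produce the output above. This is a toy model, not the actual LazyHBaseRow code: the class ToyLazyRow and its methods are hypothetical, and the sketch assumes that the row object and its per-field cache are reused from one row to the next.

```python
class ToyLazyRow:
    """Toy lazily deserialized row that is reused across rows (hypothetical)."""

    def __init__(self, num_fields):
        self.cached = [None] * num_fields   # per-field cache, survives reset()
        self.inited = [False] * num_fields
        self.cells = [None] * num_fields

    def reset(self, cells):
        """Point the row at new raw cells; None means the cell does not exist."""
        self.cells = cells
        self.inited = [False] * len(cells)

    def get_buggy(self, i):
        if not self.inited[i]:
            self.inited[i] = True           # flag set before the null check
            if self.cells[i] is None:
                return None                 # cache still holds the previous row
            self.cached[i] = self.cells[i]
        return self.cached[i]

    def get_fixed(self, i):
        if not self.inited[i]:
            self.cached[i] = self.cells[i]  # overwrite the cache even for None
            self.inited[i] = True
        return self.cached[i]


row = ToyLazyRow(3)
row.reset(["c1-1", "c2-1", "c3-1"])
first_row = [row.get_buggy(0), row.get_buggy(1), row.get_buggy(1)]

row.reset(["c1-2", None, None])
buggy = [row.get_buggy(0), row.get_buggy(1), row.get_buggy(1)]

row.reset(["c1-2", None, None])
fixed = [row.get_fixed(0), row.get_fixed(1), row.get_fixed(1)]
```

In this model the first read of a missing cell returns NULL, but the second read sees the flag already set and returns the stale cached value from the previous row, which matches the c1, c2, c2 query.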



[jira] [Commented] (HIVE-3994) Hive metastore is not working on PostgreSQL 9.2 (most likely on anything 9.0+)

2013-02-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574113#comment-13574113
 ] 

Shreepadma Venugopalan commented on HIVE-3994:
--

This problem appears in Postgres 9.x because standard_conforming_strings has 
been on by default since 9.1. More here: 
http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1#Backward_compatibility_issues
One fix for this issue is to set standard_conforming_strings to off when 
setting up the Hive metastore on Postgres.
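
To illustrate why the setting matters for the ESCAPE '\\' clause in the failing query, here is a small Python simulation. It is a simplified model, not PostgreSQL's parser: it handles only the doubled-backslash escape in a plain '...' literal, and both function names are made up for this sketch.

```python
def literal_value(body, standard_conforming_strings):
    """Value of a plain '...' SQL literal whose raw text is `body`.

    Simplified model: with the setting on, backslashes are ordinary
    characters; with it off, a doubled backslash collapses to one.
    """
    if standard_conforming_strings:
        return body
    return body.replace("\\\\", "\\")


def is_valid_like_escape(value):
    """LIKE ... ESCAPE accepts an empty string or exactly one character."""
    return len(value) <= 1


raw = "\\\\"  # the two backslashes between the quotes in ESCAPE '\\'
valid_when_off = is_valid_like_escape(literal_value(raw, False))
valid_when_on = is_valid_like_escape(literal_value(raw, True))
```

With the setting off the literal is a single backslash and is accepted. With it on the literal is two characters long, which corresponds to the "Escape string must be empty or one character" hint in the trace.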

 Hive metastore is not working on PostgreSQL 9.2 (most likely on anything 9.0+)
 --

 Key: HIVE-3994
 URL: https://issues.apache.org/jira/browse/HIVE-3994
 Project: Hive
  Issue Type: Improvement
Reporter: Jarek Jarcec Cecho

 I'm getting the following exception when running the metastore on PostgreSQL 9.2:
 {code}
 Caused by: javax.jdo.JDODataStoreException: Error executing JDOQL query 
 SELECT THIS.TBL_NAME AS NUCORDER0 FROM TBLS THIS LEFT OUTER JOIN 
 DBS THIS_DATABASE_NAME ON THIS.DB_ID = THIS_DATABASE_NAME.DB_ID 
 WHERE THIS_DATABASE_NAME.NAME = ? AND (LOWER(THIS.TBL_NAME) LIKE ? 
 ESCAPE '\\' ) ORDER BY NUCORDER0  : ERROR: invalid escape string
   Hint: Escape string must be empty or one character..
 NestedThrowables:
 org.postgresql.util.PSQLException: ERROR: invalid escape string
   Hint: Escape string must be empty or one character.
 at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:252)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:759)
 ... 28 more
 Caused by: org.postgresql.util.PSQLException: ERROR: invalid escape string
   Hint: Escape string must be empty or one character.
 at 
 org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
 at 
 org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
 at 
 org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
 at 
 org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
 at 
 org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
 at 
 org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:271)
 at 
 org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at 
 org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at 
 org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:457)
 at 
 org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:123)
 at 
 org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
 at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
 at 
 org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
 at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
 ... 29 more
 {code}
 I've googled a bit about this and found a lot of similar issues in different 
 projects, so I'm assuming that this might be a backward compatibility 
 issue on the PostgreSQL side.



[jira] [Commented] (HIVE-4001) Add o.a.h.h.serde.Constants for backward compatibility

2013-02-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574175#comment-13574175
 ] 

Shreepadma Venugopalan commented on HIVE-4001:
--

Looks good. +1.

 Add o.a.h.h.serde.Constants for backward compatibility
 --

 Key: HIVE-4001
 URL: https://issues.apache.org/jira/browse/HIVE-4001
 Project: Hive
  Issue Type: Improvement
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4001.D8457.1.patch


 The class was renamed to 'serdeConstants' in hive-0.10.0, but the old name 
 may still be referenced by custom implementations, including UDFs, SerDes, 
 StorageHandlers, etc.



[jira] [Commented] (HIVE-3917) Support fast operation for analyze command

2013-01-28 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564856#comment-13564856
 ] 

Shreepadma Venugopalan commented on HIVE-3917:
--

I assume this will allow gathering some statistics, namely number of files 
and size in bytes, when the data storage is HDFS. Is there a plan to support 
'noscan' for other statistics such as number of rows, or column stats such as 
top k? If not, is there a plan to deal with the stats that can't be gathered 
through noscan becoming stale? 

 Support fast operation for analyze command
 --

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1


 Hive supports the ANALYZE command to gather statistics from existing 
 tables/partitions: 
 https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
 It collects:
 1. Number of rows
 2. Number of files
 3. Size in bytes
 If the table/partition is big, the operation takes time since it opens 
 all files and scans all data.
 It would be nice to support a fast operation that gathers only the statistics 
 which don't require opening all files:
 1. Number of files
 2. Size in bytes
 Potential syntax is 
 ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
 COMPUTE STATISTICS [noscan];
 In the future, all statistics that are available without a scan can be 
 retrieved via this optional parameter.
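
A rough Python sketch of what a noscan pass could compute. It is not the Hive implementation: it walks a local directory instead of HDFS, and the function name and the result keys are made up for this illustration. The point is that both numbers come from file system metadata, so no file is opened for reading.

```python
import os
import tempfile


def noscan_stats(table_dir):
    """Count files and sum their sizes using file system metadata only."""
    num_files = 0
    size_in_bytes = 0
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            num_files += 1
            # getsize() reads the file's metadata, not its contents.
            size_in_bytes += os.path.getsize(os.path.join(root, name))
    return {"num_files": num_files, "size_in_bytes": size_in_bytes}


# Tiny demo: a "table" with one file at the top level and one in a partition.
with tempfile.TemporaryDirectory() as table_dir:
    part_dir = os.path.join(table_dir, "ds=2013-01-28")
    os.mkdir(part_dir)
    with open(os.path.join(table_dir, "000000_0"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(part_dir, "000000_0"), "wb") as f:
        f.write(b"hello")
    stats = noscan_stats(table_dir)
```

The number of rows cannot be obtained this way, which is why it is missing from the noscan list above.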



[jira] [Commented] (HIVE-3917) Support fast operation for analyze command

2013-01-28 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564888#comment-13564888
 ] 

Shreepadma Venugopalan commented on HIVE-3917:
--

[~gangtimliu]: Thanks for the clarification. If we add a flag to indicate that 
the stats are stale, how will we distinguish between the case where the stats 
are really stale and the case where some stats have been updated by a noscan 
operation?

 Support fast operation for analyze command
 --

 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3917.patch.1





[jira] [Commented] (HIVE-3931) Add Oracle metastore upgrade script for 0.9 to 10.0

2013-01-23 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561068#comment-13561068
 ] 

Shreepadma Venugopalan commented on HIVE-3931:
--

Looks good. Non-committer +1.

 Add Oracle metastore upgrade script for 0.9 to 10.0
 ---

 Key: HIVE-3931
 URL: https://issues.apache.org/jira/browse/HIVE-3931
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.11.0

 Attachments: HIVE-3931-1.patch


 The top level Oracle metastore upgrade script for 0.9 to 0.10 is missing.



[jira] [Updated] (HIVE-1362) Optimizer statistics on columns in tables and partitions

2013-01-23 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Summary: Optimizer statistics on columns in tables and partitions  (was: 
Column level scalar valued statistics on Tables and Partitions)

 Optimizer statistics on columns in tables and partitions
 

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, 
 HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, 
 HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, 
 HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, 
 HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, 
 HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, 
 HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, 
 HIVE-1362_gen-thrift.9.patch.txt






[jira] [Updated] (HIVE-33) [Hive]: Add optimizer statistics in Hive

2013-01-23 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-33:
---

Summary: [Hive]: Add optimizer statistics in Hive  (was: [Hive]: Add 
ability to compute statistics on hive tables)

 [Hive]: Add optimizer statistics in Hive
 

 Key: HIVE-33
 URL: https://issues.apache.org/jira/browse/HIVE-33
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor, Statistics
Reporter: Ashish Thusoo
  Labels: statistics

 Add commands to collect partition and column level statistics in hive.



[jira] [Updated] (HIVE-1940) Query Optimization Using Column Statistics and Histograms

2013-01-23 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1940:
-

Summary: Query Optimization Using Column Statistics and Histograms  (was: 
Query Optimization Using Column Metadata and Histograms)

 Query Optimization Using Column Statistics and Histograms
 -

 Key: HIVE-1940
 URL: https://issues.apache.org/jira/browse/HIVE-1940
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor, Statistics
Reporter: Anja Gruenheid
 Attachments: Agruenheid_ideas11.pdf, HiveMetaStore.pdf


 The current basis for cost-based query optimization in Hive is information 
 gathered on tables and partitions. To make further improvements in query 
 optimization possible, the next step is to design and implement ways to 
 gather information on columns, as discussed in issue HIVE-33. After that, 
 histograms are a possible option for collecting and using run-time 
 statistics. Besides implementing these features, it is also necessary to 
 develop a consistent storage model for the MetaStore.



[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-15 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554243#comment-13554243
 ] 

Shreepadma Venugopalan commented on HIVE-3004:
--

Thanks Ashutosh!

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Fix For: 0.11.0

 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt, HIVE-3004.4.patch






[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Status: Open  (was: Patch Available)

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt






[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Status: Patch Available  (was: Open)

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt, HIVE-3004.4.patch






[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Attachment: HIVE-3004.4.patch

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt, HIVE-3004.4.patch






[jira] [Updated] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated

2013-01-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3886:
-

Status: Patch Available  (was: Open)

 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
 -

 Key: HIVE-3886
 URL: https://issues.apache.org/jira/browse/HIVE-3886
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.9.0, 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Minor
 Attachments: HIVE-3886.1.patch


 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 



[jira] [Updated] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated

2013-01-14 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3886:
-

Attachment: HIVE-3886.1.patch

 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
 -

 Key: HIVE-3886
 URL: https://issues.apache.org/jira/browse/HIVE-3886
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.9.0, 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Minor
 Attachments: HIVE-3886.1.patch


 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
 org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 



[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Status: Patch Available  (was: Open)

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch






[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Attachment: HIVE-3004.3.patch.txt

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt






[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551697#comment-13551697
 ] 

Shreepadma Venugopalan commented on HIVE-3004:
--

Thanks Ashutosh. I've attached the new patch to the JIRA. 

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt






[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551698#comment-13551698
 ] 

Shreepadma Venugopalan commented on HIVE-3004:
--

Review board : https://reviews.apache.org/r/8931/

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
 HIVE-3004.3.patch.txt






[jira] [Commented] (HIVE-3426) union with same source should be optimized

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551701#comment-13551701
 ] 

Shreepadma Venugopalan commented on HIVE-3426:
--

Yup, let's try to optimize the simple case first. Optimizing subqueries with 
GBY can be the next step. 

 union with same source should be optimized
 --

 Key: HIVE-3426
 URL: https://issues.apache.org/jira/browse/HIVE-3426
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Zhenxiao Luo





[jira] [Assigned] (HIVE-3653) Failure in a counter poller run should not be considered as a job failure

2013-01-11 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-3653:


Assignee: Shreepadma Venugopalan

 Failure in a counter poller run should not be considered as a job failure
 -

 Key: HIVE-3653
 URL: https://issues.apache.org/jira/browse/HIVE-3653
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.7.1
Reporter: Harsh J
Assignee: Shreepadma Venugopalan

 A client had a simple transient failure in polling the JT for job status 
 (which it does every HIVECOUNTERSPULLINTERVAL for each currently running job).
 {code}
 java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142) 
 at org.apache.hadoop.ipc.Client.call(Client.java:1110) 
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) 
 at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source) 
 at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053) 
 at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065) 
 at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351) 
 at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686) 
 at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123) 
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131) 
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063) 
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900) 
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748) 
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209) 
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286) 
 at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310) 
 at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317) 
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490) 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597) 
 at org.apache.hadoop.util.RunJar.main(RunJar.java:197) 
 {code}
 This led Hive to think the running job itself had failed, and it failed 
 the query run, although the running job progressed to completion in the 
 background.
 We should not let transient IOExceptions in counter polling cause query 
 termination, and should instead just retry.
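
 The retry behaviour proposed above could look roughly like the sketch below. 
 It is only an illustration under stated assumptions, not Hive's code: 
 poll_with_retry, make_flaky_poll, max_attempts and delay_seconds are 
 hypothetical names, and Python's IOError stands in for java.io.IOException.

```python
import time


def poll_with_retry(poll, max_attempts=3, delay_seconds=0.0):
    """Call poll() and retry on IOError instead of failing the query.

    poll, max_attempts and delay_seconds are hypothetical names used
    only for this sketch; they are not Hive APIs or settings.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return poll()
        except IOError as err:  # transient failure: remember it and retry
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed, so surface the last error to the caller.
    raise last_error


def make_flaky_poll(failures):
    """Build a poll function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def poll():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise IOError("Connection reset by peer")
        return "RUNNING"

    return poll
```

 With this shape, a poller that fails twice and then succeeds still reports 
 the job status, and the query only fails once every attempt has failed.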



[jira] [Commented] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-10 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550693#comment-13550693
 ] 

Shreepadma Venugopalan commented on HIVE-3875:
--

Thanks Carl for committing.

 negative value for hive.stats.ndv.error should be disallowed 
 -

 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.11.0

 Attachments: HIVE-3875.1.patch.txt


 Currently, if a negative value is specified for hive.stats.ndv.error in 
 hive-site.xml, it is treated as 0. We should instead throw an exception.
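
 The requested behaviour can be sketched as below. This is a hedged 
 illustration only: validate_ndv_error is a hypothetical helper, and the 
 attached patch may perform the check differently.

```python
def validate_ndv_error(raw_value):
    """Parse a hive.stats.ndv.error setting and reject negative values.

    Hypothetical helper for illustration; not part of Hive.
    """
    value = float(raw_value)
    if value < 0:
        # Fail loudly instead of silently treating the value as 0.
        raise ValueError(
            "hive.stats.ndv.error must be non-negative, got %r" % (raw_value,))
    return value
```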



[jira] [Created] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated

2013-01-10 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3886:


 Summary: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is 
deprecated
 Key: HIVE-3886
 URL: https://issues.apache.org/jira/browse/HIVE-3886
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.9.0, 0.10.0, 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Minor


WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 



[jira] [Created] (HIVE-3887) Upgrade Hive's Avro dependency to version 1.7.3

2013-01-10 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3887:


 Summary: Upgrade Hive's Avro dependency to version 1.7.3
 Key: HIVE-3887
 URL: https://issues.apache.org/jira/browse/HIVE-3887
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan






[jira] [Created] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-09 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3875:


 Summary: negative value for hive.stats.ndv.error should be 
disallowed 
 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: CDH-9733.1.patch.txt

Currently, if a negative value is specified for hive.stats.ndv.error in 
hive-site.xml, it is treated as 0. We should instead throw an exception.



[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3875:
-

Status: Patch Available  (was: Open)

 negative value for hive.stats.ndv.error should be disallowed 
 -

 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: CDH-9733.1.patch.txt


 Currently, if a negative value is specified for hive.stats.ndv.error in 
 hive-site.xml, it is treated as 0. We should instead throw an exception.



[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3875:
-

Attachment: CDH-9733.1.patch.txt

 negative value for hive.stats.ndv.error should be disallowed 
 -

 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: CDH-9733.1.patch.txt


 Currently, if a negative value is specified for hive.stats.ndv.error in 
 hive-site.xml, it is treated as 0. We should instead throw an exception.



[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3875:
-

Attachment: (was: CDH-9733.1.patch.txt)

 negative value for hive.stats.ndv.error should be disallowed 
 -

 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Currently, if a negative value is specified for hive.stats.ndv.error in 
 hive-site.xml, it is treated as 0. We should instead throw an exception.



[jira] [Updated] (HIVE-3875) negative value for hive.stats.ndv.error should be disallowed

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3875:
-

Attachment: HIVE-3875.1.patch.txt

 negative value for hive.stats.ndv.error should be disallowed 
 -

 Key: HIVE-3875
 URL: https://issues.apache.org/jira/browse/HIVE-3875
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3875.1.patch.txt


 Currently, if a negative value is specified for hive.stats.ndv.error in 
 hive-site.xml, it is treated as 0. We should instead throw an exception.



[jira] [Updated] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3004:
-

Status: Open  (was: Patch Available)

I'm working on rebasing the patch off of the trunk.

 RegexSerDe should support other column types in addition to STRING
 --

 Key: HIVE-3004
 URL: https://issues.apache.org/jira/browse/HIVE-3004
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Carl Steinbach
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch






[jira] [Created] (HIVE-3877) Implement equi-depth histograms as a UDAF

2013-01-09 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3877:


 Summary: Implement equi-depth histograms as a UDAF
 Key: HIVE-3877
 URL: https://issues.apache.org/jira/browse/HIVE-3877
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


Implement a space and time efficient algorithm to bin numeric column data such 
that all bins approximately contain the same number of elements. Implement such 
an algorithm as a generic UDAF.
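
To make the target output concrete, here is a naive sketch of equi-depth 
binning. It is an assumption-laden illustration, not the proposed UDAF: it 
sorts all values in memory, so it is neither space nor time efficient, and 
equi_depth_bins is a hypothetical name.

```python
def equi_depth_bins(values, num_bins):
    """Return a list of (low, high, count) bins with near-equal counts.

    Naive sort-based sketch: O(n log n) time and O(n) memory, so it only
    illustrates the desired result, not a space-efficient algorithm.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    data = sorted(values)
    n = len(data)
    bins = []
    for i in range(num_bins):
        # Integer boundaries spread any remainder across the bins.
        start = i * n // num_bins
        end = (i + 1) * n // num_bins
        if start < end:
            bins.append((data[start], data[end - 1], end - start))
    return bins
```

Unlike an equi-width histogram, the bin boundaries follow the data, so 
skewed values produce narrow bins where the data is dense and wide bins 
where it is sparse.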



[jira] [Created] (HIVE-3878) Enhance the existing thrift APIs to persist the histogram

2013-01-09 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3878:


 Summary: Enhance the existing thrift APIs to persist the histogram 
 Key: HIVE-3878
 URL: https://issues.apache.org/jira/browse/HIVE-3878
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan


Enhance the existing thrift APIs added for column statistics to persist 
histograms in addition to the scalar stats value.



[jira] [Created] (HIVE-3881) Extend the analyze table syntax to allow the user to request computing histogram

2013-01-09 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3881:


 Summary: Extend the analyze table syntax to allow the user to 
request computing histogram
 Key: HIVE-3881
 URL: https://issues.apache.org/jira/browse/HIVE-3881
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


Since computing histograms can be expensive, by default only scalar statistics 
on columns will be gathered when an analyze table .. compute statistics for 
columns ... statement is executed. This JIRA covers the task of extending the 
analyze table syntax to let the user request a histogram in addition to the 
other statistics on columns.



[jira] [Assigned] (HIVE-3879) Enhance the existing thrift APIs to retrieve the histogram corresponding to a column

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-3879:


Assignee: Shreepadma Venugopalan

 Enhance the existing thrift APIs to retrieve the histogram corresponding to a 
 column
 

 Key: HIVE-3879
 URL: https://issues.apache.org/jira/browse/HIVE-3879
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Enhance the existing thrift API to retrieve the histogram, if it exists, 
 corresponding to a column.



[jira] [Assigned] (HIVE-3878) Enhance the existing thrift APIs to persist the histogram

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-3878:


Assignee: Shreepadma Venugopalan

 Enhance the existing thrift APIs to persist the histogram 
 --

 Key: HIVE-3878
 URL: https://issues.apache.org/jira/browse/HIVE-3878
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Enhance the existing thrift APIs added for column statistics to persist 
 histograms in addition to the scalar stats value.



[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition

2013-01-09 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549151#comment-13549151
 ] 

Shreepadma Venugopalan commented on HIVE-3286:
--

HIVE-3526 covers the task of computing and persisting histograms on numeric 
columns in Hive tables and partitions. 

 Explicit skew join on user provided condition
 -

 Key: HIVE-3286
 URL: https://issues.apache.org/jira/browse/HIVE-3286
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, 
 HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch


 A join on a table with skewed data spends most of its execution time 
 handling the skewed keys. In most cases we already know about the skew and 
 even know what the skewed keys look like.
 If we could explicitly assign reducer slots to the skewed keys, total 
 execution time could be greatly shortened.
 As a start, I've extended the join grammar like this:
 {code}
 select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
 {code}
 This means that if the above query is executed by 20 reducers, one reducer 
 handles a.key+1 < 50, one handles 50 <= a.key+1 < 100, one handles 
 99 <= a.key < 150, and 17 reducers handle the rest (this could be extended 
 later to assign more than one reducer per group).
 This can only be used with common inner equi-joins, and the skew condition 
 must be composed of join keys only.
 The work done so far will be posted shortly after code cleanup.
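
 The grouping in this first example can be sketched as follows. This is a 
 minimal illustration, not the patch's implementation: it assumes the skew 
 conditions are a.key+1 < 50, a.key+1 < 100 and a.key < 150, tested in order 
 with the first match winning. skew_group and EXAMPLE_CONDITIONS are 
 hypothetical names.

```python
def skew_group(key, conditions):
    """Return the index of the first condition that holds, or None.

    None means the row belongs to no skew group and is handled by the
    remaining reducers.
    """
    for index, condition in enumerate(conditions):
        if condition(key):
            return index
    return None


# The three skew conditions assumed for the example query, in order.
EXAMPLE_CONDITIONS = [
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
]
```

 Because evaluation stops at the first match, key 99 lands in the third 
 group (99 + 1 < 100 is false, 99 < 150 is true), which is why that group 
 covers 99 <= a.key < 150.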
 
 Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially 
 at runtime, and the first one that evaluates to true decides the skew group 
 for the row. Each skew group has reserved partition slot(s), to which all 
 rows in the group are assigned. 
 The number of partition slots reserved for each group is also decided at 
 runtime, by a simple percentage calculation. If a skew group is CLUSTER BY 
 20 PERCENT and the total number of partition slots (= number of reducers) 
 is 20, that group will reserve 4 partition slots, and so on.
 DISTRIBUTE BY decides how the rows in a group are dispersed across the 
 reserved slots (if a group has only one slot, this is meaningless). 
 Currently, three distribution policies are available: RANDOM, KEYS, and 
 expression. 
 1. RANDOM : rows of driver** alias are dispersed by random and rows of 
 non-driver alias are duplicated for all the slots (default if not specified)
 2. KEYS : determined by hash value of keys (same with previous)
 3. expression : determined by hash of object evaluated by user-provided 
 expression
 Only possible with inner, equi, common-joins. Not yet supports join tree 
 merging.
 Might be used by other RS users like SORT BY or GROUP BY
 If there exists column statistics for the key, it could be possible to apply 
 automatically.
 For example, if 20 reducers are used for the query below,
 {code}
 select count(*) from src a join src b on a.key=b.key skew on (
a.key = '0' CLUSTER BY 10 PERCENT,
b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
 {code}
 group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others 
 will reserve slots 0~5.
 For a row with key='0' from alias a, the row is randomly assigned within the 
 range 6~7 (driver alias) : 6 or 7
 For a row with key='0' from alias b, the row is distributed to all slots in 
 6~7 (non-driver alias) : 6 and 7
 For a row with key='50', the row is assigned within the range 8~11 by the 
 hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4)
 For a row with key='500', the row is assigned within the range 12~19 by the 
 hashcode of the join key : 12 + (hash(key) % 8)
 For a row with key='200', the row does not belong to any skew group : hash(key) % 6
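The slot arithmetic in the example above can be sketched as follows. This is a minimal illustration, not the patch's code: the class and method names are hypothetical, integer division for the percentage and `Math.floorMod` for the hash offset are assumptions, and the layout (unreserved slots first, then the groups in declaration order) is inferred from the 20-reducer example. The RANDOM policy and the duplication of non-driver rows are not modelled.

```java
import java.util.Arrays;

// Hypothetical helper, not Hive code: reproduces the slot layout of the
// 20-reducer example (others 0~5, group-0 6~7, group-1 8~11, group-2 12~19).
final class SkewSlotSketch {

    // Returns one {firstSlot, slotCount} pair per skew group, followed by a
    // final pair for the rows that match no skew expression.
    // Assumption: percentages are applied with integer division.
    static int[][] reserve(int totalSlots, int[] percents) {
        int[] counts = new int[percents.length];
        int reserved = 0;
        for (int i = 0; i < percents.length; i++) {
            counts[i] = totalSlots * percents[i] / 100;
            reserved += counts[i];
        }
        int others = totalSlots - reserved;
        int[][] ranges = new int[percents.length + 1][];
        int next = others; // skew groups start right after the unreserved slots
        for (int i = 0; i < percents.length; i++) {
            ranges[i] = new int[] {next, counts[i]};
            next += counts[i];
        }
        ranges[percents.length] = new int[] {0, others};
        return ranges;
    }

    // KEYS / expression policy: firstSlot + (hash % slotCount).
    // floorMod keeps the offset non-negative for negative hash codes.
    // Throws ArithmeticException if the range has zero slots.
    static int slotFor(int[] range, int hash) {
        return range[0] + Math.floorMod(hash, range[1]);
    }

    public static void main(String[] args) {
        int[][] ranges = reserve(20, new int[] {10, 20, 40});
        System.out.println(Arrays.deepToString(ranges));
        System.out.println(slotFor(ranges[1], "50".toUpperCase().hashCode()));
    }
}
```

With 20 slots and percentages 10, 20 and 40, the groups receive 2, 4 and 8 slots, which leaves 6 slots for all other rows.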
 *expressions in skew condition : 
 1. all expressions should be made of expression in join condition, which 
 means if join condition is a.key=b.key, user can make any expression with 
 a.key or b.key. But if join condition is a.key+1=b.key, user cannot make 
 expression with a.key solely (should make expression with a.key+1). 
 2. all expressions should reference one and only one side of the aliases. For 
 example, simple constant expressions or expressions referencing both sides of 
 the join condition (a.key+b.key<100) are not allowed.
 3. all functions in an expression should be deterministic and stateless.
 4. if a DISTRIBUTE BY expression is used, the distribution expression should 
 also have the same alias as the skew expression.
 **driver alias :
 1. driver alias means the sole referenced alias from skew expression, which 
 is important for RANDOM distribution. rows of driver alias are 

[jira] [Commented] (HIVE-3764) Support metastore version consistency check

2012-12-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509300#comment-13509300
 ] 

Shreepadma Venugopalan commented on HIVE-3764:
--

I think adding the consistency check is a good idea too. I've not looked into 
all the details of the code, but I noticed that the metastore version number is 
the hive release version. While this makes the version numbers easily readable, 
we would need to provide scripts and perform a metastore upgrade on every Hive 
release even if there are no other patches in the release that require a 
metastore schema upgrade. The other option would be to use version numbers from 
a monotonically increasing sequence instead and bump up the version number only 
if there are changes in a release that require a metastore upgrade. I'm 
wondering whether you have considered the latter option. Thanks.
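The second option could look roughly like the sketch below. Everything in it is hypothetical: the release labels, the version numbers and the class name are placeholders for illustration and are not taken from the HIVE-3764 patch. The point is only that the schema version is bumped when the schema changes, so two releases can share one schema version and no upgrade script is needed between them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a schema version that is decoupled from the release version.
final class SchemaVersionSketch {

    // Placeholder mapping: release label -> schema version that release requires.
    static final Map<String, Integer> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("release-A", 1);
        REQUIRED.put("release-B", 2); // this release changed the schema
        REQUIRED.put("release-C", 2); // no schema change, so no bump
    }

    static int requiredFor(String release) {
        Integer version = REQUIRED.get(release);
        if (version == null) {
            throw new IllegalArgumentException("unknown release: " + release);
        }
        return version;
    }

    // An upgrade script is only needed when the schema version differs.
    static boolean needsUpgrade(String fromRelease, String toRelease) {
        return requiredFor(fromRelease) != requiredFor(toRelease);
    }

    // Consistency check: refuse to run against a metastore at another version.
    static void verify(String release, int storedSchemaVersion) {
        int required = requiredFor(release);
        if (storedSchemaVersion != required) {
            throw new IllegalStateException("metastore schema version "
                + storedSchemaVersion + " does not match required version " + required);
        }
    }

    public static void main(String[] args) {
        System.out.println(needsUpgrade("release-A", "release-B"));
        System.out.println(needsUpgrade("release-B", "release-C"));
        verify("release-C", 2);
    }
}
```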

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.10.0

 Attachments: HIVE-3764-1.patch


 Today there's no version/compatibility information stored in hive metastore. 
 Also the datanucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer hive 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema is not always sufficient to 
 upgrade metastore when migrating to newer release. It's not supported with 
 all databases. Besides the migration often involves altering existing table, 
 changing or moving data etc.
 Hence it's very useful to have some consistency check to make sure that hive 
 is using the correct metastore and, for production systems, that the schema 
 is not automatically modified by running hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3764) Support metastore version consistency check

2012-12-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509305#comment-13509305
 ] 

Shreepadma Venugopalan commented on HIVE-3764:
--

Irrespective of which option we choose to generate version numbers, we should 
not execute the insert/update version number statement in the schema 
creation/upgrade script until all other statements in the schema 
creation/upgrade script have completed without errors. Thanks.
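The ordering constraint can be sketched as below, under the assumption that each statement of the script is modelled as a step that may fail. The `Step` interface and `applyUpgrade` are hypothetical names, and a real script would run SQL against the metastore database rather than Java lambdas. The recorded version only changes once every other step has succeeded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the version record is written last, and only on success.
final class UpgradeOrderSketch {

    // Stand-in for one statement of a schema creation/upgrade script.
    interface Step {
        void run() throws Exception;
    }

    // Returns the version that should be recorded after the attempt.
    static int applyUpgrade(List<Step> schemaSteps, int currentVersion, int targetVersion) {
        for (Step step : schemaSteps) {
            try {
                step.run();
            } catch (Exception e) {
                // A statement failed: leave the recorded version untouched.
                return currentVersion;
            }
        }
        // All schema statements completed, so the version update may run now.
        return targetVersion;
    }

    public static void main(String[] args) {
        List<Step> steps = new ArrayList<>();
        steps.add(() -> System.out.println("schema statement (placeholder)"));
        steps.add(() -> { throw new Exception("simulated failure"); });
        System.out.println(applyUpgrade(steps, 1, 2));
    }
}
```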

 Support metastore version consistency check
 ---

 Key: HIVE-3764
 URL: https://issues.apache.org/jira/browse/HIVE-3764
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.10.0

 Attachments: HIVE-3764-1.patch


 Today there's no version/compatibility information stored in hive metastore. 
 Also the datanucleus configuration property to automatically create missing 
 tables is enabled by default. If you happen to start an older or newer hive 
 or don't run the correct upgrade scripts during migration, the metastore 
 would end up corrupted. The autoCreate schema is not always sufficient to 
 upgrade metastore when migrating to newer release. It's not supported with 
 all databases. Besides the migration often involves altering existing table, 
 changing or moving data etc.
 Hence it's very useful to have some consistency check to make sure that hive 
 is using the correct metastore and, for production systems, that the schema 
 is not automatically modified by running hive.



[jira] [Commented] (HIVE-3747) Provide hive operation name for hookContext

2012-12-02 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508481#comment-13508481
 ] 

Shreepadma Venugopalan commented on HIVE-3747:
--

Thanks Namit for creating a review request. Will do so in the future for other 
reviews. 

 Provide hive operation name for hookContext
 ---

 Key: HIVE-3747
 URL: https://issues.apache.org/jira/browse/HIVE-3747
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sudhanshu Arora
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3747.1.patch.txt


 The hookContext exposed through ExecuteWithHookContext does not provide the 
 name of the Hive operation. 
 The following public API should be added in HookContext.
 public String getOperationName() {
   return SessionState.get().getHiveOperation().name();
 }



[jira] [Assigned] (HIVE-3747) Provide hive operation name for hookContext

2012-11-30 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-3747:


Assignee: Shreepadma Venugopalan

 Provide hive operation name for hookContext
 ---

 Key: HIVE-3747
 URL: https://issues.apache.org/jira/browse/HIVE-3747
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sudhanshu Arora
Assignee: Shreepadma Venugopalan

 The hookContext exposed through ExecuteWithHookContext does not provide the 
 name of the Hive operation. 
 The following public API should be added in HookContext.
 public String getOperationName() {
   return SessionState.get().getHiveOperation().name();
 }



[jira] [Updated] (HIVE-3747) Provide hive operation name for hookContext

2012-11-30 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3747:
-

Status: Patch Available  (was: Open)

 Provide hive operation name for hookContext
 ---

 Key: HIVE-3747
 URL: https://issues.apache.org/jira/browse/HIVE-3747
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sudhanshu Arora
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3747.1.patch.txt


 The hookContext exposed through ExecuteWithHookContext does not provide the 
 name of the Hive operation. 
 The following public API should be added in HookContext.
 public String getOperationName() {
   return SessionState.get().getHiveOperation().name();
 }



[jira] [Updated] (HIVE-3747) Provide hive operation name for hookContext

2012-11-30 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3747:
-

Attachment: HIVE-3747.1.patch.txt

 Provide hive operation name for hookContext
 ---

 Key: HIVE-3747
 URL: https://issues.apache.org/jira/browse/HIVE-3747
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sudhanshu Arora
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3747.1.patch.txt


 The hookContext exposed through ExecuteWithHookContext does not provide the 
 name of the Hive operation. 
 The following public API should be added in HookContext.
 public String getOperationName() {
   return SessionState.get().getHiveOperation().name();
 }



[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-30 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507831#comment-13507831
 ] 

Shreepadma Venugopalan commented on HIVE-3720:
--

@Namit: The authorization model in this proposal mirrors that of MySQL as 
closely as possible. The proposal also documents wherever there is a deviation 
from MySQL's authorization model. Since Hive's data model is based on that of 
MySQL, it would make a lot of sense to base the authorization model on MySQL's 
as well. The proposed functionality is not necessarily a superset of the 
existing authorization functionality but subsumes some of the existing 
functionality. While the existing implementation supports authorization on some 
HiveQL operations, it doesn't secure all of the operations, provide a way to 
bootstrap the system etc. This proposal expands authorization to all HiveQL 
operations and direct metadata operations that can be performed by invoking the 
metastore Thrift API. 

As discussed earlier, since the proposed model standardizes the authorization 
model to mirror that of MySQL, it deviates from the existing model wherever 
the existing implementation deviates from the authorization model of MySQL or 
other RDBMSs. The proposed model is also more fine grained and supports 
hierarchical privileges much like an RDBMS. For instance, the proposed model 
supports CREATE, ALTER, DROP privileges on objects whereas the current model 
supports an ALTER_METADATA privilege that includes the privileges needed to 
perform CREATE, ALTER, DROP etc. Note that one of the goals is to propose an 
authorization model such that finer grained privileges can be added in as 
necessary later. 
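A rough sketch of what fine-grained, hierarchical privileges mean in practice is given below. It is not the proposed design: the class, the object naming scheme ("*" for global, "db" for a database, "db.table" for a table) and the single implicit user are assumptions made for illustration. The lookup walks from the global level down to the table, so a privilege granted on a database also covers its tables, and CREATE, ALTER and DROP are checked separately rather than bundled into one ALTER_METADATA privilege.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of separate, hierarchical privileges for one user.
final class PrivilegeSketch {

    enum Privilege { CREATE, ALTER, DROP }

    // Key is "*" (global), "db" (database level) or "db.table" (table level).
    private final Map<String, Set<Privilege>> grants = new HashMap<>();

    void grant(String object, Privilege privilege) {
        grants.computeIfAbsent(object, k -> EnumSet.noneOf(Privilege.class)).add(privilege);
    }

    // A privilege held at a higher level implies it on the objects below.
    boolean isAllowed(String db, String table, Privilege privilege) {
        return has("*", privilege)
            || has(db, privilege)
            || has(db + "." + table, privilege);
    }

    private boolean has(String object, Privilege privilege) {
        Set<Privilege> held = grants.get(object);
        return held != null && held.contains(privilege);
    }

    public static void main(String[] args) {
        PrivilegeSketch user = new PrivilegeSketch();
        user.grant("sales", Privilege.ALTER);        // database-level grant
        user.grant("hr.staff", Privilege.DROP);      // table-level grant
        System.out.println(user.isAllowed("sales", "orders", Privilege.ALTER));
        System.out.println(user.isAllowed("sales", "orders", Privilege.DROP));
    }
}
```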

Since the existing implementation is not complete, it is unclear at this point 
what part of the functionality has been completely implemented. Perhaps we can 
mark the existing functionality in the wiki once we start implementing the 
proposed model. Thanks.

 Expand and standardize authorization in Hive
 

 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: Hive_Authorization_Functionality.pdf


 The existing implementation of authorization in Hive is not complete. 
 Additionally the existing implementation has security holes. This JIRA is an 
 umbrella JIRA  for a) extending authorization to all SQL operations and 
 direct metadata operations, and b) standardizing the authorization model and 
 its semantics to mirror that of MySQL as closely as possible.



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Attachment: HIVE-3678.4.patch.txt

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504852#comment-13504852
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

Uploaded patch rebased off tip of trunk. Thanks.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-26 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504337#comment-13504337
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

@Ashutosh: I've uploaded a new patch which adds 2 varchar columns for storing 
BigDecimal low and high values. Thanks.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-26 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Attachment: HIVE-3678.3.patch.txt

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-26 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504338#comment-13504338
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

Updated patch is available on both JIRA and RB. Thanks.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-25 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503531#comment-13503531
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

@Ashutosh: If we store long/double values as a varchar instead of as a 
numeric type, we can avoid evolving the schema when we add a BigDecimal type. 
That's the only benefit I see for storing long/double as a varchar. However, I 
agree with you that we should avoid untyping data when possible. Let me know 
your thoughts.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-24 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503425#comment-13503425
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

@Ashutosh: Thanks for your comments. Do you think it makes sense to store 
numeric long/double/bigdecimal values in a varchar column? I don't see 
consistent BLOB/CLOB support across DB vendors and versions. If you agree, I'll 
make the change to store these numeric values in a varchar column and post a 
new patch. Thanks.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-21 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502440#comment-13502440
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

@Ashutosh: My answers are inline.

We can add two more columns in the M*ColumnStatistics table of type BigDecimal: 
BigDecimalLowValue and BigDecimalHighValue. But is BigDecimal type supported 
consistently across different DBs?
Agreed, BigDecimal is not consistently supported across DBs. Hence we can't add 
a BigDecimal column consistently across DB vendors and versions easily.

We can have these two columns of type Double, but then we lose precision.
Yes, we can store BigDecimal and Long as Double but we will lose precision. 

We can store as plain strings in column of type varchar.
The maximum number of digits after the decimal point in a BigDecimal number is 
unlimited for all practical purposes. If we stored it in a varchar, it could 
result in truncation of some digits following the decimal point in some cases, 
but this seems to be the only practical solution.

We can store in json format in column of type varchar.
The maximum number of digits in a BigDecimal number after the decimal point is 
unlimited for all practical purposes (Java allows nearly 2 billion digits after 
the decimal point). At this time, we collect MIN, MAX column values for numeric 
columns. If we stored BigDecimal value, we may exceed the varchar size limit 
and as a result truncate the JSON blob. This would result in a malformed JSON 
object. Additionally we will also lose some of the column statistics.
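The precision trade-offs discussed above can be checked with a small sketch. The class and method names are hypothetical and the varchar is simulated by truncating a string to a maximum length; the metastore schema itself is not involved. It shows that a round trip through Double changes both a long BigDecimal value and a Long above 2^53, while a string round trip is exact as long as the value fits the column.

```java
import java.math.BigDecimal;

// Hypothetical sketch comparing Double and varchar storage for numeric stats.
final class NumericStorageSketch {

    // Round trip through a Double column.
    static BigDecimal bigDecimalViaDouble(BigDecimal value) {
        return BigDecimal.valueOf(value.doubleValue());
    }

    // Round trip of a Long through a Double column.
    static long longViaDouble(long value) {
        return (long) (double) value;
    }

    // Round trip through a simulated varchar(maxLength) column. Digits beyond
    // the limit are silently dropped, as a truncating column would do. If the
    // cut fell inside the integer part, the result would be badly wrong.
    static BigDecimal viaVarchar(BigDecimal value, int maxLength) {
        String text = value.toPlainString();
        if (text.length() > maxLength) {
            text = text.substring(0, maxLength);
        }
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1.2345678901234567890123456789");
        System.out.println(bigDecimalViaDouble(value)); // fewer digits survive
        System.out.println(viaVarchar(value, 255));     // exact
        System.out.println(viaVarchar(value, 10));      // truncated fraction
        System.out.println(longViaDouble(9007199254740993L));
    }
}
```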

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500526#comment-13500526
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

With the changes from HIVE-3712, the column schema has *no* dependency on any 
specific DB. It now uses simple data types, which are supported across DBs. 
The primary motivation for the schema change in HIVE-3712 was to avoid storing 
column statistics fields as a BLOB. Using a BLOB has two problems: a) BLOBs 
are designed to store large volumes of data, on the order of GBs, and are 
hence stored outside the row. A consequence of this design is that BLOBs don't 
perform well for storing small amounts of data. While some DBs such as Oracle 
inline small BLOBs, not all DBs do. BLOBs are the only practical choice for 
storing data whose size is not known in advance, but they are overkill for 
storing around 100 bytes of data; and b) there is no uniform BLOB support 
across DB vendors and versions. Hence I don't really see the value in storing 
this as a JSON BLOB.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Status: Patch Available  (was: Open)

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Attachment: HIVE-3678.1.patch.txt

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499924#comment-13499924
 ] 

Shreepadma Venugopalan commented on HIVE-3712:
--

It looks like VARBINARY is not supported consistently across different DBs and 
DB versions. Storing 8 bytes in a LONGVARBINARY is overkill because 
LONGVARBINARY is mapped to the BLOB type in some DBs. It appears the best 
solution at this point is to store LONG and DOUBLE min and max values in 
two separate columns. 
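As a small illustration of the two storage choices, the sketch below first encodes a long and a double as binary to confirm that each needs only 8 bytes, then shows a holder with separately typed low/high fields in place of a binary value. The class and field names are hypothetical and do not come from the patch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: binary encoding versus separately typed min/max fields.
final class MinMaxColumnsSketch {

    // Binary form of a LONG min or max value: always 8 bytes.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Binary form of a DOUBLE min or max value: always 8 bytes.
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Typed alternative: one pair of fields per numeric type, null when unused.
    static final class ColumnStats {
        Long longLowValue;
        Long longHighValue;
        Double doubleLowValue;
        Double doubleHighValue;
    }

    public static void main(String[] args) {
        System.out.println(encode(42L).length);
        System.out.println(encode(1.5).length);
        ColumnStats stats = new ColumnStats();
        stats.longLowValue = 1L;
        stats.longHighValue = 100L;
        System.out.println(stats.longLowValue + " to " + stats.longHighValue);
    }
}
```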

 Use varbinary instead of longvarbinary to store min and max column values in 
 column stats schema
 

 Key: HIVE-3712
 URL: https://issues.apache.org/jira/browse/HIVE-3712
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
 and max column values for numeric types takes up 8 bytes and hence doesn't 
 require a BLOB. Storing these values in a BLOB will impact performance 
 without providing much benefit. 



[jira] [Updated] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3712:
-

Status: Patch Available  (was: Open)

 Use varbinary instead of longvarbinary to store min and max column values in 
 column stats schema
 

 Key: HIVE-3712
 URL: https://issues.apache.org/jira/browse/HIVE-3712
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
 and max column values for numeric types takes up 8 bytes and hence doesn't 
 require a BLOB. Storing these values in a BLOB will impact performance 
 without providing much benefit. 



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499930#comment-13499930
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

Review board link: https://reviews.apache.org/r/8119/

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499931#comment-13499931
 ] 

Shreepadma Venugopalan commented on HIVE-3712:
--

Review board link: https://reviews.apache.org/r/8119/

 Use varbinary instead of longvarbinary to store min and max column values in 
 column stats schema
 

 Key: HIVE-3712
 URL: https://issues.apache.org/jira/browse/HIVE-3712
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
 and max column values for numeric types takes up 8 bytes and hence doesn't 
 require a BLOB. Storing these values in a BLOB will impact performance 
 without providing much benefit. 



[jira] [Created] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3720:


 Summary: Expand and standardize authorization in Hive
 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


The existing implementation of authorization in Hive is not complete. 
Additionally, the existing implementation has security holes. This JIRA is an 
umbrella JIRA for a) extending authorization to all SQL operations and direct 
metadata operations, and b) standardizing the authorization model and its 
semantics to mirror that of MySQL as closely as possible.



[jira] [Updated] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3720:
-

Attachment: Hive_Authorization_Functionality.pdf

 Expand and standardize authorization in Hive
 

 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: Hive_Authorization_Functionality.pdf


 The existing implementation of authorization in Hive is not complete. 
 Additionally, the existing implementation has security holes. This JIRA is an 
 umbrella JIRA for a) extending authorization to all SQL operations and 
 direct metadata operations, and b) standardizing the authorization model and 
 its semantics to mirror that of MySQL as closely as possible.



[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive

2012-11-18 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1350#comment-1350
 ] 

Shreepadma Venugopalan commented on HIVE-3720:
--

Attached document outlines the authorization model and its semantics.

 Expand and standardize authorization in Hive
 

 Key: HIVE-3720
 URL: https://issues.apache.org/jira/browse/HIVE-3720
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: Hive_Authorization_Functionality.pdf


 The existing implementation of authorization in Hive is not complete. 
 Additionally, the existing implementation has security holes. This JIRA is an 
 umbrella JIRA for a) extending authorization to all SQL operations and 
 direct metadata operations, and b) standardizing the authorization model and 
 its semantics to mirror that of MySQL as closely as possible.



[jira] [Created] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-15 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3712:


 Summary: Use varbinary instead of longvarbinary to store min and 
max column values in column stats schema
 Key: HIVE-3712
 URL: https://issues.apache.org/jira/browse/HIVE-3712
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan


JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
and max column values for numeric types takes up 8 bytes and hence doesn't 
require a BLOB. Storing these values in a BLOB will impact performance without 
providing much benefit. 
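
To make the 8-byte claim concrete, here is a minimal Java sketch (not taken from the patch) that packs a long or a double into a fixed 8-byte array with java.nio.ByteBuffer. The class and method names are made up for illustration, and the big-endian layout is an assumption; the actual column stats schema may serialize min and max values differently.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only: not part of Hive.
final class MinMaxEncodingSketch {

    // Encode a long as exactly Long.BYTES (8) bytes, big-endian (the ByteBuffer default).
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode the 8 bytes produced by encodeLong back into a long.
    static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Encode a double as exactly Double.BYTES (8) bytes using its IEEE 754 bit pattern.
    static byte[] encodeDouble(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    // Decode the 8 bytes produced by encodeDouble back into a double.
    static double decodeDouble(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        byte[] max = encodeLong(Long.MAX_VALUE);
        System.out.println(max.length);                      // 8
        System.out.println(decodeLong(max) == Long.MAX_VALUE); // true
    }
}
```

A value of this size fits easily in a VARBINARY column, which is the point of the issue: a BLOB-backed type is not needed to hold it.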



[jira] [Commented] (HIVE-3705) Adding authorization capability to the metastore

2012-11-15 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13498520#comment-13498520
 ] 

Shreepadma Venugopalan commented on HIVE-3705:
--

@Sushanth: Thanks for posting the document and the patch. Securing the 
metastore is necessary to provide reliable authorization in Hive. I looked at 
the document and the code and have the following high-level questions:

 a) The document contains an example of how the current pluggable authorization 
provider can be exploited to circumvent security. This patch seems to introduce 
a new config param - hive.security.metastore.authorization.manager - that 
allows a pluggable authorization provider. Perhaps I'm missing something here, 
but wondering how we would prevent a user from plugging in their own 
authorization provider. 

 b) The current Hive authorization model exposes semantics that are confusing 
and at times inconsistent. While this patch has moved the auth checks to the 
metastore (IMO, this is the right thing to do), it seems to implement the 
existing semantics. Wondering if there is a plan to fix the semantics at some 
point.

 c) How do we obtain the userid for performing authorization? Are we using the 
authentication id from the Thrift context? If so, how do we handle the case 
where the authentication id is different from the authorization id, e.g., 
HS2 authenticates to the metastore as HS2 but is executing a statement on 
behalf of user 'u1'? Thanks.
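
The distinction raised in (c) can be illustrated with a small Java sketch. This is only one possible policy, assuming the metastore keeps a set of trusted service principals that are allowed to act on behalf of other users; the class, method, and principal names are hypothetical and do not come from the patch or from Hive.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch, for illustration only: not part of Hive or of the HIVE-3705 patch.
final class EffectiveUserSketch {

    /**
     * Pick the id to authorize against.
     *
     * @param authenticatedId id established by authentication (e.g. from the transport)
     * @param onBehalfOf      user the caller claims to act for, or null if none
     * @param trustedServices principals allowed to act on behalf of other users
     */
    static String resolveAuthorizationId(String authenticatedId,
                                         String onBehalfOf,
                                         Set<String> trustedServices) {
        // No delegation requested: authorize the authenticated caller itself.
        if (onBehalfOf == null || onBehalfOf.isEmpty()) {
            return authenticatedId;
        }
        // Delegation requested by a trusted service: authorize the end user.
        if (trustedServices.contains(authenticatedId)) {
            return onBehalfOf;
        }
        // Delegation requested by anyone else is refused.
        throw new SecurityException(
            authenticatedId + " may not act on behalf of " + onBehalfOf);
    }

    public static void main(String[] args) {
        Set<String> trusted = Collections.singleton("hs2");
        // HS2 authenticates as "hs2" but runs a statement for user "u1".
        System.out.println(resolveAuthorizationId("hs2", "u1", trusted)); // u1
    }
}
```

Under this assumed policy the authorization checks run as 'u1' even though the connection was authenticated as HS2, and an untrusted caller cannot claim another user's identity.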

 Adding authorization capability to the metastore
 

 Key: HIVE-3705
 URL: https://issues.apache.org/jira/browse/HIVE-3705
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3705.D6681.1.patch, HIVE-3705.D6681.2.patch, 
 hivesec_investigation.pdf


 In an environment where multiple clients access a single metastore, and we 
 want to evolve hive security to a point where it's no longer simply 
 preventing users from shooting their own foot, we need to be able to 
 authorize metastore calls as well, instead of simply performing every 
 metastore api call that's made.



[jira] [Commented] (HIVE-3706) getBoolVar in FileSinkOperator can be optimized

2012-11-13 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13496447#comment-13496447
 ] 

Shreepadma Venugopalan commented on HIVE-3706:
--

Looks good. Non-committer +1.


 getBoolVar in FileSinkOperator can be optimized
 ---

 Key: HIVE-3706
 URL: https://issues.apache.org/jira/browse/HIVE-3706
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3706.1.patch.txt


 There's a call to HiveConf.getBoolVar in FileSinkOperator's processOp method. 
  In benchmarks we found this call to be using ~2% of the CPU time on simple 
 queries, e.g. INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
 This boolean value, a flag to collect the RawDataSize stat, won't change 
 during the processing of a query, so we can determine it at initialization 
 and store that value, saving that CPU.
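
The change described above amounts to hoisting a configuration lookup out of the per-row path. A minimal Java sketch follows, assuming a stand-in configuration class rather than HiveConf; CountingConf, SinkOperator, and the key example.collect.rawdatasize are hypothetical names used only for illustration, and the lookup counter exists just to make the saving visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, for illustration only: not the FileSinkOperator code.
final class ConfigHoistSketch {

    /** Stand-in for a configuration object; counts how often it is queried. */
    static final class CountingConf {
        private final Map<String, String> values = new HashMap<>();
        int lookups = 0;

        void set(String key, String value) {
            values.put(key, value);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            lookups++;
            String v = values.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    /** Operator that reads the flag once at initialization instead of once per row. */
    static final class SinkOperator {
        private boolean collectRawDataSize;
        long rawDataSize = 0;

        void initialize(CountingConf conf) {
            // The flag cannot change while a query runs, so one lookup is enough.
            collectRawDataSize = conf.getBoolean("example.collect.rawdatasize", false);
        }

        void processRow(String row) {
            // Per-row path: a plain field read, no configuration lookup.
            if (collectRawDataSize) {
                rawDataSize += row.length();
            }
        }
    }

    public static void main(String[] args) {
        CountingConf conf = new CountingConf();
        conf.set("example.collect.rawdatasize", "true");
        SinkOperator op = new SinkOperator();
        op.initialize(conf);
        for (int i = 0; i < 1000; i++) {
            op.processRow("abc");
        }
        System.out.println(conf.lookups);   // 1
        System.out.println(op.rawDataSize); // 3000
    }
}
```

The per-row cost drops from a configuration lookup to a boolean field read, which is where the reported CPU saving would come from.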



[jira] [Commented] (HIVE-3706) getBoolVar in FileSinkOperator can be optimized

2012-11-13 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13496451#comment-13496451
 ] 

Shreepadma Venugopalan commented on HIVE-3706:
--

@Kevin: We should see if there are other opportunities to move such checks from 
execution to operator initialization.

 getBoolVar in FileSinkOperator can be optimized
 ---

 Key: HIVE-3706
 URL: https://issues.apache.org/jira/browse/HIVE-3706
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3706.1.patch.txt


 There's a call to HiveConf.getBoolVar in FileSinkOperator's processOp method. 
  In benchmarks we found this call to be using ~2% of the CPU time on simple 
 queries, e.g. INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
 This boolean value, a flag to collect the RawDataSize stat, won't change 
 during the processing of a query, so we can determine it at initialization 
 and store that value, saving that CPU.



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-08 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493587#comment-13493587
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

@Tim: I'm currently working on providing the upgrade scripts for different 
databases. Since there is a plan to release Hive 0.10 soon, we have to provide 
upgrade scripts for all of them. Thanks.


 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492495#comment-13492495
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

HIVE-3524 changed the signature of endFunction in HiveMetaStore.java. HIVE-3524 
was committed hours before this patch. The compile errors are due to the 
signature change. I'm working on a fix. Thanks.

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, 
 HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, 
 HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
 HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
 HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
 HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
 HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt






[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492498#comment-13492498
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Namit: Not sure what the protocol is but I've attached the new patch to this 
JIRA. Thanks.

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, 
 HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, 
 HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
 HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
 HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
 HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
 HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt






[jira] [Updated] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Attachment: HIVE-1362.11.patch.txt

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, 
 HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, 
 HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, 
 HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, 
 HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, 
 HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, 
 HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, 
 HIVE-1362_gen-thrift.9.patch.txt






[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492505#comment-13492505
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

Please look at HIVE-1362.11.patch.txt to fix the compile errors introduced 
earlier.

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, 
 HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, 
 HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, 
 HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, 
 HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, 
 HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, 
 HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, 
 HIVE-1362_gen-thrift.9.patch.txt






[jira] [Created] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524

2012-11-07 Thread Shreepadma Venugopalan (JIRA)
Shreepadma Venugopalan created HIVE-3686:


 Summary: Fix compile errors introduced by the interaction of 
HIVE-1362 and HIVE-3524
 Key: HIVE-3686
 URL: https://issues.apache.org/jira/browse/HIVE-3686
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Blocker


HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was 
committed some hours before HIVE-1362. The change in signature broke the build 
after HIVE-1362, which still contained the old signature, was committed. 



[jira] [Updated] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3686:
-

Status: Patch Available  (was: Open)

 Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
 ---

 Key: HIVE-3686
 URL: https://issues.apache.org/jira/browse/HIVE-3686
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Blocker
 Attachments: HIVE-1362.11.patch.txt


 HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was 
 committed some hours before HIVE-1362. The change in signature broke the 
 build after HIVE-1362, which still contained the old signature, was committed. 



[jira] [Updated] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3686:
-

Attachment: HIVE-1362.11.patch.txt

 Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
 ---

 Key: HIVE-3686
 URL: https://issues.apache.org/jira/browse/HIVE-3686
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Blocker
 Attachments: HIVE-1362.11.patch.txt


 HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was 
 committed some hours before HIVE-1362. The change in signature broke the 
 build after HIVE-1362, which still contained the old signature, was committed. 



[jira] [Commented] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492524#comment-13492524
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

Filed a new JIRA - HIVE-3686 to fix the compile errors.

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, 
 HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, 
 HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, 
 HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, 
 HIVE-1362.D6339.1.patch, HIVE-1362_gen-thrift.10.patch.txt, 
 HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt, 
 HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt, 
 HIVE-1362-gen_thrift.5.patch.txt, HIVE-1362-gen_thrift.6.patch.txt, 
 HIVE-1362_gen-thrift.7.patch.txt, HIVE-1362_gen-thrift.8.patch.txt, 
 HIVE-1362_gen-thrift.9.patch.txt






[jira] [Commented] (HIVE-3686) Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492711#comment-13492711
 ] 

Shreepadma Venugopalan commented on HIVE-3686:
--

Thanks Kevin.

 Fix compile errors introduced by the interaction of HIVE-1362 and HIVE-3524
 ---

 Key: HIVE-3686
 URL: https://issues.apache.org/jira/browse/HIVE-3686
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
Priority: Blocker
 Fix For: 0.10.0

 Attachments: HIVE-1362.11.patch.txt


 HIVE-3524 changed the signature of endFunction in HiveMetaStore.java and was 
 committed some hours before HIVE-1362. The change in signature broke the 
 build after HIVE-1362, which still contained the old signature, was committed. 



[jira] [Commented] (HIVE-3689) Update website with info on how to report security bugs

2012-11-07 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492929#comment-13492929
 ] 

Shreepadma Venugopalan commented on HIVE-3689:
--

@Eli: In Hadoop land, who are the people with read access to the list, i.e., the 
ones who can view the security vulnerabilities? Currently, all Hive security 
issues seem to be in the public domain on JIRA.

 Update website with info on how to report security bugs 
 

 Key: HIVE-3689
 URL: https://issues.apache.org/jira/browse/HIVE-3689
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Eli Collins

 The Hive website should be updated with information on how to report 
 potential security vulnerabilities. In Hadoop land we have a private security 
 list that anyone can post to that we point to on our list page: Hadoop 
 example http://hadoop.apache.org/general_lists.html#Security.



[jira] [Commented] (HIVE-1362) column level statistics

2012-11-06 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491764#comment-13491764
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Carl: Please take the latest patch from JIRA. If you have trouble applying it, 
let me know. Thanks.

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, 
 HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, 
 HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
 HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
 HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
 HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
 HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt






[jira] [Commented] (HIVE-1362) column level statistics

2012-11-06 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491792#comment-13491792
 ] 

Shreepadma Venugopalan commented on HIVE-1362:
--

@Carl: You will see 6 failures in testParse (groupby1.q .. groupby6.q) when you 
run the tests. It is in the process of being fixed by HIVE-3674.

 column level statistics
 ---

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, 
 HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, 
 HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
 HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
 HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
 HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
 HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt






[jira] [Updated] (HIVE-1362) Column level scalar valued statistics on Tables and Partitions

2012-11-06 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-1362:
-

Summary: Column level scalar valued statistics on Tables and Partitions  
(was: Column level scalar valued statistics)

 Column level scalar valued statistics on Tables and Partitions
 --

 Key: HIVE-1362
 URL: https://issues.apache.org/jira/browse/HIVE-1362
 Project: Hive
  Issue Type: Sub-task
  Components: Statistics
Reporter: Ning Zhang
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-1362.10.patch.txt, HIVE-1362.1.patch.txt, 
 HIVE-1362.2.patch.txt, HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, 
 HIVE-1362.5.patch.txt, HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, 
 HIVE-1362.8.patch.txt, HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
 HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362-gen_thrift.1.patch.txt, 
 HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
 HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
 HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
 HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt





