Re: Review Request 46956: HIVE-13444 LLAP: add HMAC signatures to LLAP; verify them on LLAP side

2016-05-20 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46956/
---

(Updated May 21, 2016, 12:07 a.m.)


Review request for hive, Gunther Hagleitner, Jason Dere, and Siddharth Seth.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4cfa5f1
  llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenLocalClient.java f10351b
  llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapSigner.java PRE-CREATION
  llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapTokenIdentifier.java e28eddd
  llap-common/src/java/org/apache/hadoop/hive/llap/security/SecretManager.java 465b204
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java 2524dc2
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java de817e3
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java b94fc2e
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapTokenChecker.java 03ee055
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 8abd198
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java eac0e8f
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java 74359fa
  llap-server/src/java/org/apache/hadoop/hive/llap/security/LlapSignerImpl.java PRE-CREATION
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java 279baf1
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapTokenChecker.java 762
  llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/comparator/TestFirstInFirstOutComparator.java a250882
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java c9b912b

Diff: https://reviews.apache.org/r/46956/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-13814) DummyTable should set neededColumns as empty

2016-05-20 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-13814:
--

 Summary: DummyTable should set neededColumns as empty
 Key: HIVE-13814
 URL: https://issues.apache.org/jira/browse/HIVE-13814
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


Otherwise, it will throw NPE for column pruning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 46956: HIVE-13444 LLAP: add HMAC signatures to LLAP; verify them on LLAP side

2016-05-20 Thread Sergey Shelukhin


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapTokenChecker.java,
> >  line 25
> > 
> >
> > Think the patch which added Pair/ImmutablePair may have added a maven 
> > dependency. Should be removed if it was added explicitly for this.

it has since become necessary for StringUtils


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46956/#review134076
---


On May 18, 2016, 8:36 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46956/
> ---
> 
> (Updated May 18, 2016, 8:36 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner, Jason Dere, and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cbb3a72
>   llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenLocalClient.java f10351b
>   llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapSigner.java PRE-CREATION
>   llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapTokenIdentifier.java e28eddd
>   llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapTokenProvider.java PRE-CREATION
>   llap-common/src/java/org/apache/hadoop/hive/llap/security/SecretManager.java 465b204
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java 2524dc2
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java de817e3
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java b94fc2e
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapTokenChecker.java 03ee055
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 8abd198
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java eac0e8f
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java 74359fa
>   llap-server/src/java/org/apache/hadoop/hive/llap/security/LlapSecurityHelper.java PRE-CREATION
>   llap-server/src/java/org/apache/hadoop/hive/llap/security/LlapSignerImpl.java PRE-CREATION
>   llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java 279baf1
>   llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapTokenChecker.java 762
>   llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/comparator/TestFirstInFirstOutComparator.java a250882
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java c9b912b
> 
> Diff: https://reviews.apache.org/r/46956/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 46956: HIVE-13444 LLAP: add HMAC signatures to LLAP; verify them on LLAP side

2016-05-20 Thread Sergey Shelukhin


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 2698-2699
> > 
> >
> > Is this primarily for config? Rename to have a positive connotation, 
> > maybe?

it says in the description :)


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-common/src/java/org/apache/hadoop/hive/llap/security/LlapSigner.java, 
> > line 29
> > 
> >
> > I'm not sure this will actually be usable, given that what is being 
> > signed is a protobuf generated class.

It's used to implement a wrapper over protobuf
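For readers following the signing discussion, here is a minimal sketch of HMAC signing over opaque bytes (for example the serialized form of a protobuf message) and verifying on the receiving side. It uses only the standard `javax.crypto.Mac` API; the class and method names are hypothetical, this is not the `LlapSigner` interface from the patch, and HmacSHA256 is an assumed algorithm choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, NOT the LlapSigner API from the patch: signs and
// verifies opaque byte arrays such as a serialized protobuf message.
final class HmacSketch {
  private static final String ALGORITHM = "HmacSHA256"; // assumed algorithm

  static byte[] sign(byte[] key, byte[] message) throws GeneralSecurityException {
    Mac mac = Mac.getInstance(ALGORITHM);
    mac.init(new SecretKeySpec(key, ALGORITHM));
    return mac.doFinal(message);
  }

  // Recompute the signature and compare in constant time, so the check
  // does not leak how many leading bytes matched.
  static boolean verify(byte[] key, byte[] message, byte[] signature)
      throws GeneralSecurityException {
    return MessageDigest.isEqual(sign(key, message), signature);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    byte[] key = "shared-secret".getBytes(StandardCharsets.UTF_8);
    byte[] payload = "serialized vertex spec".getBytes(StandardCharsets.UTF_8);
    byte[] sig = sign(key, payload);
    System.out.println(verify(key, payload, sig)); // true
    payload[0] ^= 1; // flip one bit of the payload
    System.out.println(verify(key, payload, sig)); // false
  }
}
```

Signing the serialized bytes rather than the generated message object is what lets a thin wrapper sit on top of protobuf: the receiver verifies the exact bytes it will parse.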


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-common/src/java/org/apache/hadoop/hive/llap/security/SecretManager.java,
> >  line 134
> > 
> >
> > Can a second login be avoided? I'm guessing this is because the ZK 
> > principal may be different from the llap principal.
> > What was the reason for them to be different again? (Especially w.r.t. 
> > the SecretManager.) Not sure if the fallback to using the llap principal 
> > and keytab will work if they have to be different.

The same principal didn't work on the test cluster I had for some reason that I 
no longer remember :(


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java,
> >  line 168
> > 
> >
> > Move this to after checking if vertexBinary is set? Potentially error 
> > out if both are set.
> > 
> > IIRC, vertexBinary will be set by external clients, and vertex will be 
> > set by Tez?

yes


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java,
> >  line 262
> > 
> >
> > Why is this required? The signature will only exist if vertexBinary is 
> > present?

There's no reason why someone cannot set both signature and vertex.
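The two positions above (error out if both vertex forms are set; verify a signature whenever one is present, whichever form is used) can be combined into one validation step. This is a rough sketch under assumptions: `SubmitRequest` is a hypothetical stand-in for the protobuf-generated request, and the actual signature check is abstracted behind a predicate.

```java
import java.util.function.BiPredicate;

// Hypothetical stand-in for the protobuf-generated submitWork request in the
// patch; field names are illustrative.
final class SubmitRequest {
  final Object vertex;       // per the thread: set by Tez
  final byte[] vertexBinary; // per the thread: set by external clients
  final byte[] signature;    // optional
  SubmitRequest(Object vertex, byte[] vertexBinary, byte[] signature) {
    this.vertex = vertex;
    this.vertexBinary = vertexBinary;
    this.signature = signature;
  }
}

final class RequestValidator {
  /**
   * @param signedBytes     the bytes the signature is expected to cover
   * @param verifier        (signedBytes, signature) -> true if the signature is valid
   * @param signingRequired whether this caller's token demands a signature
   */
  static void validate(SubmitRequest req, byte[] signedBytes,
      BiPredicate<byte[], byte[]> verifier, boolean signingRequired) {
    // Reviewer's suggestion: exactly one form of the vertex must be present.
    if ((req.vertex == null) == (req.vertexBinary == null)) {
      throw new IllegalArgumentException("Exactly one of vertex and vertexBinary must be set");
    }
    if (req.signature == null) {
      if (signingRequired) {
        throw new SecurityException("Signature required but not provided");
      }
      return;
    }
    // Author's point: a signature may accompany either form, so verify it
    // whenever it is present rather than only when vertexBinary is set.
    if (!verifier.test(signedBytes, req.signature)) {
      throw new SecurityException("Signature verification failed");
    }
  }
}
```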


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java,
> >  line 170
> > 
> >
> > Maybe move all of these checks into the RPC layer itself, i.e. 
> > LlapServiceServerImpl, as early as possible.

The permissions are checked in calls by ContainerRunner. RPC right now just 
propagates the request...


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java,
> >  line 287
> > 
> >
> > All of this logic should be invoked even when obtaining tokens from 
> > ZKSM directly.
> > 
> > Whether Tez is being used, or an external client - as long as HS2 is 
> > obtaining a token, it can do it directly from ZK. This code path is not 
> > likely to be exercised a lot.
> > Assuming that invocation (when it happens; it likely needs another 
> > jira) will call into LlapTokenLocalClient.createToken directly and 
> > send in isSigningRequired based on all of the same configs.
> > 
> > Would be better to move the logic out of this function in that case.
> > 
> > Maybe the config flag itself could be dropped. If Tez, no signing; if 
> > external, force signing.

This is actually kind of orthogonal. This logic doesn't apply when HS2 creates 
the token, precisely because HS2 knows whether it's creating the token for Tez 
or for an external client, so it can set the flag accordingly.
When the method is called remotely, we always require signing by default, but 
that can be disabled for the CLI, or for HS2 calling remotely (presumably under 
the same user as LLAP).
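The policy in the reply above reduces to a small decision function: local token creation in HS2 takes the caller's explicit choice, while remote requests default to signing unless a config switch relaxes it. The method and parameter names below are hypothetical; the real config key is the one in the patch's HiveConf change and is not reproduced here.

```java
// Hypothetical sketch of the policy described in the reply above; the method
// and parameter names are illustrative and do not come from the patch.
final class SigningPolicy {
  static boolean isSigningRequired(boolean isRemoteCall,
                                   boolean requestedByLocalCaller,
                                   boolean remoteSigningDisabledByConfig) {
    if (!isRemoteCall) {
      // HS2 creating a token locally knows whether it is for Tez (no signing)
      // or for an external client (signing), and passes that choice in.
      return requestedByLocalCaller;
    }
    // Remote token requests require signing by default; a config switch can
    // turn this off, e.g. for the CLI or for HS2 calling in remotely.
    return !remoteSigningDisabledByConfig;
  }

  public static void main(String[] args) {
    System.out.println(isSigningRequired(false, false, false)); // local, Tez
    System.out.println(isSigningRequired(false, true, false));  // local, external client
    System.out.println(isSigningRequired(true, false, false));  // remote, default
    System.out.println(isSigningRequired(true, false, true));   // remote, relaxed by config
  }
}
```

Keeping the local and remote branches separate mirrors the "orthogonal" point: the config switch only affects the remote path.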


> On May 20, 2016, 2:21 a.m., Siddharth Seth wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java,
> >  line 290
> > 
> >
> > What user is expected here?
> > 1. In case of an invocation by HS2 to run a Tez query - I'm assuming 
> > this would be the HS2 service user (which is the same as the LLAP service 
> > user). (That needs to be validated)
> > 2. In case of external services - would this be the HS2 service user or 
> > the user associated with the external service ?
> > 
> > If it's the HS2 user each time, is the "user"/"realuser" field in the 
> > TokenIdentifier required? That seems to be passed in as a null everywhere.
> > Assuming the appId is what will be used to differentiate different 
> > external clients? and that in

[jira] [Created] (HIVE-13813) Add Metrics for the number of Hive operations waiting for compile

2016-05-20 Thread Chao Sun (JIRA)
Chao Sun created HIVE-13813:
---

 Summary: Add Metrics for the number of Hive operations waiting for compile
 Key: HIVE-13813
 URL: https://issues.apache.org/jira/browse/HIVE-13813
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.0.0, 1.3.0
Reporter: Chao Sun
Assignee: Chao Sun


Currently, unless {{hive.driver.parallel.compilation}} (introduced in 
HIVE-4239) is enabled, only one SQL operation per HS2 instance can enter the 
compilation block, and all the others are blocked. We should add a metric for 
the number of operations that are blocked.
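One way such a metric could be tracked is a counter incremented before waiting on the compile lock and decremented once the lock is acquired. This is a sketch under assumptions: `CompileGate` is hypothetical, and how the count would be registered with Hive's metrics system is not shown.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: a single compile lock (the behavior when parallel
// compilation is disabled) plus a counter of callers waiting for it.
final class CompileGate {
  private final ReentrantLock compileLock = new ReentrantLock();
  private final AtomicInteger waitingForCompile = new AtomicInteger();

  /** Current number of operations blocked on the compile lock; a gauge would report this. */
  int waitingCount() {
    return waitingForCompile.get();
  }

  <T> T compile(Supplier<T> compileStep) {
    waitingForCompile.incrementAndGet(); // now queued for the lock
    compileLock.lock();                  // blocks while another operation compiles
    waitingForCompile.decrementAndGet(); // acquired: no longer waiting
    try {
      return compileStep.get();
    } finally {
      compileLock.unlock();
    }
  }
}
```

Decrementing right after `lock()` returns means the gauge counts only operations that are actually blocked, not the one currently compiling.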





Re: Review Request 46690: HIVE-13068

2016-05-20 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46690/
---

(Updated May 20, 2016, 9:53 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-13068
https://issues.apache.org/jira/browse/HIVE-13068


Repository: hive-git


Description
---

HIVE-13068


Diffs (updated)
-

  hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 27446b41db80ee98d56a4101a87f76be7f6dea2f
  hbase-handler/src/test/results/positive/hbase_queries.q.out a99f561828fb8466a70ad639e73aaf65ac199b72
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java bc52f7b8d7a151859631dba3ff585788f8c19698
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java 9e9beb0d73372c81cc73afb2b92b1a791e3a491e
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 37dbe32008685ba22e5dae1e4bfbfe090c5bfe9f
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java bf9a0a367b3b85f039076ac78290f8e35a8c3c62
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 4adf7b2b16eb2cea68e0fe9b554a62e65b4c388d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java 0cfd5298899ea8dd16c073b26546c40de4451271
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java c6d1d46c62d8550750eea092245a55dd3b327f66
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRexUtil.java 2f309f3de6acfac09b7b0d84cbb9d4275e317aeb
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregatePullUpConstantsRule.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveProjectFilterPullUpConstantsRule.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java 2fe9b75038de8261fa123aa6e1d318ea6b0d1cec
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortLimitPullUpConstantsRule.java 3be9b0a0dafde81692db696f1a8f9099a132aec6
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveUnionPullUpConstantsRule.java 2552f8747ba4b3d4f46d1d06a5fe381cbd039468
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java e8107471eaebaf95aeb32fa93b2917861ebb0795
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java de7e2f8feae424a27075b17ad9fb7de2dd81e735
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ExprNodeConverter.java e51b6c49f447d04fdcac6d23deda5d980f43822d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveGBOpConvUtil.java 7fbf8cd232d8bb1114d64befd559646001dbd032
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java 13078089bd7d7552fdd5d0c28ab7534c9dc5220b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java 368264c1de1b406a76dd9e12848c0f8a94b0df54
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java ee4f4ead6066a29e867cf51582c45d3dc69b1880
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 0b76bffb42d88204f486278a12bbf24d1b7fc274
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java 2825f7787de4d42e9532bfb2642f4f95ba8f8b83
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/pcr/PcrExprProcFactory.java 991117945e8bce1c4098f0641ff7674c8a314147
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java de6a053a5b299ee39ec9af865d077a886497189f
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7162c089cd125c660abaad5838da28ab167c73b5
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 2eaed564304f0f8293ce35227fcfef15398305ef
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java c6f89074457e1ed4e61d52c01d9cc515fe1a6f09
  ql/src/test/queries/clientpositive/join_view.q 16b6816f0c8618691ba7a28f4ca467d7526d6e13
  ql/src/test/results/clientpositive/annotate_stats_filter.q.out ba0419e461a5b5649bd7d4c67602b8cb747961ea
  ql/src/test/results/clientpositive/archive_excludeHadoop20.q.out c2b98727d21f4990ae7496a0a8fa9ac16598f4c0
  ql/src/test/results/clientpositive/archive_multi.q.out 0ad29d122153bd4adf4d19064188b0c4f94e05ab
  ql/src/test/results/clientpositive/authorization_explain.q.java1.7.out a9ed0495fcecadbddf1fcfb764e916fbb5406662
  ql/src/test/results/clientpositive/auto_join33.q.out b0b3019d5c7a6ff6058b5bfd7c965257f8850367
  ql/src/test/results/clientpositive/auto_join8.q.out 324f95d550add0ead3215bbdd0933ddd6456f9c9
  ql/src/test/results/clientpositive/auto_join_filters.q.out 2fdf470036e0df898ad2986f3a26628e6e6bba44
  ql/src/test/results/clientpositive/auto_join_nulls.q.out 4af5535f4a0a9c07aca7342a0a31ddb9c9b

[GitHub] hive pull request: HIVE-11417. Move the ReaderImpl and the RecordR...

2016-05-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/72


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-13812) Change column from float to string group type will drop some fractional digits

2016-05-20 Thread Takahiko Saito (JIRA)
Takahiko Saito created HIVE-13812:
-

 Summary: Change column from float to string group type will drop some fractional digits
 Key: HIVE-13812
 URL: https://issues.apache.org/jira/browse/HIVE-13812
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Takahiko Saito


Create a table with a float column and insert some values:
{noformat}
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> create table test(f float);
No rows affected (0.237 seconds)
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> insert into table test values(-35664.76171875),(29497.349609375);
INFO  : Session is already open
INFO  : Dag name: insert into table tes...5),(29497.349609375)(Stage-1)
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1463771904371_0006)

INFO  : Map 1: 0/1
INFO  : Map 1: 0/1
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table default.test from hdfs://os-r6-pxwhrs-hiveserver2-3re-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-05-20_21-06-29_377_6487823927119226603-10/-ext-1
INFO  : Table default.test stats: [numFiles=1, numRows=2, totalSize=19, rawDataSize=17]
No rows affected (11.069 seconds)
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> select * from test;
+------------------+--+
|      test.f      |
+------------------+--+
| -35664.76171875  |
| 29497.349609375  |
+------------------+--+
2 rows selected (0.137 seconds)
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> describe test;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| f         | float      |          |
+-----------+------------+----------+--+
1 row selected (0.173 seconds)
{noformat}

Then change the column type from float to string. The ALTER succeeds, but when 
you select from the table, some fractional digits are lost:
{noformat}
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> alter table test change column f f string;
No rows affected (0.214 seconds)
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> describe test;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| f         | string     |          |
+-----------+------------+----------+--+
1 row selected (0.151 seconds)
0: jdbc:hive2://os-r6-pxwhrs-hiveserver2-3re-> select * from test;
+------------+--+
|   test.f   |
+------------+--+
| -35664.76  |
| 29497.35   |
+------------+--+
2 rows selected (0.141 seconds)
{noformat}
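A plausible explanation, not confirmed in the report, is the difference between Java's `Float.toString` and `Double.toString`. Each prints only as many digits as are needed to distinguish the value from adjacent values of its own type, so the same stored float shows all of its digits once widened to double but fewer when formatted as a float. The snippet below uses only the JDK and produces both renderings seen above; whether Hive's read path after the type change actually goes through `Float.toString` is an assumption.

```java
import java.math.BigDecimal;

// Illustration only: how one float value renders through different JDK paths.
final class FloatRendering {
  static String asFloat(float f) { return Float.toString(f); }   // digits sufficient for float
  static String asDouble(float f) { return Double.toString(f); } // widened to double first
  static String exact(float f) { return new BigDecimal(f).toPlainString(); }

  public static void main(String[] args) {
    float[] values = {-35664.76171875f, 29497.349609375f};
    for (float f : values) {
      System.out.println(asDouble(f) + " vs " + asFloat(f) + " (exact: " + exact(f) + ")");
    }
  }
}
```

Both inputs are exactly representable as floats, so no precision is lost in storage; only the formatting differs.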





[jira] [Created] (HIVE-13811) Constant not removed in index_auto_unused.q.out

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13811:
--

 Summary: Constant not removed in index_auto_unused.q.out
 Key: HIVE-13811
 URL: https://issues.apache.org/jira/browse/HIVE-13811
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up on HIVE-13068.

In test file ql/src/test/results/clientpositive/index_auto_unused.q.out.

After HIVE-13068 goes in, the following filter is not folded after 
PartitionPruning is done:
{{filterExpr: ((ds = '2008-04-09') and (12.0 = 12.0) and (UDFToDouble(key) < 10.0)) (type: boolean)}}

This needs further investigation.
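For context, the folding expected here is simple: a comparison between two literals evaluates to a boolean literal, and a TRUE conjunct can be dropped from an AND. The toy expression model below is hypothetical (Hive's real classes are `ExprNodeDesc` and Calcite's `RexNode`) and only illustrates the rewrite, not where in the pipeline it should happen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy expression model for illustration; NOT Hive's ExprNodeDesc or Calcite's RexNode.
final class ConstantFoldSketch {
  interface Expr {}

  static final class Lit implements Expr {
    final Object value;
    Lit(Object value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
  }

  static final class Opaque implements Expr { // any non-constant predicate, kept as text
    final String text;
    Opaque(String text) { this.text = text; }
    @Override public String toString() { return text; }
  }

  static final class Eq implements Expr {
    final Expr left, right;
    Eq(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " = " + right + ")"; }
  }

  static final class And implements Expr {
    final List<Expr> args;
    And(List<Expr> args) { this.args = args; }
    @Override public String toString() {
      StringBuilder sb = new StringBuilder("(");
      for (int i = 0; i < args.size(); i++) {
        if (i > 0) sb.append(" and ");
        sb.append(args.get(i));
      }
      return sb.append(")").toString();
    }
  }

  static Expr fold(Expr e) {
    if (e instanceof Eq) {
      Expr l = fold(((Eq) e).left), r = fold(((Eq) e).right);
      if (l instanceof Lit && r instanceof Lit) {
        Object lv = ((Lit) l).value, rv = ((Lit) r).value;
        // literal = literal is decided at compile time; NULL literals are left alone,
        // and both literals are assumed to share a Java type.
        if (lv != null && rv != null) return new Lit(lv.equals(rv));
      }
      return new Eq(l, r);
    }
    if (e instanceof And) {
      List<Expr> kept = new ArrayList<>();
      for (Expr arg : ((And) e).args) {
        Expr f = fold(arg);
        if (f instanceof Lit && Boolean.TRUE.equals(((Lit) f).value)) continue; // x AND TRUE = x
        if (f instanceof Lit && Boolean.FALSE.equals(((Lit) f).value)) return f; // x AND FALSE = FALSE
        kept.add(f);
      }
      if (kept.isEmpty()) return new Lit(true);
      return kept.size() == 1 ? kept.get(0) : new And(kept);
    }
    return e;
  }

  public static void main(String[] args) {
    Expr filter = new And(Arrays.asList(
        new Opaque("(ds = '2008-04-09')"),
        new Eq(new Lit(12.0), new Lit(12.0)),
        new Opaque("(UDFToDouble(key) < 10.0)")));
    System.out.println(fold(filter));
  }
}
```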





[jira] [Created] (HIVE-13810) insert overwrite select from some table fails throwing org.apache.hadoop.security.AccessControlException

2016-05-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-13810:


 Summary: insert overwrite select from some table fails throwing org.apache.hadoop.security.AccessControlException
 Key: HIVE-13810
 URL: https://issues.apache.org/jira/browse/HIVE-13810
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan



{noformat} CREATE  EXTERNAL TABLE Batters_txt(
   Player STRING ,
   Team STRING ,
   League STRING ,
   Year SMALLINT,
   Games DOUBLE,
   AB DOUBLE,
   R DOUBLE,
   H DOUBLE,
   Doubles DOUBLE,
   Triples DOUBLE,
   HR DOUBLE,
   RBI DOUBLE,
   SB DOUBLE,
   CS DOUBLE,
   BB DOUBLE,
   SO DOUBLE,
   IBB DOUBLE,
   HBP DOUBLE,
   SH DOUBLE,
   SF DOUBLE,
   GIDP DOUBLE
 )
 location '/user/tableau/Batters';
 drop table if exists Batters;
 CREATE TABLE Batters (
   Player STRING ,
   Team STRING ,
   League STRING ,
   Year SMALLINT,
   Games DOUBLE,
   AB DOUBLE,
   R DOUBLE,
   H DOUBLE,
   Doubles DOUBLE,
   Triples DOUBLE,
   HR DOUBLE,
   RBI DOUBLE,
   SB DOUBLE,
   CS DOUBLE,
   BB DOUBLE,
   SO DOUBLE,
   IBB DOUBLE,
   HBP DOUBLE,
   SH DOUBLE,
   SF DOUBLE,
   GIDP DOUBLE
   )
 STORED AS orc tblproperties ("orc.compress"="SNAPPY");
 insert overwrite table Batters select * from Batters_txt;
{noformat}

runs into the following error:
{code}
2016-05-18T19:59:00,883 ERROR [HiveServer2-Background-Pool: Thread-306]: operation.Operation (:()) - Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.security.AccessControlException: User does not belong to hdfs
    at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:88)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1706)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:818)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:472)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:644)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2273)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2267)

    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:290)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:303)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: User does not belong to hdfs
    at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:88)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1706)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:818)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:472)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtoco

Re: Review Request 47419: enable merging of bit vectors for insert into

2016-05-20 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47419/
---

(Updated May 20, 2016, 6:57 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Address Ashutosh's comments. Temporarily remove autoColumnStatsGathering_2.q 
from the Tez CLI driver; will add it back after HIVE-13773 is resolved.


Repository: hive-git


Description
---

HIVE-13566


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4cfa5f1
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java 9fbbd4c
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 3b6cbce
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 96ef20d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 3a226e7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7162c08 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 4049f40 
  ql/src/test/queries/clientpositive/autoColumnStats_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/autoColumnStats_8.q PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/autoColumnStats_8.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/47419/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size

2016-05-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13809:


 Summary: Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size
 Key: HIVE-13809
 URL: https://issues.apache.org/jira/browse/HIVE-13809
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Memory estimation is important during hash table loading, because we need to 
decide whether to load the next hash partition in memory or spill it. If we 
assume there is enough memory but it turns out there is not, we run into an 
OOM error.

Currently, the hybrid grace hash join memory usage estimation does not take the 
bloom filter size into account. In large test cases (TB scale) the bloom filter 
grows to hundreds of MB, large enough to cause estimation errors.

The solution is to include the bloom filter size in the memory estimation.
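To make the effect concrete, here is a sketch of an admission check that includes the bloom filter. It assumes the textbook optimal Bloom filter sizing, m = -n ln p / (ln 2)^2 bits for n keys at false-positive rate p; the class, method names, and numbers are hypothetical and are not taken from Hive's hybrid grace hash join code.

```java
// Hypothetical sketch of the admission check described above; names and the
// textbook Bloom filter sizing formula are assumptions, not Hive's code.
final class HashPartitionMemoryCheck {
  /** Optimal Bloom filter size in bytes for n keys at false-positive rate p. */
  static long bloomFilterBytes(long expectedKeys, double fpp) {
    double ln2 = Math.log(2);
    double bits = -expectedKeys * Math.log(fpp) / (ln2 * ln2);
    return (long) Math.ceil(bits / 8.0);
  }

  /** True if the next hash partition can be loaded in memory; false means spill it. */
  static boolean fitsInMemory(long usedBytes, long partitionBytes, long bloomBytes,
                              long budgetBytes) {
    return usedBytes + partitionBytes + bloomBytes <= budgetBytes;
  }

  public static void main(String[] args) {
    long bloom = bloomFilterBytes(1_000_000L, 0.05); // roughly 780 KB
    System.out.println("bloom filter bytes: " + bloom);
    System.out.println("fits ignoring bloom: " + fitsInMemory(0, 300_000, 0, 1_000_000));
    System.out.println("fits counting bloom: " + fitsInMemory(0, 300_000, bloom, 1_000_000));
  }
}
```

With these made-up numbers the partition appears to fit when the filter is ignored but must be spilled once the filter is counted, which is the kind of misestimate the issue describes.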





[jira] [Created] (HIVE-13808) Use constant expressions to backtrack when we create ReduceSink

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13808:
--

 Summary: Use constant expressions to backtrack when we create ReduceSink
 Key: HIVE-13808
 URL: https://issues.apache.org/jira/browse/HIVE-13808
 Project: Hive
  Issue Type: Sub-task
  Components: Parser
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

When we create an RS with constant expressions as keys/values, we immediately 
afterwards create a SEL operator that backtracks the expressions from the RS. 
Currently, we automatically create references for all the keys/values.

Before, we could rely on Hive ConstantPropagate to propagate the constants to 
the SEL. However, after HIVE-13068, Hive ConstantPropagate is no longer 
exercised. Thus, when we create the SEL operator, we can simply create constant 
expressions instead of references.
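The proposed change can be pictured as a choice made per RS key while building the SEL: reference the RS output column, or, if the key is a constant, repeat the constant. The model below is a hypothetical simplification (plain strings instead of Hive's `ExprNodeDesc` classes); the `KEY.reducesinkkeyN` naming is taken from the plan in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, NOT Hive's ExprNodeDesc/ReduceSinkDesc classes.
final class BacktrackSketch {
  static final class Key {
    final String text;        // e.g. "null" or "_col1"
    final boolean isConstant; // true if the RS key is a literal
    Key(String text, boolean isConstant) {
      this.text = text;
      this.isConstant = isConstant;
    }
  }

  /** Expressions for the SEL created right after an RS with the given keys. */
  static List<String> selectExpressions(List<Key> rsKeys) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < rsKeys.size(); i++) {
      Key k = rsKeys.get(i);
      // Current behavior references every key; the proposal emits constants
      // directly so no later constant-propagation pass is needed.
      out.add(k.isConstant ? k.text : "KEY.reducesinkkey" + i);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Key> keys = Arrays.asList(new Key("null", true), new Key("_col1", false));
    System.out.println(selectExpressions(keys)); // [null, KEY.reducesinkkey1]
  }
}
```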

Ex. ql/src/test/results/clientpositive/vector_coalesce.q.out

{noformat}
EXPLAIN SELECT cdouble, cstring1, cint, cfloat, csmallint, coalesce(cdouble, cstring1, cint, cfloat, csmallint) as c
FROM alltypesorc
WHERE (cdouble IS NULL)
ORDER BY cdouble, cstring1, cint, cfloat, csmallint, c
LIMIT 10
{noformat}

Plan:
{noformat}
EXPLAIN SELECT cdouble, cstring1, cint, cfloat, csmallint, coalesce(cdouble, cstring1, cint, cfloat, csmallint) as c
FROM alltypesorc
WHERE (cdouble IS NULL)
ORDER BY cdouble, cstring1, cint, cfloat, csmallint, c
LIMIT 10
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: alltypesorc
            Statistics: Num rows: 12288 Data size: 2641964 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: cdouble is null (type: boolean)
              Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: cstring1 (type: string), cint (type: int), cfloat (type: float), csmallint (type: smallint), COALESCE(null,cstring1,cint,cfloat,csmallint) (type: string)
                outputColumnNames: _col1, _col2, _col3, _col4, _col5
                Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: null (type: double), _col1 (type: string), _col2 (type: int), _col3 (type: float), _col4 (type: smallint), _col5 (type: string)
                  sort order: ++
                  Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
                  TopN Hash Memory Usage: 0.1
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: double), KEY.reducesinkkey1 (type: string), KEY.reducesinkkey2 (type: int), KEY.reducesinkkey3 (type: float), KEY.reducesinkkey4 (type: smallint), KEY.reducesinkkey5 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
          Statistics: Num rows: 6144 Data size: 1320982 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 10
            Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 10 Data size: 2150 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 10
      Processor Tree:
        ListSink
{noformat}





[jira] [Created] (HIVE-13807) Extend metadata provider to pull up predicates through Union

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13807:
--

 Summary: Extend metadata provider to pull up predicates through Union
 Key: HIVE-13807
 URL: https://issues.apache.org/jira/browse/HIVE-13807
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

Currently, when we pull up predicates through the Union operation with the 
Calcite metadata provider, we just create a single disjunction of the pulled-up 
predicates.

E.g. Assume operators {{I1, I2, I3}} with predicates {{P1, P2, P3}} that can be 
pulled up through them, respectively.
For an operation _Union (I1, I2, I3)_ we infer a new predicate {{Pu}}, such 
that {{Pu = P1 OR P2 OR P3}}.
While this is correct, we miss some chances for simplification, e.g., if there 
are common factors in P1, P2, and P3. Further, this inference differs slightly 
from the way the metadata provider pulls up predicates for other operators, 
thus breaking some assumptions and missing some optimization opportunities.
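A minimal sketch of the common-factor idea, assuming each input predicate is modelled as a set of conjunct strings (the class and method names are hypothetical; this is not Calcite's RexNode API or Hive's actual implementation). It rewrites P1 OR ... OR Pn into C AND (R1 OR ... OR Rn), where C holds the conjuncts shared by every Pi and Ri is what remains of Pi. The usage mirrors the input26 predicates on ds and hr, where the shared ds constant survives the pull-up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a predicate is a set of conjunct strings, not a RexNode.
final class UnionPredicatePullUp {

    // Conjuncts that appear in every input predicate (the common factor C).
    static Set<String> commonConjuncts(List<Set<String>> inputs) {
        Set<String> common = new LinkedHashSet<>(inputs.get(0));
        for (Set<String> p : inputs) {
            common.retainAll(p);
        }
        return common;
    }

    // Rewrites P1 OR ... OR Pn into C AND (R1 OR ... OR Rn), where Ri = Pi minus C.
    static String pullUp(List<Set<String>> inputs) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("Union needs at least one input");
        }
        Set<String> common = commonConjuncts(inputs);
        List<String> residuals = new ArrayList<>();
        boolean residualIsTrue = false;
        for (Set<String> p : inputs) {
            Set<String> r = new LinkedHashSet<>(p);
            r.removeAll(common);
            if (r.isEmpty()) {
                residualIsTrue = true; // Ri is TRUE, so the whole disjunction is TRUE
            } else if (r.size() == 1) {
                residuals.add(r.iterator().next());
            } else {
                residuals.add("(" + String.join(" AND ", r) + ")");
            }
        }
        List<String> parts = new ArrayList<>(common);
        if (!residualIsTrue) {
            parts.add(residuals.size() == 1
                    ? residuals.get(0)
                    : "(" + String.join(" OR ", residuals) + ")");
        }
        return parts.isEmpty() ? "true" : String.join(" AND ", parts);
    }
}
```

For the two input26 branches, {ds = '2008-04-08', hr = '11'} and {ds = '2008-04-08', hr = '14'}, this yields ds = '2008-04-08' AND (hr = '11' OR hr = '14'), so the constant ds stays visible to later rules instead of being buried inside the OR.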

Ex. ql/src/test/results/clientpositive/input26.q.out

{noformat}
explain
select * from (
  select * from (select * from srcpart a where a.ds = '2008-04-08' and a.hr = '11' order by a.key limit 5)pa
union all
  select * from (select * from srcpart b where b.ds = '2008-04-08' and b.hr = '14' limit 5)pb
)subq
{noformat}






[jira] [Created] (HIVE-13806) Extension to folding NOT expressions in CBO

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13806:
--

 Summary: Extension to folding NOT expressions in CBO
 Key: HIVE-13806
 URL: https://issues.apache.org/jira/browse/HIVE-13806
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

Extension to folding expressions for NOT.

Currently, simplification is performed only if NOT is applied on a simple 
operation (e.g. IS NOT NULL, =, <>, etc.). We should take advantage of NOT 
distributivity when it is applied on OR/AND operations to try to simplify 
predicates further.
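A minimal sketch of pushing NOT down through AND/OR with De Morgan's laws, assuming a toy expression tree in which every leaf knows its own negation (all class names are hypothetical, not Hive's ExprNodeDesc or Calcite's RexNode). De Morgan's laws remain valid under SQL's three-valued logic, so the rewrite is safe as long as each leaf negation is itself null-correct: NOT (value = 3) and value <> 3 are both UNKNOWN when value is NULL.

```java
// Hypothetical, minimal expression model (not Hive's ExprNodeDesc or Calcite's RexNode).
final class NotPushDown {
    interface Expr {}

    // Leaf predicate that knows its own negation, e.g. "value = 3" / "value <> 3".
    static final class Leaf implements Expr {
        final String text;
        final String negatedText;
        Leaf(String text, String negatedText) {
            this.text = text;
            this.negatedText = negatedText;
        }
        @Override public String toString() { return text; }
    }

    static final class Not implements Expr {
        final Expr operand;
        Not(Expr operand) { this.operand = operand; }
        @Override public String toString() { return "not (" + operand + ")"; }
    }

    static final class And implements Expr {
        final Expr left, right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " and " + right + ")"; }
    }

    static final class Or implements Expr {
        final Expr left, right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " or " + right + ")"; }
    }

    // Returns an expression equivalent to NOT e, with negations pushed to the leaves.
    static Expr negate(Expr e) {
        if (e instanceof Leaf) {
            Leaf l = (Leaf) e;
            return new Leaf(l.negatedText, l.text);
        }
        if (e instanceof Not) {
            return pushDown(((Not) e).operand);              // NOT (NOT a) = a
        }
        if (e instanceof And) {
            And a = (And) e;
            return new Or(negate(a.left), negate(a.right));  // NOT (a AND b) = NOT a OR NOT b
        }
        if (e instanceof Or) {
            Or o = (Or) e;
            return new And(negate(o.left), negate(o.right)); // NOT (a OR b) = NOT a AND NOT b
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    // Rewrites e so that no Not node remains; negation is absorbed into the leaves.
    static Expr pushDown(Expr e) {
        if (e instanceof Not) return negate(((Not) e).operand);
        if (e instanceof And) { And a = (And) e; return new And(pushDown(a.left), pushDown(a.right)); }
        if (e instanceof Or) { Or o = (Or) e; return new Or(pushDown(o.left), pushDown(o.right)); }
        return e;
    }
}
```

Applied to the folder_predicate filter, not (value is not null and value = 3) becomes (value is null or value <> 3), which existing leaf-level simplification can then work on.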

Ex. ql/src/test/results/clientpositive/folder_predicate.q.out

{noformat}
explain
SELECT * FROM predicate_fold_tb WHERE not(value IS NOT NULL AND value = 3)
{noformat}

Plan:
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: predicate_fold_tb
Statistics: Num rows: 6 Data size: 7 Basic stats: COMPLETE Column stats: NONE
Filter Operator
  predicate: (not (value is not null and (value = 3))) (type: boolean)
  Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
  Select Operator
expressions: value (type: int)
outputColumnNames: _col0
Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 3 Data size: 3 Basic stats: COMPLETE Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{noformat}





[jira] [Created] (HIVE-13805) Extend HiveSortLimitPullUpConstantsRule to pull up constants even when SortLimit is the root of the plan

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13805:
--

 Summary: Extend HiveSortLimitPullUpConstantsRule to pull up constants even when SortLimit is the root of the plan
 Key: HIVE-13805
 URL: https://issues.apache.org/jira/browse/HIVE-13805
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

Limitation in the original HiveSortLimitPullUpConstantsRule.

Currently, the rule does not pull up constants when the Sort/Limit operator is 
at the top of the operator tree, as doing so was preventing Hive's 
limit-related optimizations from kicking in.





[jira] [Created] (HIVE-13804) Propagate constant expressions through insert

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13804:
--

 Summary: Propagate constant expressions through insert
 Key: HIVE-13804
 URL: https://issues.apache.org/jira/browse/HIVE-13804
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

The problem is that CBO optimizes the select query and then the insert part of 
the query is attached; after HIVE-13068, ConstantPropagate in Hive no longer 
kicks in because CBO has already optimized the plan, so we may miss the 
opportunity to propagate constants up to the top of the plan.

Ex. ql/src/test/results/clientpositive/cp_sel.q.out

{noformat}
insert overwrite table testpartbucket partition(ds,hr) select key,value,'hello' as ds, 'world' as hr from srcpart where hr=11;
{noformat}





[jira] [Created] (HIVE-13803) More aggressive inference of transitive predicates for inner joins

2016-05-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-13803:
--

 Summary: More aggressive inference of transitive predicates for inner joins
 Key: HIVE-13803
 URL: https://issues.apache.org/jira/browse/HIVE-13803
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Follow-up of HIVE-13068.

Currently, for inner joins, we do not infer transitive predicates that do not 
reference any of the columns of the input. These predicates can be evaluated 
statically and can be useful to quickly discard intermediate results.
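A minimal sketch of how such a statically evaluable predicate can be found, assuming predicates are limited to col = col, col = const and col <> const, with constants compared as text (class and method names are hypothetical; this is not Calcite's RelMdPredicates or Hive's code). Join keys are merged into equivalence classes with union-find, constants are bound per class, and any clash means the conjunction folds to FALSE. For an inner join, that FALSE could then be applied to every input, not only the one whose columns it came from; the current constprog3 plan shows `false` on the table1 branch while table3 still scans with (id = 1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: string column names, textual constants, three predicate shapes.
final class TransitiveInference {
    private final Map<String, String> parent = new HashMap<>(); // union-find over columns
    private final List<String[]> eqConsts = new ArrayList<>();   // {column, constant}
    private final List<String[]> neConsts = new ArrayList<>();   // {column, constant}

    private String find(String c) {
        parent.putIfAbsent(c, c);
        String p = parent.get(c);
        if (!p.equals(c)) {
            p = find(p);
            parent.put(c, p); // path compression
        }
        return p;
    }

    TransitiveInference colEqCol(String a, String b) {
        parent.put(find(a), find(b)); // a and b now share an equivalence class
        return this;
    }

    TransitiveInference colEqConst(String c, String v) {
        eqConsts.add(new String[] {c, v});
        return this;
    }

    TransitiveInference colNeConst(String c, String v) {
        neConsts.add(new String[] {c, v});
        return this;
    }

    // True if the conjunction of all registered predicates can never hold,
    // i.e. it folds to FALSE regardless of the rows on either join input.
    boolean isStaticallyFalse() {
        Map<String, String> constOfClass = new HashMap<>();
        for (String[] e : eqConsts) {
            String prev = constOfClass.putIfAbsent(find(e[0]), e[1]);
            if (prev != null && !prev.equals(e[1])) {
                return true; // c = 1 AND c = 2 within one class
            }
        }
        for (String[] e : neConsts) {
            if (e[1].equals(constOfClass.get(find(e[0])))) {
                return true; // c = 1 (possibly inferred through a join key) AND c <> 1
            }
        }
        return false;
    }
}
```

With table1.dimid = table3.id, table3.id = 1 and table1.dimid <> 1 registered, isStaticallyFalse() returns true, so both join inputs could be filtered by `false`.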

Ex. ql/src/test/results/clientpositive/constprog3.q.out

{noformat}
explain
select table1.id, table1.val, table1.val1
from table1 inner join table3
on table1.dimid = table3.id and table3.id = 1 where table1.dimid <> 1
{noformat}

Current plan:
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: table1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
  predicate: false (type: boolean)
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  Select Operator
expressions: id (type: int), val (type: int), val1 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
  sort order: 
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
  TableScan
alias: table3
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
  predicate: (id = 1) (type: boolean)
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  Select Operator
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
  sort order: 
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  Reduce Operator Tree:
Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 
1 
  outputColumnNames: _col0, _col1, _col2
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{noformat}





Re: Review Request 46690: HIVE-13068

2016-05-20 Thread Jesús Camacho Rodríguez


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/dynamic_rdd_cache.q.out, line 1070
> > 
> >
> > Here we lost propagation. Already covered with one of the follow-ups?

We did not; actually it was pruned, as we are grouping by a constant (which is 
nice!). Observe that '3' appears at the end of the plan.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/insert_into5.q.out, line 52
> > 
> >
> > Shuffling extra columns. Covered with follow-up jiras?

Same as above.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/folder_predicate.q.out, line 40
> > 
> >
> > Number of expression evaluations increased to 4 from 3. Possible to bring it down?

Extension to folding expressions for NOT. Currently, simplification is 
performed only for one operand. I will cover it in follow-up, as once again, it 
needs a bit of thinking (probably best way is to apply NOT distributivity 
first).


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/input26.q.out, line 47
> > 
> >
> > Here also we lost propagation. Covered with one of the follow-ups?

I know we are deferring quite a bit for follow-up, but I would like to do so 
with this one too. We need a more elaborate method to pullUpPredicates through 
the Union so we can get this constant. I will create another JIRA for this.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/vector_decimal_2.q.out, line 919
> > 
> >
> > Is this change correct?

Widening cast, it is fine.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/vector_decimal_2.q.out, line 1023
> > 
> >
> > Is this change correct?

Widening cast, it is fine.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/vector_decimal_2.q.out, line 1075
> > 
> >
> > Is this change correct?

Widening cast, it is fine.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/vector_coalesce.q.out, line 209
> > 
> >
> > No propagation across RS?

This is interesting: when we create the RS, we might create key/value 
constants, but when we create the backtrack SELECT, we always use references. 
We should use constants if the key/values are constants. Follow-up...


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/union_view.q.out, line 716
> > 
> >
> > Extra (constant) column for shuffle. Propagation broken?

Extension for inference of predicates through Union needed (mentioned before).


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/mergejoin.q.out, line 2702
> > 
> >
> > Extra partition retrieved for execution?

This is a result of overly aggressive inference of predicates through outer 
joins, as stated in one of the cases before. The case needs further study, but 
this is a fix.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/rand_partitionpruner3.q.out, line 156
> > 
> >
> > 4 expression evaluations instead of 3.

Covered by one of the follow-up JIRAs (NOT distributivity).


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out, line 2767
> > 
> >
> > Always true?

Fixed. Not only that, but static partition pruning is kicking in now for a 
couple of queries (nice!). Could you check?


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/vector_decimal_round_2.q.out, line 441
> > 
> >
> > Extra columns for shuffle.

Pull-up through Sort/Limit when it is on top of the plan; covered by one of the 
follow-up JIRAs.


> On May 18, 2016, 8:56 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientposi

[jira] [Created] (HIVE-13802) Built-in aggregate functions may produce incorrect values when all values being aggregated in a group are NULL as the result of an expression

2016-05-20 Thread Laurent Martin (JIRA)
Laurent Martin created HIVE-13802:
-

 Summary: Built-in aggregate functions may produce incorrect values when all values being aggregated in a group are NULL as the result of an expression
 Key: HIVE-13802
 URL: https://issues.apache.org/jira/browse/HIVE-13802
 Project: Hive
  Issue Type: Bug
  Components: Hive, ORC, Tez
Affects Versions: 0.14.0
Reporter: Laurent Martin


With the Tez engine and Hive tables stored as ORC, built-in aggregate functions 
may produce incorrect values when all values being aggregated in a group are 
NULL as the result of an expression.
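For reference, standard SQL aggregates skip NULL inputs, and MIN over a group whose inputs are all NULL must return NULL, not NaN or 1. A minimal sketch of that contract for one group, assuming a plain accumulator (the class name is hypothetical and this is not Hive's GenericUDAFMin or its vectorized counterpart); the key point is tracking whether any non-NULL value was seen instead of relying on a default accumulator value.

```java
import java.util.List;

// Hypothetical sketch of standard SQL MIN semantics for one group;
// not Hive's GenericUDAFMin or its vectorized counterpart.
final class NullAwareMin {
    private boolean sawValue = false; // stays false until a non-NULL input arrives
    private long min;                 // only meaningful once sawValue is true

    void add(Long v) {
        if (v == null) {
            return;                   // aggregates skip NULL inputs
        }
        if (!sawValue || v < min) {
            min = v;
            sawValue = true;
        }
    }

    // NULL when every input in the group was NULL, never a default like 0 or NaN.
    Long result() {
        return sawValue ? Long.valueOf(min) : null;
    }

    // Evaluates MIN(X + 3) over one group; X + 3 is NULL whenever X is NULL.
    static Long minPlusThree(List<Integer> xs) {
        NullAwareMin agg = new NullAwareMin();
        for (Integer x : xs) {
            agg.add(x == null ? null : Long.valueOf(x + 3L));
        }
        return agg.result();
    }
}
```

Under these semantics, both test queries below should return NULL for S in every group.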

-- Test 1
-- The S column is populated as NaN

SET hive.execution.engine=tez; 
CREATE TABLE LM
(
D STRING,
X DOUBLE
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;

-- Test 2
-- The S column will be populated as 1 (dangerous case!)

SET hive.execution.engine=tez; 
CREATE TABLE LM
(
D STRING,
X INT
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;

-- Workaround:
-- According to my tests, a workaround is to surround the nullable expression
-- with COALESCE. Example:

CREATE TABLE LM
(
D STRING,
X INT
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(COALESCE(X + 3)) AS S
FROM LM
GROUP BY D;



