[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-20 Thread pengcheng xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7654:
--

Attachment: HIVE-7654.4.patch

Reduce # of queries, following Ashutosh's comments

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns
 state,locid of partition(year='2001'):
 analyze table loc_orc partition(year='2001') compute statistics for columns
 state,locid;
 We need to know the “aggregated” column statistics for the whole table loc_orc.
 However, we may not have the column statistics for some partitions, e.g.,
 partition(year='2002'), and we may not have them for some columns, e.g.,
 zip bigint for partition(year='2001').
 We propose a method to extrapolate the missing column statistics for those
 partitions.
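 For a rough illustration of the idea (a hypothetical sketch only, not the
 IExtrapolatePartStatus/LinearExtrapolatePartStatus code in the attached patch),
 a linear extrapolation over the partitions whose stats are known could look
 like this:
 {code}
 // Hypothetical sketch: given a numeric column stat (e.g. NDV) known for some
 // partitions, fit a line through the first and last known points and use it
 // to fill in the partitions whose stats are missing.
 import java.util.Map;
 import java.util.TreeMap;

 public class LinearStatSketch {
   // Known stats keyed by partition index (0 for year=2000, 1 for 2001, ...).
   public static double extrapolate(TreeMap<Integer, Double> known, int target) {
     if (known.containsKey(target)) {
       return known.get(target);
     }
     Map.Entry<Integer, Double> lo = known.firstEntry();
     Map.Entry<Integer, Double> hi = known.lastEntry();
     if (lo.getKey().equals(hi.getKey())) {
       return lo.getValue();                    // only one known point: no slope
     }
     double slope = (hi.getValue() - lo.getValue()) / (hi.getKey() - lo.getKey());
     double intercept = lo.getValue() - slope * lo.getKey();
     return slope * target + intercept;
   }

   public static void main(String[] args) {
     TreeMap<Integer, Double> ndv = new TreeMap<Integer, Double>();
     ndv.put(0, 100.0);   // year=2000
     ndv.put(1, 120.0);   // year=2001
     ndv.put(3, 160.0);   // year=2003
     System.out.println(extrapolate(ndv, 2));   // year=2002 -> 140.0
   }
 }
 {code}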



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.

2014-08-20 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24498/
---

(Updated Aug. 20, 2014, 6:52 a.m.)


Review request for hive.


Changes
---

Reduce # of queries, following Ashutosh's comments


Repository: hive-git


Description
---

We propose a method to extrapolate the missing column status for the partitions.


Diffs (updated)
-

  data/files/extrapolate_stats_full.txt PRE-CREATION 
  data/files/extrapolate_stats_partial.txt PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
84ef5f9 
  
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 
PRE-CREATION 
  
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java
 PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
767cffc 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a9f4be2 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 4eba2b0 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 78ab19a 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q PRE-CREATION 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/24498/diff/


Testing
---


File Attachments


HIVE-7654.0.patch
  
https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch


Thanks,

pengcheng xiong



[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-08-20 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103518#comment-14103518
 ] 

Szehon Ho commented on HIVE-7254:
-

I think Brock was working on the PTest server and made some config changes 
recently.  [~brocknoland] can you take a look?

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called
 "directory", so it will run all the qfiles under that directory for that
 driver.  For example, CLIDriver is configured with the directory
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative,
 miniTezDriver) run only a select number of tests under that directory, so we
 have to use the "include" configuration to hard-code the list of tests for
 each one to run.  This duplicates the list of each miniDriver's tests already
 in the /itests/qtest pom file, and can get out of date.
 It would be nice if both got their information the same way.
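 As a purely illustrative sketch of the difference (these property names are
 made up for this example and are not the real PTest configuration keys), the
 two styles look roughly like:
 {noformat}
 # directory-driven: every qfile under the directory is picked up automatically
 qFileTests.clientPositive.directory = ql/src/test/queries/clientpositive

 # include-driven: a hand-maintained list that duplicates the itests/qtest pom
 qFileTests.miniMr.include = bucket4.q, bucket5.q, infer_bucket_sort_merge.q, ...
 {noformat}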



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Summary: Extend ReadEntity to add column access information from query  
(was: Get instance of HiveSemanticAnalyzerHookContext from configuration)

 Extend ReadEntity to add column access information from query
 -

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we
 have a HiveSemanticAnalyzerHook hook, we may want to get more things from the
 hookContext (e.g., the columns needed by the query).
 So we should get the instance of HiveSemanticAnalyzerHookContext from
 configuration, extend HiveSemanticAnalyzerHookContext with a new
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put
 what we want into the class.
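 A minimal sketch of that proposed approach (the subclass and the data it
 collects are hypothetical; the update(QueryPlan) signature is assumed from the
 existing interface):
 {code}
 // Illustrative only: a custom hook context that captures extra information
 // from the QueryPlan so a HiveSemanticAnalyzerHook can read it later.
 import java.util.HashSet;
 import java.util.Set;
 import org.apache.hadoop.hive.ql.QueryPlan;
 import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContextImpl;

 public class ColumnAwareHookContext extends HiveSemanticAnalyzerHookContextImpl {
   private final Set<String> neededColumns = new HashSet<String>();

   @Override
   public void update(QueryPlan plan) {
     super.update(plan);        // keep the default inputs/outputs handling
     // Hypothetical: derive whatever extra data the hook needs from the plan
     // (e.g. the accessed columns) and stash it on this context.
   }

   public Set<String> getNeededColumns() {
     return neededColumns;
   }
 }
 {code}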



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103519#comment-14103519
 ] 

Thejas M Nair commented on HIVE-4629:
-

[~dongc] The earlier patch also had a method in HiveStatement to get the log. I 
think that will be convenient for many users, though we need to be careful and 
specify that it is the only non-JDBC function that is part of a public API in it. 
But this can also be done as follow-up work in a separate JIRA.
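For context, a rough usage sketch of such a client-side method (the
getQueryLog() and hasMoreLogs() names follow the earlier patch as I understand
it; the exact signatures and connection details here are assumptions):
{code}
// Hedged sketch: run a statement on one thread and poll its operation log
// from another, so progress can be reported while the query executes.
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveStatement;

public class QueryLogExample {
  public static void main(String[] args) throws Exception {
    Connection conn =
        DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
    final HiveStatement stmt = (HiveStatement) conn.createStatement();

    Thread runner = new Thread(new Runnable() {
      public void run() {
        try {
          stmt.execute("SELECT count(*) FROM src");   // long-running query
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    });
    runner.start();

    // Poll the logs while the query runs and print them for the client.
    while (runner.isAlive() || stmt.hasMoreLogs()) {
      for (String line : stmt.getQueryLog()) {
        System.out.println(line);
      }
      Thread.sleep(500);
    }

    runner.join();
    stmt.close();
    conn.close();
  }
}
{code}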


 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-


  was:
Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have 
hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).
So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.


 Extend ReadEntity to add column access information from query
 -

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Summary: Extend ReadEntity to add accessed columns from query  (was: Extend 
ReadEntity to add column access information from query)

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---

Description: 
I maintain a few patches to make the Metastore work with a Postgres back end in 
our production environment.
The main goal of this JIRA is to push these patches upstream.

This patch enables LOCKS and COMPACTION and fixes an error in STATS on the 
metastore.

  was:
I maintain a few patches to make the Metastore work with a Postgres back end in 
our production environment.
The main goal of this JIRA is to push these patches upstream.

This first patch enables LOCKS on the metastore.


 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the
 metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---

Attachment: HIVE-7889.2.patch

Rebased and added more features.

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This first patch enables LOCKS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);

if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-



 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);

if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-08-20 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103541#comment-14103541
 ] 

Szehon Ho commented on HIVE-7254:
-

I think I see what needs to be done: it's in the build machine's configuration 
(not checked in), and we need to add back the other set of Tez tests that Brock 
accidentally removed.  As I'm afraid of hosing the builds tonight, Brock or I 
can do it tomorrow morning :)

Hey Brock, one idea: do you think it's a good idea to add the build machine's 
properties to source control?  That way there is a history in case they get 
changed, and all devs can easily see/modify them without having to log in to 
the build machine.  Just a late-night thought.

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called
 "directory", so it will run all the qfiles under that directory for that
 driver.  For example, CLIDriver is configured with the directory
 ql/src/test/queries/clientpositive.
 However, the miniXXXDrivers (miniMRDriver, miniMRDriverNegative,
 miniTezDriver) run only a select number of tests under that directory, so we
 have to use the "include" configuration to hard-code the list of tests for
 each one to run.  This duplicates the list of each miniDriver's tests already
 in the /itests/qtest pom file, and can get out of date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set  
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set  
 HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS(or we can set another confvar for it) is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS(or we can set another confvar for it) is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set  
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS(or we can set another confvar for it) is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7796) Provide subquery pushdown facility for storage handlers

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103560#comment-14103560
 ] 

Hive QA commented on HIVE-7796:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662985/HIVE-7796.1.patch.txt

{color:green}SUCCESS:{color} +1 6008 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/416/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/416/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-416/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662985

 Provide subquery pushdown facility for storage handlers
 ---

 Key: HIVE-7796
 URL: https://issues.apache.org/jira/browse/HIVE-7796
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7796.1.patch.txt


 If the underlying storage can handle basic filtering or aggregation, Hive can
 delegate execution of the whole subquery to the storage and handle it as a
 simple scanning operation.
 This was experimentally implemented on the JDBC / Phoenix handlers and seemed
 to work well. Hopefully that code can be opened too, but I am not allowed to
 do so yet.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns map or ColumnAccessInfo to ReadEntity when 
we set HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns map or ColumnAccessInfo to ReadEntity when 
 we set HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103568#comment-14103568
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Hi [~ashutoshc]
Currently Hive has a new interface for external authorization plugins, and the 
semantic hook may be replaced in the future. So I will try to put the accessed 
columns into ReadEntity instead of enhancing the semantic hook. This way they 
will be available to hooks as well as authorization interfaces. I have updated 
the description and await your feedback. Thanks!
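A minimal sketch of what that could look like (the field and accessor names
below are illustrative assumptions, not the contents of HIVE-7730.001.patch):
{code}
// Illustrative only: one way a ReadEntity-like object could carry the columns
// accessed by the query, populated from ColumnAccessInfo at compile time.
import java.util.ArrayList;
import java.util.List;

public class ReadEntitySketch {
  private final String tableName;
  private final List<String> accessedColumns = new ArrayList<String>();

  public ReadEntitySketch(String tableName) {
    this.tableName = tableName;
  }

  // Called during semantic analysis, once ColumnAccessInfo is available.
  public void setAccessedColumns(List<String> cols) {
    accessedColumns.clear();
    accessedColumns.addAll(cols);
  }

  // Authorization interfaces and hooks read the columns from here.
  public List<String> getAccessedColumns() {
    return accessedColumns;
  }

  public String getTableName() {
    return tableName;
  }
}
{code}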

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns map or ColumnAccessInfo to ReadEntity when 
 we set HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column map to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7797) upgrade sql 014-HIVE-3764.postgres.sql failed

2014-08-20 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-7797:
---

 Summary: upgrade sql 014-HIVE-3764.postgres.sql failed
 Key: HIVE-7797
 URL: https://issues.apache.org/jira/browse/HIVE-7797
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Nemon Lou


The SQL is:
INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES 
(1, '', 'Initial value');

And the result is:
ERROR:  null value in column SCHEMA_VERSION violates not-null constraint
DETAIL:  Failing row contains (1, null, Initial value).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column list to ReadEntity getting from 
columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns map or ColumnAccessInfo to ReadEntity when 
we set HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get the accessed columns when doing authorization at compile time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
|| HiveConf.getBoolVar(this.conf, 
HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new 
ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column map to ReadEntity getting from 
columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS is true.
 Then we can get the accessed columns when doing authorization at compile time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column list to ReadEntity getting from 
 columnAccessInfo
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103624#comment-14103624
 ] 

Hive QA commented on HIVE-7654:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663065/HIVE-7654.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6010 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/417/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/417/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-417/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663065

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns
 state,locid of partition(year='2001'):
 analyze table loc_orc partition(year='2001') compute statistics for columns
 state,locid;
 We need to know the “aggregated” column statistics for the whole table loc_orc.
 However, we may not have the column statistics for some partitions, e.g.,
 partition(year='2002'), and we may not have them for some columns, e.g.,
 zip bigint for partition(year='2001').
 We propose a method to extrapolate the missing column statistics for those
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()

2014-08-20 Thread SUYEON LEE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SUYEON LEE reassigned HIVE-7754:


Assignee: SUYEON LEE

 Potential null pointer dereference in ColumnTruncateMapper#jobClose()
 -

 Key: HIVE-7754
 URL: https://issues.apache.org/jira/browse/HIVE-7754
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor

 {code}
 Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, 
 null,
   reporter);
 {code}
 null is passed to Utilities.mvFileToFinalPath() which gets passed to 
 createEmptyBuckets() where:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}
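
A minimal sketch of the kind of guard this calls for (illustrative only; it assumes conf here is the descriptor that jobClose() passes as null, and it is not the committed fix):

{code}
// Check the descriptor before dereferencing it, since ColumnTruncateMapper#jobClose()
// passes null through Utilities.mvFileToFinalPath() into createEmptyBuckets().
if (conf != null) {
  boolean isCompressed = conf.getCompressed();
  TableDesc tableInfo = conf.getTableInfo();
  // ... create the empty bucket files using isCompressed and tableInfo ...
}
{code}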



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()

2014-08-20 Thread SUYEON LEE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SUYEON LEE reassigned HIVE-7754:


Assignee: SUYEON LEE  (was: KangHS)

 Potential null pointer dereference in ColumnTruncateMapper#jobClose()
 -

 Key: HIVE-7754
 URL: https://issues.apache.org/jira/browse/HIVE-7754
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor

 {code}
 Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, 
 null,
   reporter);
 {code}
 null is passed to Utilities.mvFileToFinalPath() which gets passed to 
 createEmptyBuckets() where:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7754) Potential null pointer dereference in ColumnTruncateMapper#jobClose()

2014-08-20 Thread KangHS (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KangHS reassigned HIVE-7754:


Assignee: KangHS  (was: SUYEON LEE)

 Potential null pointer dereference in ColumnTruncateMapper#jobClose()
 -

 Key: HIVE-7754
 URL: https://issues.apache.org/jira/browse/HIVE-7754
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: KangHS
Priority: Minor

 {code}
 Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, 
 null,
   reporter);
 {code}
 null is passed to Utilities.mvFileToFinalPath() which gets passed to 
 createEmptyBuckets() where:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent

2014-08-20 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7599:
--

Status: Patch Available  (was: Open)

 NPE in MergeTask#main() when -format is absent
 --

 Key: HIVE-7599
 URL: https://issues.apache.org/jira/browse/HIVE-7599
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7599.patch


 When '-format' is absent from commandline, the following call would result in 
 NPE (format is initialized to null):
 {code}
 if (format.equals("rcfile")) {
   mergeWork = new MergeWork(inputPaths, new Path(outputDir), 
 RCFileInputFormat.class);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent

2014-08-20 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7599:
--

Attachment: HIVE-7599.patch

When the format object is null, the printUsage() method will be called.

 NPE in MergeTask#main() when -format is absent
 --

 Key: HIVE-7599
 URL: https://issues.apache.org/jira/browse/HIVE-7599
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7599.patch


 When '-format' is absent from commandline, the following call would result in 
 NPE (format is initialized to null):
 {code}
 if (format.equals("rcfile")) {
   mergeWork = new MergeWork(inputPaths, new Path(outputDir), 
 RCFileInputFormat.class);
 {code}
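
A minimal sketch of the guard described in this patch note (variable names follow the snippet above, and printUsage() is the behavior the patch description mentions; this is illustrative rather than the exact committed code):

{code}
// Bail out with usage information when -format was not supplied on the command line.
if (format == null) {
  printUsage();
} else if (format.equals("rcfile")) {
  mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class);
}
{code}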



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store the accessed columns in ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns when doing 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list into ReadEntity from columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then we can get accessed columns when do authorization in compile before 
execute.
Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS is true.
 Then external authorization model can get accessed columns when do 
 authorization in compile before execute. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer, old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo
 }
 {code}
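
A rough illustration of what the remaining TODO could look like once the accessed columns are available (the setAccessedColumns() call and the exact lookup are assumptions for illustration, not the committed API):

{code}
// Copy the per-table column list gathered by ColumnAccessAnalyzer onto the
// matching ReadEntity so that authorization hooks can read it later.
ColumnAccessInfo columnAccessInfo = getColumnAccessInfo();
for (ReadEntity input : inputs) {
  if (input.getType() == Entity.Type.TABLE) {
    List<String> cols = columnAccessInfo.getTableToColumnAccessMap()
        .get(input.getTable().getCompleteName());
    if (cols != null) {
      input.setAccessedColumns(cols);  // assumed setter on ReadEntity
    }
  }
}
{code}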



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7798) Authentication tokens lost in a UDTF on a secure cluster

2014-08-20 Thread JIRA
Rémy SAISSY created HIVE-7798:
-

 Summary: Authentication tokens lost in a UDTF on a secure cluster
 Key: HIVE-7798
 URL: https://issues.apache.org/jira/browse/HIVE-7798
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.13.0
Reporter: Rémy SAISSY


Context:
 - Secure Cluster running Hive 0.13, Hadoop 2.4 and HBase 0.98 (HDP 2.1)
 - UDTF written in Java

Action:
In the UDTF, HBase is contacted through its Java API in order to add a few 
records. However any requests to HBase fails because tokens are not passed to 
the call to HBase.

Executing the following code in the UDTF:
Configuration conf = HBaseConfiguration.create();
UserGroupInformation.setConfiguration(conf);
HTable hbaseErrorTable = new HTable(conf, "foo:foo");

Leads to this error:
2014-07-22 14:44:04,134 DEBUG [main] org.apache.hadoop.ipc.RpcClient: 
Connecting to node2.cluster.fr/10.197.40.54:60020
2014-07-22 14:44:04,135 DEBUG [main] 
org.apache.hadoop.security.UserGroupInformation: PrivilegedAction 
as:expecteduser (auth:SIMPLE) 
from:org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:915)
2014-07-22 14:44:04,135 DEBUG [main] 
org.apache.hadoop.hbase.security.HBaseSaslRpcClient: Creating SASL GSSAPI 
client. Server's Kerberos principal name is hbase/node2.cluster.fr@REALM
2014-07-22 14:44:04,137 DEBUG [main] 
org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException 
as:expecteduser (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)]
2014-07-22 14:44:04,138 DEBUG [main] 
org.apache.hadoop.security.UserGroupInformation: PrivilegedAction 
as:expecteduser (auth:SIMPLE) 
from:org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:818)
2014-07-22 14:44:04,138 WARN [main] org.apache.hadoop.ipc.RpcClient: Exception 
encountered while connecting to the server : javax.security.sasl.SaslException: 
GSS initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)]
2014-07-22 14:44:04,138 FATAL [main] org.apache.hadoop.ipc.RpcClient: SASL 
authentication failed. The most likely cause is missing or invalid credentials. 
Consider 'kinit'.

The workaround is to add the following in the UDTF before actually contacting 
HBase:

public static void logFromKeytabAndLogoutCurrentUser(String user, String path)
    throws IOException {
  // UserGroupInformation.loginUserFromKeytab("expecteduser@REALM",
  //     "/etc/security/keytabs/expecteduser.headless.keytab");
  UserGroupInformation.loginUserFromKeytab(user, path);
  AccessControlContext context = AccessController.getContext();
  Subject subject = Subject.getSubject(context);
  subject.getPrincipals().clear();
  subject.getPrivateCredentials().clear();
  subject.getPublicCredentials().clear();
}
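
A sketch of how the workaround might be wired into the UDTF (the principal and keytab path come from the commented-out line above; overriding configure() is an assumption for illustration):

{code}
@Override
public void configure(MapredContext mapredContext) {
  try {
    // Re-login from a keytab available on the worker node before the first HBase
    // call, since the delegation tokens are not visible inside the UDTF.
    logFromKeytabAndLogoutCurrentUser("expecteduser@REALM",
        "/etc/security/keytabs/expecteduser.headless.keytab");
  } catch (IOException e) {
    throw new RuntimeException("Kerberos re-login from keytab failed", e);
  }
}
{code}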

However, it implies to have the keytab to perform a new authentication from 
inside the UDTF.

I'm not sure whether this bug is related to the Hive UDTF or to YARN containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103688#comment-14103688
 ] 

Hive QA commented on HIVE-7689:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663067/HIVE-7889.2.patch

{color:green}SUCCESS:{color} +1 6008 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/418/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/418/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-418/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663067

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back-end in 
 our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION, and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-20 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7799:
---

 Summary: TRANSFORM failed in transform_ppr1.q[Spark Branch]
 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-20 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Description: 
Here is the exception:
{noformat}
2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - 
Exception in task 0.0 in stage 1.0 (TID 0)
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
{noformat}

Basically, the cause is that RowContainer is misused (it's not allowed to write 
once someone has read a row from it). I'm trying to figure out whether it's a Hive 
issue or something specific to Hive on Spark mode.
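
To illustrate the misuse pattern (constructor and method signatures here are assumptions about org.apache.hadoop.hive.ql.exec.persistence.RowContainer, with exception handling omitted):

{code}
RowContainer<List<Object>> rc = new RowContainer<List<Object>>(1024, jobConf, reporter);
rc.addRow(rowA);                  // fine: the container is still in write mode
List<Object> head = rc.first();   // the first read switches it into read mode
rc.addRow(rowB);                  // misuse: writing after a read; later reads can NPE
{code}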

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1

 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it). I'm trying to figure out whether it's a 
 Hive issue or something specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails

2014-08-20 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103706#comment-14103706
 ] 

Venki Korukanti commented on HIVE-7747:
---

The test failure here is related to the change, and the failure is complicated. It turns 
out that the output of {{HiveConf(srcHiveConf, SessionState.class)}} is not the same as 
srcHiveConf in terms of (property, value) pairs. Executed as part of the 
constructor, the {{HiveConf.initialize}} method applies system properties on 
top of the properties copied from srcHiveConf. So if any system properties are set 
between the moment srcHiveConf is created and the moment the HiveConf is cloned, the 
cloned HiveConf inherits those properties. In the test case ({{MiniHS2}}) the 
scratchdir property is modified in the system properties (see 
[here|https://github.com/apache/hive/blob/trunk/itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L184]),
 but the default scratchdir value is {{$\{test.tmp.dir\}/scratchdir}} from 
hive-site.xml. The scratchdir set in {{MiniHS2}} was never used before, but with this 
change HS2 started using it. The scratchdir created in {{MiniHS2}} (see 
[here|https://github.com/apache/hive/blob/trunk/itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L183])
 doesn't have 777 permissions, so whenever we have user impersonation there are 
issues (that's where the test is failing). Before this change, the scratchdir was 
always {{$\{test.tmp.dir\}/scratchdir}}, which is created in HS2 with 777 
permissions (see 
[here|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3458]),
 so there were no issues with impersonation. 

I think it is better to fix this in SparkClient by fetching the jar directly 
rather than through HiveConf, to avoid unexpected issues.
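
A minimal standalone sketch of the cloning behavior described above (an assumed illustration, not the HiveServer2 code path):

{code}
HiveConf srcHiveConf = new HiveConf(SessionState.class);
String before = srcHiveConf.getVar(HiveConf.ConfVars.SCRATCHDIR);

// Something (e.g. MiniHS2) later overrides the value via a system property.
System.setProperty(HiveConf.ConfVars.SCRATCHDIR.varname, "/tmp/minihs2-scratchdir");

// The copy constructor re-runs initialize(), which applies system properties on top
// of the copied values, so the clone no longer matches the source conf.
HiveConf cloned = new HiveConf(srcHiveConf, SessionState.class);
String after = cloned.getVar(HiveConf.ConfVars.SCRATCHDIR);  // differs from 'before'
{code}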



 Submitting a query to Spark from HiveServer2 fails
 --

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 0.13.1
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Attachments: HIVE-7747.1.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7747:
--

Summary: Submitting a query to Spark from HiveServer2 fails [Spark Branch]  
(was: Submitting a query to Spark from HiveServer2 fails)

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 0.13.1
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7747:
--

Fix Version/s: spark-branch

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 0.13.1
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7747:
--

Attachment: HIVE-7747.2-spark.patch

Attaching v2 patch specific to spark-branch.

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 0.13.1
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7747:
--

Affects Version/s: (was: 0.13.1)
   spark-branch

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7599) NPE in MergeTask#main() when -format is absent

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103746#comment-14103746
 ] 

Hive QA commented on HIVE-7599:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663086/HIVE-7599.patch

{color:green}SUCCESS:{color} +1 6008 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/419/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/419/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-419/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663086

 NPE in MergeTask#main() when -format is absent
 --

 Key: HIVE-7599
 URL: https://issues.apache.org/jira/browse/HIVE-7599
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7599.patch


 When '-format' is absent from commandline, the following call would result in 
 NPE (format is initialized to null):
 {code}
 if (format.equals("rcfile")) {
   mergeWork = new MergeWork(inputPaths, new Path(outputDir), 
 RCFileInputFormat.class);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end

2014-08-20 Thread Damien Carol

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24602/
---

(Updated Aug. 20, 2014, 10:53 a.m.)


Review request for hive.


Changes
---

Updated with patch V2 that enables ALL features of a Metastore back-end


Bugs: HIVE-7689
https://issues.apache.org/jira/browse/HIVE-7689


Repository: hive-git


Description
---

I maintain a few patches to make the Metastore work with a Postgres back-end in our 
production environment.
The main goal of this JIRA is to push these patches upstream.

This first patch enables LOCKS on the metastore.


Diffs (updated)
-

  metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 
2ebd3b0 
  
metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
 524a7a4 
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 
30cf814 
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 
063dee6 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java f74f683 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 
f636cff 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
db62721 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 

Diff: https://reviews.apache.org/r/24602/diff/


Testing
---

Using patched version in production. Enable concurrency with DbTxnManager.


Thanks,

Damien Carol



[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-08-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7694:
--

Assignee: Suma Shivaprasad

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.patch


 For example: if two tables are joined, T1 sorted by (a, b, c) and clustered by (a), 
 and T2 sorted by (a) and clustered by (a), the following exception is seen
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7629:
--

Assignee: Suma Shivaprasad

 Problem in SMB Joins between two Parquet tables
 ---

 Key: HIVE-7629
 URL: https://issues.apache.org/jira/browse/HIVE-7629
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-7629.1.patch, HIVE-7629.patch


 The issue is clearly seen when two bucketed and sorted Parquet tables with 
 different numbers of columns are involved in the join. The following 
 exception is seen
 {noformat}
 Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66)
 at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store the accessed columns in ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns when doing 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO:
// after compile, we can put the accessed column list into ReadEntity from
// columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then external authorization model can get accessed columns when do 
authorization in compile before execute. Maybe we will remove columnAccessInfo 
from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get 
accessed columns from ReadEntity too.
Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed column list to ReadEntity getting from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS is true.
 Then external authorization model can get accessed columns when do 
 authorization in compile before execute. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer, old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
 {code}   boolean isColumnInfoNeedForAuth = 
 SessionState.get().isAuthorizationModeV2()
  HiveConf.getBoolVar(conf, 
 HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new 
 ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess()); 
 }
 compiler.compile(pCtx, rootTasks, inputs, 

[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store the accessed columns in ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true.
Then an external authorization model can get the accessed columns when doing 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO:
// after compile, we can put the accessed column list into ReadEntity from
// columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext. (e.g. the needed colums from query).-
-So we should get instance of HiveSemanticAnalyzerHookContext from 
configuration, extends HiveSemanticAnalyzerHookContext with a new 
implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
what you want to the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_STATS_COLLECT_SCANCOLS is true.
Then external authorization model can get accessed columns when do 
authorization in compile before execute. Maybe we will remove columnAccessInfo 
from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get 
accessed columns from ReadEntity too.
Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO:
// after compile, we can put accessed column list to ReadEntity getting
// from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set true
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext. (e.g. the needed colums from query).-
 -So we should get instance of HiveSemanticAnalyzerHookContext from 
 configuration, extends HiveSemanticAnalyzerHookContext with a new 
 implementation, overide the HiveSemanticAnalyzerHookContext.update() and put 
 what you want to the class.-
 Hive should store accessed columns to ReadEntity when we set 
 HIVE_STATS_COLLECT_SCANCOLS(or we can add a confVar) is true.
 Then external authorization model can get accessed columns when do 
 authorization in compile before execute. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer, old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is the quick implement in SemanticAnalyzer.analyzeInternal() below:
 {code}   boolean isColumnInfoNeedForAuth = 
 SessionState.get().isAuthorizationModeV2()
  HiveConf.getBoolVar(conf, 
 HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
 || HiveConf.getBoolVar(this.conf, 
 HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer 

[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103793#comment-14103793
 ] 

Hive QA commented on HIVE-7747:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663103/HIVE-7747.2-spark.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5958 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/66/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/66/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-66/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663103

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine 
 from Hive CLI.
 Spark tasks fails with following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Need feedback on HIVE-7689

2014-08-20 Thread Damien Carol

Hi,

Can anyone take a look at this ticket: HIVE-7689
https://issues.apache.org/jira/browse/HIVE-7689

Regards,
--

Damien CAROL

 * tél : +33 (0)4 74 96 88 14
 * fax : +33 (0)4 74 96 31 88
 * email : dca...@blitzbs.com

BLITZ BUSINESS SERVICE



[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104059#comment-14104059
 ] 

Larry McCay commented on HIVE-7634:
---

Are there plans to commit this to branch-2?

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.
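
For illustration, the fallback pattern this describes could look like the following sketch (the helper name is an assumption, not the attached patch):

{code}
// Configuration.getPassword() (HADOOP-10607) consults any configured credential
// providers first and falls back to the plain config value when none is set up.
static String lookupPassword(Configuration conf, String key) throws IOException {
  char[] pw = conf.getPassword(key);
  return pw == null ? null : new String(pw);
}
{code}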



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-08-20 Thread Daniel Weeks (JIRA)
Daniel Weeks created HIVE-7800:
--

 Summary: Parquet Column Index Access Schema Size Checking
 Key: HIVE-7800
 URL: https://issues.apache.org/jira/browse/HIVE-7800
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Daniel Weeks
Assignee: Daniel Weeks


In the case that a Parquet-formatted table has partitions where the files have 
schemas of different sizes, using column index access can result in an 
index-out-of-bounds exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-08-20 Thread Daniel Weeks (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Weeks updated HIVE-7800:
---

Attachment: HIVE-7800.1.patch

 Parquet Column Index Access Schema Size Checking
 ---

 Key: HIVE-7800
 URL: https://issues.apache.org/jira/browse/HIVE-7800
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Daniel Weeks
Assignee: Daniel Weeks
 Attachments: HIVE-7800.1.patch


 In the case that a parquet formatted table has partitions where the files 
 have different size schema, using column index access can result in an index 
 out of bounds exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7800) Parquet Column Index Access Schema Size Checking

2014-08-20 Thread Daniel Weeks (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Weeks updated HIVE-7800:
---

Status: Patch Available  (was: Open)

The included patch is a trivial fix that checks both that the column exists in 
the Parquet file and that the file actually contains a field at the requested 
column index position.

In the event the check fails, the column is not included and null values are 
produced for the missing column, which is the expected behavior.
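
A rough sketch of that guard (variable names and the surrounding structure are assumptions for illustration; this is not the attached patch):

{code}
// When projecting by column index, keep only the indexes that this particular
// file's schema actually contains; missing columns then surface as nulls.
List<Type> projectedFields = new ArrayList<Type>();
for (Integer colIndex : requestedColumnIndexes) {
  if (colIndex < fileSchema.getFieldCount()) {
    projectedFields.add(fileSchema.getType(colIndex));
  }
}
MessageType requestedSchema = new MessageType(fileSchema.getName(), projectedFields);
{code}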

 Parquet Column Index Access Schema Size Checking
 ---

 Key: HIVE-7800
 URL: https://issues.apache.org/jira/browse/HIVE-7800
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Daniel Weeks
Assignee: Daniel Weeks
 Attachments: HIVE-7800.1.patch


 In the case that a parquet formatted table has partitions where the files 
 have different size schema, using column index access can result in an index 
 out of bounds exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104078#comment-14104078
 ] 

Larry McCay commented on HIVE-7634:
---

Just realized that branch-2 is a hadoop branch.

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7801) Move PTest2 properties files into svn

2014-08-20 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7801:
--

 Summary: Move PTest2 properties files into svn
 Key: HIVE-7801
 URL: https://issues.apache.org/jira/browse/HIVE-7801
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland


To stop me from screwing them up :)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104086#comment-14104086
 ] 

Brock Noland commented on HIVE-7254:


Ok, I think there were three issues:

1) I accidentally removed minitez.query.files.shared from the properties file 
(just added that back)
2) HIVE-7757 
3) Fixing HIVE-7757 required a restart of ptest2

I agree that we need those properties files in svn. They used to have sensitive 
information in them but now they don't. I created HIVE-7801 to fix that. 

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called 
 directory, so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with the directory 
 ql/src/test/queries/clientpositive.
 However, the configuration for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) runs only a select number of tests under a 
 directory.  So we have to use the include configuration to hard-code a 
 list of tests for it to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7802) Update language manual for insert, update, and delete

2014-08-20 Thread Alan Gates (JIRA)
Alan Gates created HIVE-7802:


 Summary: Update language manual for insert, update, and delete
 Key: HIVE-7802
 URL: https://issues.apache.org/jira/browse/HIVE-7802
 Project: Hive
  Issue Type: Sub-task
  Components: Documentation
Reporter: Alan Gates
Assignee: Alan Gates


With the addition of ACID compliant insert, insert...values, update, and delete 
we need to update the Hive language manual to cover the new features.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.

2014-08-20 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24498/#review51109
---


The patch does a great job of making the # of queries independent of the # of columns. Good 
work! But it seems it's now making queries over all partitions of the table, instead 
of those listed in the request.


metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
https://reviews.apache.org/r/24498/#comment89121

Now that you have fixed this TODO, you can delete it.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
https://reviews.apache.org/r/24498/#comment89114

This should contain " and PARTITION_NAME in (...)"; otherwise we are running the 
query over all partitions of the table, instead of those requested.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
https://reviews.apache.org/r/24498/#comment89117

This should contain " and PARTITION_NAME in (...)"; otherwise we are running the 
query over all partitions of the table, instead of those requested.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
https://reviews.apache.org/r/24498/#comment89118

This should contain " and PARTITION_NAME in (...)"; otherwise we are running the 
query over all partitions of the table, instead of those requested.
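
A sketch of the restriction being asked for -- binding the requested partition
names into the direct-SQL query instead of scanning every partition. The exact
table/column naming and the helper itself are assumptions, not the patch:

{code:java}
import java.util.List;

// Hypothetical helper: build an " and \"PARTITION_NAME\" in (?, ?, ...)" fragment
// with one bind parameter per requested partition name.
public class PartitionNameFilter {
  static String partitionNameInClause(List<String> partNames) {
    StringBuilder sb = new StringBuilder(" and \"PARTITION_NAME\" in (");
    for (int i = 0; i < partNames.size(); i++) {
      sb.append(i == 0 ? "?" : ", ?");
    }
    return sb.append(")").toString();
  }
}
{code}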


- Ashutosh Chauhan


On Aug. 20, 2014, 6:52 a.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24498/
 ---
 
 (Updated Aug. 20, 2014, 6:52 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 We propose a method to extrapolate the missing column status for the 
 partitions.
 
 
 Diffs
 -
 
   data/files/extrapolate_stats_full.txt PRE-CREATION 
   data/files/extrapolate_stats_partial.txt PRE-CREATION 
   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 84ef5f9 
   
 metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java
  PRE-CREATION 
   
 metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java
  PRE-CREATION 
   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
 767cffc 
   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 a9f4be2 
   metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385 
   
 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
  4eba2b0 
   
 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
  78ab19a 
   ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24498/diff/
 
 
 Testing
 ---
 
 
 File Attachments
 
 
 HIVE-7654.0.patch
   
 https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch
 
 
 Thanks,
 
 pengcheng xiong
 




[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7654:
---

Status: Open  (was: Patch Available)

Good work in making the # of SQL queries independent of the # of cols. Left some 
comments on RB.
I was expecting the annotate_stats_part results to be updated: since you fixed the # 
of partitions for which stats were found, I was expecting them to change 
from COMPLETE to PARTIAL. Can you take a look at that as well?

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.
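
As a rough illustration of the idea (not the patch's LinearExtrapolatePartStatus,
whose details may differ), a linear extrapolation over ordered partitions might
look like this:

{code:java}
// Illustration only: estimate a numeric column statistic for a partition that
// has no stats, from two partitions (by position) that do.
public class LinearExtrapolation {
  static double extrapolate(int idx1, double val1, int idx2, double val2, int targetIdx) {
    double slope = (val2 - val1) / (idx2 - idx1);
    return val1 + slope * (targetIdx - idx1);
  }

  public static void main(String[] args) {
    // Stats known for partitions 0 and 2; estimate the stat for partition 3.
    System.out.println(extrapolate(0, 100.0, 2, 300.0, 3)); // 400.0
  }
}
{code}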



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7373:
---

Labels: TODOC14  (was: )

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, 
 HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing trailing zeroes loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zero, 
 and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeroes (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.000 will be 
 represented as 0.0 (precision=1, scale=1) internally.
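
The precision/scale point can be seen with plain java.math.BigDecimal, which
keeps trailing zeros as part of the scale (an illustration of the semantics
only, not Hive's decimal code):

{code:java}
import java.math.BigDecimal;

// Trailing zeros carry scale information: 3.140 and 3.14 are numerically equal
// but are not the same decimal.
public class TrailingZeros {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("3.140");
    BigDecimal b = new BigDecimal("3.14");
    System.out.println(a.scale() + " vs " + b.scale()); // 3 vs 2
    System.out.println(a.equals(b));                    // false: scale differs
    System.out.println(a.compareTo(b) == 0);            // true: same value
  }
}
{code}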



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104125#comment-14104125
 ] 

Brock Noland commented on HIVE-7373:


Hi Lefty,

Great point. I added TODOC14. 

[~spena] can you come up with a good user facing statement for this one? 
Something like 

Prior to 0.14 trailing zeros on decimals were unnecessarily trimmed ...

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, 
 HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing trailing zeroes loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zero, 
 and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeroes (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.000 will be 
 represented as 0.0 (precision=1, scale=1) internally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7629:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thank you so much for your contribution! I have committed this to trunk!

 Problem in SMB Joins between two Parquet tables
 ---

 Key: HIVE-7629
 URL: https://issues.apache.org/jira/browse/HIVE-7629
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-7629.1.patch, HIVE-7629.patch


 The issue is clearly seen when two bucketed and sorted Parquet tables with 
 different numbers of columns are involved in the join. The following 
 exception is seen:
 {noformat}
 Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66)
 at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104143#comment-14104143
 ] 

Sergey Shelukhin commented on HIVE-7654:


+1, conditional on Ashutosh also +1ing :)


 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-08-20 Thread david serafini (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

david serafini updated HIVE-7100:
-

Attachment: HIVE-7100.3.patch

Attached HIVE-7100.3.patch, which should fix the test errors from the previous 
patch.

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, 
 HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they have to then manually delete the 
 files from HDFS using skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skipTrash directly when dropping tables.
 We should also be able to provide this functionality without polluting SQL 
 syntax.
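
For illustration, the two behaviors at the HDFS level look roughly like this
(a sketch using the Hadoop FileSystem API; how Hive exposes the choice is
exactly what this issue is about, so this is not the patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

// Sketch only: delete a dropped table's directory either via the trash
// (recoverable, still counted against quota) or directly (skipTrash).
public class DropTableData {
  static void removeTableDir(Configuration conf, Path tableDir, boolean skipTrash)
      throws Exception {
    FileSystem fs = tableDir.getFileSystem(conf);
    if (skipTrash) {
      fs.delete(tableDir, true);                        // frees quota immediately
    } else {
      Trash.moveToAppropriateTrash(fs, tableDir, conf); // moves under .Trash
    }
  }
}
{code}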



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]

2014-08-20 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7593:
---

Status: Patch Available  (was: Open)

Whenever the Spark configurations are updated globally, the existing session will 
be closed and a new session will be created.

 Instantiate SparkClient per user session [Spark Branch]
 ---

 Key: HIVE-7593
 URL: https://issues.apache.org/jira/browse/HIVE-7593
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch


 SparkContext is the main class via which Hive talks to the Spark cluster. 
 SparkClient encapsulates a SparkContext instance. Currently all user sessions 
 share a single SparkClient instance in HiveServer2. While this is good enough 
 for a POC, even for our first two milestones, this is not desirable for a 
 multi-tenancy environment and gives the least flexibility to Hive users. Here is 
 what we propose:
 1. Have a SparkClient instance per user session. The SparkClient instance is 
 created when user executes its first query in the session. It will get 
 destroyed when user session ends.
 2. The SparkClient is instantiated based on the spark configurations that are 
 available to the user, including those defined at the global level and those 
 overwritten by the user (thru set command, for instance).
 3. Ideally, when user changes any spark configuration during the session, the 
 old SparkClient instance should be destroyed and a new one based on the new 
 configurations is created. This may turn out to be a little hard, and thus 
 it's a nice-to-have. If not implemented, we need to document that 
 subsequent configuration changes will not take effect in the current session.
 Please note that there is a thread-safety issue on Spark side where multiple 
 SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need 
 to work with Spark community to get this addressed.
 Besides the above functional requirements, avoiding potential issues is also a 
 consideration. For instance, sharing an SC among users is bad, as resources 
 (such as jars for UDFs) will also be shared, which is problematic. On the other 
 hand, one SC per job seems too expensive, as the resources need to be 
 re-rendered even if there isn't any change.
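
A minimal sketch of the lifecycle proposed above, assuming a map keyed by
session id; the SparkClient type here is a stand-in, not the spark-branch
class:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch only: one client per user session, rebuilt when that session's Spark
// configuration changes, closed when the session ends.
public class SessionClients {
  static class SparkClient {                   // stand-in for the real client
    SparkClient(Map<String, String> conf) { /* would create a SparkContext */ }
    void close() { /* would stop the SparkContext */ }
  }

  private final Map<String, SparkClient> clients = new HashMap<String, SparkClient>();
  private final Map<String, Map<String, String>> confs = new HashMap<String, Map<String, String>>();

  // Called on the first query of a session, and again whenever the conf may have changed.
  synchronized SparkClient clientFor(String sessionId, Map<String, String> sparkConf) {
    Map<String, String> prev = confs.get(sessionId);
    if (prev != null && !prev.equals(sparkConf)) {
      sessionEnded(sessionId);                 // conf changed: destroy the old client
    }
    SparkClient client = clients.get(sessionId);
    if (client == null) {
      client = new SparkClient(sparkConf);
      clients.put(sessionId, client);
      confs.put(sessionId, sparkConf);
    }
    return client;
  }

  synchronized void sessionEnded(String sessionId) {
    SparkClient client = clients.remove(sessionId);
    if (client != null) {
      client.close();
    }
    confs.remove(sessionId);
  }
}
{code}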



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning

2014-08-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104187#comment-14104187
 ] 

Alan Gates commented on HIVE-7281:
--

I'm fine with doing that, but do we need to link that change to this?  Can we 
file a separate JIRA for that?

 DbTxnManager acquiring wrong level of lock for dynamic partitioning
 ---

 Key: HIVE-7281
 URL: https://issues.apache.org/jira/browse/HIVE-7281
 Project: Hive
  Issue Type: Bug
  Components: Locking, Transactions
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7281.patch


 Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic 
 partitioning.  But this is not adequate.  This will not prevent drop 
 operations on partitions being written to.  The lock should be at the table 
 level.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7747:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thank you so much Venki! I have committed this to spark!

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works fine 
 from the Hive CLI.
 Spark tasks fail with the following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104209#comment-14104209
 ] 

Lefty Leverenz commented on HIVE-7634:
--

Does this need any user/admin documentation?

Also, shouldn't it be marked as Fix Version 0.14.0?

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104205#comment-14104205
 ] 

Brock Noland commented on HIVE-7747:


Wow, nice analysis! +1

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch

 Attachments: HIVE-7747.1.patch, HIVE-7747.2-spark.patch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works fine 
 from the Hive CLI.
 Spark tasks fail with the following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7785) CBO: Projection Pruning needs to handle cross Joins

2014-08-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7785:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks [~jpullokkaran]!

 CBO: Projection Pruning needs to handle cross Joins
 ---

 Key: HIVE-7785
 URL: https://issues.apache.org/jira/browse/HIVE-7785
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7785.patch


 Projection pruning needs to handle cross joins. 
 Ex: select r1.x from r1 join r2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7634:
-

Fix Version/s: 0.14.0

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.14.0

 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104229#comment-14104229
 ] 

Jason Dere commented on HIVE-7634:
--

Thanks for catching that Lefty, need to set the fix version.
This could use some doc.

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.14.0

 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104234#comment-14104234
 ] 

Brock Noland commented on HIVE-7767:


Kicked off pre-commits on this one again.

 hive.optimize.union.remove does not work properly [Spark Branch]
 

 Key: HIVE-7767
 URL: https://issues.apache.org/jira/browse/HIVE-7767
 Project: Hive
  Issue Type: Sub-task
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch


 Turning on the hive.optimize.union.remove property generates a wrong union all 
 result. 
 For Example:
 {noformat}
 create table inputTbl1(key string, val string) stored as textfile;
 load data local inpath '../../data/files/T1.txt' into table inputTbl1;
 SELECT *
 FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, count(1) as values from inputTbl1 group by key
 ) a;  
 {noformat}
 when the hive.optimize.union.remove is turned on, the query result is like: 
 {noformat}
 1 1
 2 1
 3 1
 7 1
 8 2
 {noformat}
 when the hive.optimize.union.remove is turned off, the query result is like: 
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}
 The expected query result is:
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7634:
-

Labels: TODOC14  (was: )

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341

2014-08-20 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7770:
---

Attachment: HIVE-7770.1.patch

Changed HCatPartition to pre-cache {{this.sd.getCols()}} into a member 
variable. Slightly redundant, but it gets around having to change the exception 
signature of {{HCatPartition.getColumns()}}. And it amortizes the 
construction-cost for multiple calls. #silverlining
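
A small sketch of the pre-caching idea (simplified stand-in types, not the
actual HIVE-7770 change):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: copy the storage descriptor's columns into a member once at
// construction, so getColumns() stays cheap and keeps its original
// exception-free signature.
public class CachedColumnsPartition {
  private final List<String> columns;             // cached copy of sd.getCols()

  CachedColumnsPartition(List<String> sdCols) {
    this.columns = new ArrayList<String>(sdCols); // slightly redundant, paid once
  }

  public List<String> getColumns() {              // repeated calls amortize the copy
    return Collections.unmodifiableList(columns);
  }
}
{code}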

 Undo backward-incompatible behaviour change introduced by HIVE-7341
 ---

 Key: HIVE-7770
 URL: https://issues.apache.org/jira/browse/HIVE-7770
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Mithun Radhakrishnan
  Labels: regression
 Attachments: HIVE-7770.1.patch


 HIVE-7341 introduced a backward-incompatibility regression in Exception 
 signatures for HCatPartition.getColumns() that breaks compilation for 
 external tools like Falcon. This bug tracks a scrub of any other issues we 
 discover, so we can put them back to how it used to be. This bug needs 
 resolution in the same release as HIVE-7341, and thus, must be resolved in 
 0.14.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341

2014-08-20 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7770:
---

Status: Patch Available  (was: Open)

 Undo backward-incompatible behaviour change introduced by HIVE-7341
 ---

 Key: HIVE-7770
 URL: https://issues.apache.org/jira/browse/HIVE-7770
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Mithun Radhakrishnan
  Labels: regression
 Attachments: HIVE-7770.1.patch


 HIVE-7341 introduced a backward-incompatibility regression in Exception 
 signatures for HCatPartition.getColumns() that breaks compilation for 
 external tools like Falcon. This bug tracks a scrub of any other issues we 
 discover, so we can put them back to how it used to be. This bug needs 
 resolution in the same release as HIVE-7341, and thus, must be resolved in 
 0.14.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7576) Add PartitionSpec support in HCatClient API

2014-08-20 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7576:
---

Attachment: HIVE-7576.1.patch

Here's the fix. This won't apply till HIVE-7223 is resolved. 

 Add PartitionSpec support in HCatClient API
 ---

 Key: HIVE-7576
 URL: https://issues.apache.org/jira/browse/HIVE-7576
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Metastore
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7576.1.patch


 HIVE-7223 adds support for PartitionSpecs in Hive Metastore. The HCatClient 
 API must add support to fetch partitions, add partitions, etc. using 
 PartitionSpec semantics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete

2014-08-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104290#comment-14104290
 ] 

Alan Gates commented on HIVE-7646:
--

+1

This isn't a big deal and I don't think we should hold the patch for it, but I 
have a question.  If I read the spec correctly a query like:
select * from (values((1, 2, 3),(4, 5, 6)));
should be legal.  At FromClauseParser.g, line 290 you are requiring a 
tableNameColList as part of the virtualTableSource.  This means the user always 
has to do a table definition after the values clause, so the above would become:
select * from (values((1, 2, 3),(4, 5, 6))) as foo(a int, b int, c int);
This makes sense since the more common case is probably:
select a, count(b) from (values((1, 2, 3),(4, 5, 6))) as foo(a int, b int, c 
int) group by a where c < 4;
or something, in which case the table definition is required.  But the main 
question is am I misreading the spec or are we just adding the requirement for 
the table definition in all cases?  I think it's ok if we're adding it, as this 
is primarily for our own testing purposes.

 Modify parser to support new grammar for Insert,Update,Delete
 -

 Key: HIVE-7646
 URL: https://issues.apache.org/jira/browse/HIVE-7646
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-7646.1.patch, HIVE-7646.2.patch, HIVE-7646.3.patch, 
 HIVE-7646.patch


 need the parser to recognize constructs such as:
 {code:sql}
 INSERT INTO Cust (Customer_Number, Balance, Address)
 VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave');
 {code}
 {code:sql}
 DELETE FROM Cust WHERE Balance > 5.0
 {code}
 {code:sql}
 UPDATE Cust
 SET column1=value1,column2=value2,...
 WHERE some_column=some_value
 {code}
 also useful
 {code:sql}
 select a,b from values((1,2),(3,4)) as FOO(a,b)
 {code}
 This makes writing tests easier.
 Some references:
 http://dev.mysql.com/doc/refman/5.6/en/insert.html
 http://msdn.microsoft.com/en-us/library/dd776382.aspx



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7001) fs.permissions.umask-mode is getting unset when Session is started

2014-08-20 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7001:
--

Attachment: TestUMask.patch

Hi [~thejas], one question regarding fs.permissions.umask-mode. It looks 
like fs.permissions.umask-mode doesn't exist in Hadoop 1.x, and the property 
dfs.umaskmode is used instead in 1.x for the same purpose. Also, 
dfs.umaskmode was not deprecated in 1.x according to HADOOP-8727. Should we 
use FsPermission.UMASK_LABEL instead of fs.permissions.umask-mode, since it 
always points to the proper property in each Hadoop version (0.23.x, 1.x, 
2.x)?

Attached a test case to illustrate the problem. The test passes fine with 
-Phadoop-2, but not with -Phadoop-1.
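
For illustration, the suggestion above could look roughly like this (a sketch,
not the HIVE-7001 patch; per the comment above, FsPermission.UMASK_LABEL
resolves to "dfs.umaskmode" on 1.x and "fs.permissions.umask-mode" on 2.x):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: save and restore the umask through the version-appropriate key
// instead of hard-coding "fs.permissions.umask-mode".
public class UmaskPreserver {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String umaskKey = FsPermission.UMASK_LABEL;   // resolves per Hadoop version
    String saved = conf.get(umaskKey);            // remember the current umask
    // ... session setup that may clobber the configuration happens here ...
    if (saved != null) {
      conf.set(umaskKey, saved);                  // put it back afterwards
    }
    System.out.println(umaskKey + "=" + conf.get(umaskKey));
  }
}
{code}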

 fs.permissions.umask-mode is getting unset when Session is started
 --

 Key: HIVE-7001
 URL: https://issues.apache.org/jira/browse/HIVE-7001
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0, 0.13.1

 Attachments: HIVE-7001.1.patch, HIVE-7001.2.patch, HIVE-7001.3.patch, 
 TestUMask.patch


 {code}
 hive> set fs.permissions.umask-mode;
 fs.permissions.umask-mode=022
 hive> show tables;
 OK
 t1
 Time taken: 0.301 seconds, Fetched: 1 row(s)
 hive> set fs.permissions.umask-mode;
 fs.permissions.umask-mode is undefined
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104308#comment-14104308
 ] 

Hive QA commented on HIVE-7593:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662806/HIVE-7593.1-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5958 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/67/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/67/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-67/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662806

 Instantiate SparkClient per user session [Spark Branch]
 ---

 Key: HIVE-7593
 URL: https://issues.apache.org/jira/browse/HIVE-7593
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch


 SparkContext is the main class via which Hive talks to the Spark cluster. 
 SparkClient encapsulates a SparkContext instance. Currently all user sessions 
 share a single SparkClient instance in HiveServer2. While this is good enough 
 for a POC, even for our first two milestones, this is not desirable for a 
 multi-tenancy environment and gives the least flexibility to Hive users. Here is 
 what we propose:
 1. Have a SparkClient instance per user session. The SparkClient instance is 
 created when user executes its first query in the session. It will get 
 destroyed when user session ends.
 2. The SparkClient is instantiated based on the spark configurations that are 
 available to the user, including those defined at the global level and those 
 overwritten by the user (thru set command, for instance).
 3. Ideally, when user changes any spark configuration during the session, the 
 old SparkClient instance should be destroyed and a new one based on the new 
 configurations is created. This may turn out to be a little hard, and thus 
 it's a nice-to-have. If not implemented, we need to document that 
 subsequent configuration changes will not take effect in the current session.
 Please note that there is a thread-safety issue on Spark side where multiple 
 SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need 
 to work with Spark community to get this addressed.
 Besides the above functional requirements, avoiding potential issues is also a 
 consideration. For instance, sharing an SC among users is bad, as resources 
 (such as jars for UDFs) will also be shared, which is problematic. On the other 
 hand, one SC per job seems too expensive, as the resources need to be 
 re-rendered even if there isn't any change.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland reassigned HIVE-7682:
--

Assignee: Brock Noland  (was: Sergio Peña)

 HadoopThriftAuthBridge20S should not reset configuration unless required
 

 Key: HIVE-7682
 URL: https://issues.apache.org/jira/browse/HIVE-7682
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7682.1.patch


 In the HadoopThriftAuthBridge20S methods createClientWithConf and 
 getCurrentUGIWithConf we create new Configuration objects so we can set the 
 authentication type. When the new Configuration object is loaded, it looks for 
 the core-site.xml of the cluster it's connected to.
 This causes issues for Oozie since Oozie does not have access to the 
 core-site.xml, as it's cluster agnostic.
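
One possible shape of the fix (a sketch under the assumption that the bridge
only needs the authentication type set; not necessarily what HIVE-7682.1.patch
does):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: reuse the caller's Configuration when the authentication type
// is already correct, and otherwise copy it rather than constructing a fresh
// Configuration that re-reads core-site.xml from disk.
public class AuthConf {
  static Configuration withKerberosAuth(Configuration base) {
    if ("kerberos".equals(base.get("hadoop.security.authentication"))) {
      return base;                                   // nothing to reset
    }
    Configuration copy = new Configuration(base);    // copy, don't rebuild from scratch
    copy.set("hadoop.security.authentication", "kerberos");
    return copy;
  }
}
{code}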



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7682:
---

Status: Patch Available  (was: Open)

 HadoopThriftAuthBridge20S should not reset configuration unless required
 

 Key: HIVE-7682
 URL: https://issues.apache.org/jira/browse/HIVE-7682
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Sergio Peña
 Attachments: HIVE-7682.1.patch


 In the HadoopThriftAuthBridge20S methods createClientWithConf and 
 getCurrentUGIWithConf we create new Configuration objects so we can set the 
 authentication type. When the new Configuration object is loaded, it looks for 
 the core-site.xml of the cluster it's connected to.
 This causes issues for Oozie since Oozie does not have access to the 
 core-site.xml, as it's cluster agnostic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required

2014-08-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7682:
---

Attachment: HIVE-7682.1.patch

 HadoopThriftAuthBridge20S should not reset configuration unless required
 

 Key: HIVE-7682
 URL: https://issues.apache.org/jira/browse/HIVE-7682
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Sergio Peña
 Attachments: HIVE-7682.1.patch


 In the HadoopThriftAuthBridge20S methods createClientWithConf and 
 getCurrentUGIWithConf we create new Configuration objects so we can set the 
 authentication type. When the new Configuration object is loaded, it looks for 
 the core-site.xml of the cluster it's connected to.
 This causes issues for Oozie since Oozie does not have access to the 
 core-site.xml, as it's cluster agnostic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104312#comment-14104312
 ] 

Brock Noland commented on HIVE-7682:


I talked with Sergio offline and I am going to grab this one.

 HadoopThriftAuthBridge20S should not reset configuration unless required
 

 Key: HIVE-7682
 URL: https://issues.apache.org/jira/browse/HIVE-7682
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7682.1.patch


 In the HadoopThriftAuthBridge20S methods createClientWithConf and 
 getCurrentUGIWithConf we create new Configuration objects so we can set the 
 authentication type. When the new Configuration object is loaded, it looks for 
 the core-site.xml of the cluster it's connected to.
 This causes issues for Oozie since Oozie does not have access to the 
 core-site.xml, as it's cluster agnostic.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24903: HIVE-7682: HadoopThriftAuthBridge20S should not reset configuration unless required

2014-08-20 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24903/
---

Review request for hive.


Repository: hive-git


Description
---

Described in JIRA


Diffs
-

  
shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java
 8b9da7a 

Diff: https://reviews.apache.org/r/24903/diff/


Testing
---


Thanks,

Brock Noland



[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104338#comment-14104338
 ] 

Alan Gates commented on HIVE-7689:
--

It looks like these changes are all to change the SQL to be uppercase and quote 
all identifiers.  What issues are you seeing that drive the need for this?  
Have you tested it against any other RDBMSs?

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7001) fs.permissions.umask-mode is getting unset when Session is started

2014-08-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104341#comment-14104341
 ] 

Thejas M Nair commented on HIVE-7001:
-

Using FsPermission.UMASK_LABEL sounds good to me.
Please open a new jira.


 fs.permissions.umask-mode is getting unset when Session is started
 --

 Key: HIVE-7001
 URL: https://issues.apache.org/jira/browse/HIVE-7001
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0, 0.13.1

 Attachments: HIVE-7001.1.patch, HIVE-7001.2.patch, HIVE-7001.3.patch, 
 TestUMask.patch


 {code}
 hive> set fs.permissions.umask-mode;
 fs.permissions.umask-mode=022
 hive> show tables;
 OK
 t1
 Time taken: 0.301 seconds, Fetched: 1 row(s)
 hive> set fs.permissions.umask-mode;
 fs.permissions.umask-mode is undefined
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-20 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-7803:
--

 Summary: Enable Hadoop speculative execution may cause corrupt 
output directory (dynamic partition)
 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 

Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical


One of our users reports that they see intermittent failures due to attempt 
directories in the input paths. We found that, with speculative execution turned on, 
two mappers tried to commit the task at the same time using the same committed task 
path, which caused the corrupt output directory. 

The original Pig script:
(STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
USING org.apache.hcatalog.pig.HCatStorer();)

Two mappers
attempt_1405021984947_5394024_m_000523_0: KILLED
attempt_1405021984947_5394024_m_000523_1: SUCCEEDED

attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.

As a result, it created corrupt directory as 
  
/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
containing 
   part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
and 
   attempt_1405021984947_5394024_m_000523_1/part-m-00523

Namenode Audit log
==
1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
cmd=create 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
 dst=null  perm=user:group:rw-r-

2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
cmd=create 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
 dst=null  perm=user:group:rw-r-

3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
cmd=rename 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
perm=user:group:rwxr-x---

4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
cmd=rename 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
perm=user:group:rwxr-x---

After consulting our Hadoop core team, it was pointed out that some HCat code does 
not participate in the two-phase commit protocol, for example in 
FileRecordWriterContainer.close():

for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
    entry : baseDynamicCommitters.entrySet()) {
  org.apache.hadoop.mapred.TaskAttemptContext currContext =
      dynamicContexts.get(entry.getKey());
  OutputCommitter baseOutputCommitter = entry.getValue();
  // Each per-partition committer commits its task output directly here,
  // outside the usual two-phase commit coordination.
  if (baseOutputCommitter.needsTaskCommit(currContext)) {
    baseOutputCommitter.commitTask(currContext);
  }
}





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104375#comment-14104375
 ] 

Sergey Shelukhin commented on HIVE-7689:


"-quoted identifiers should be ANSI standard... MySQL would require a flag (see 
MetaStoreDirectSql.java)

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104375#comment-14104375
 ] 

Sergey Shelukhin edited comment on HIVE-7689 at 8/20/14 7:14 PM:
-

"-quoted identifiers are ANSI standard... MySQL would require a flag (see 
MetaStoreDirectSql.java)


was (Author: sershe):
"-quoted identifiers should be ANSI standard... MySQL would require a flag (see 
MetaStoreDirectSql.java)

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104372#comment-14104372
 ] 

Sergey Shelukhin commented on HIVE-7689:


The problem is that Postgres coerces unquoted identifiers everywhere to lower 
case (iirc) and has no way to disable this (to put it very mildly) questionable 
behavior; afair the request to add a flag similar to the MySQL one for ANSI was 
also not viewed positively when I tried.
So either everything has to be lower case, or everything has to be quoted (and 
upper case, for simplicity I guess).

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7804) CBO: Support SemiJoins

2014-08-20 Thread Harish Butani (JIRA)
Harish Butani created HIVE-7804:
---

 Summary: CBO: Support SemiJoins
 Key: HIVE-7804
 URL: https://issues.apache.org/jira/browse/HIVE-7804
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7804) CBO: Support SemiJoins

2014-08-20 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7804:


Attachment: HIVE-7804.1.patch

 CBO: Support SemiJoins
 --

 Key: HIVE-7804
 URL: https://issues.apache.org/jira/browse/HIVE-7804
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
 Attachments: HIVE-7804.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4765) Improve HBase bulk loading facility

2014-08-20 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104398#comment-14104398
 ] 

Nick Dimiduk commented on HIVE-4765:


Ping [~navis], [~sushanth].

Any chance we can get some action on this one for the 0.14 release? It's definitely 
better than what's currently available.

 Improve HBase bulk loading facility
 ---

 Key: HIVE-4765
 URL: https://issues.apache.org/jira/browse/HIVE-4765
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4765.2.patch.txt, HIVE-4765.3.patch.txt, 
 HIVE-4765.D11463.1.patch


 With some patches, bulk loading process for HBase could be simplified a lot.
 {noformat}
 CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
 STORED AS
   INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
 LOCATION '/tmp/export';
 SET mapred.reduce.tasks=4;
 set hive.optimize.sampling.orderby=true;
 INSERT OVERWRITE TABLE hbase_export
 SELECT * from (SELECT union_kv(key, key, value, ":key,cf1:key,cf2:value") as 
 (rowkey,union) FROM src) A ORDER BY rowkey,union;
 hive> !hadoop fs -lsr /tmp/export;
   
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf1
 -rw-r--r--   1 navis supergroup   4317 2013-06-20 11:05 
 /tmp/export/cf1/384abe795e1a471cac6d3770ee38e835
 -rw-r--r--   1 navis supergroup   5868 2013-06-20 11:05 
 /tmp/export/cf1/b8b6d746c48f4d12a4cf1a2077a28a2d
 -rw-r--r--   1 navis supergroup   5214 2013-06-20 11:05 
 /tmp/export/cf1/c8be8117a1734bd68a74338dfc4180f8
 -rw-r--r--   1 navis supergroup   4290 2013-06-20 11:05 
 /tmp/export/cf1/ce41f5b1cfdc4722be25207fc59a9f10
 drwxr-xr-x   - navis supergroup  0 2013-06-20 11:05 /tmp/export/cf2
 -rw-r--r--   1 navis supergroup   6744 2013-06-20 11:05 
 /tmp/export/cf2/409673b517d94e16920e445d07710f52
 -rw-r--r--   1 navis supergroup   4975 2013-06-20 11:05 
 /tmp/export/cf2/96af002a6b9f4ebd976ecd83c99c8d7e
 -rw-r--r--   1 navis supergroup   6096 2013-06-20 11:05 
 /tmp/export/cf2/c4f696587c5e42ee9341d476876a3db4
 -rw-r--r--   1 navis supergroup   4890 2013-06-20 11:05 
 /tmp/export/cf2/fd9adc9e982f4fe38c8d62f9a44854ba
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Timeline for release of Hive 0.14

2014-08-20 Thread Nick Dimiduk
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a
big improvement for us HBase folks. Would someone mind having a look in
that direction?

Thanks,
Nick


On Tue, Aug 19, 2014 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote:

 +1
 Sounds good to me.
 Its already almost 4 months since the last release. It is time to
 start preparing for the next one.
 Thanks for volunteering!


 On Tue, Aug 19, 2014 at 2:02 PM, Vikram Dixit vik...@hortonworks.com
 wrote:
  Hi Folks,
 
  I was thinking that it was about time that we had a release of hive 0.14
  given our commitment to having a release of hive on a periodic basis. We
  could cut a branch and start working on a release in say 2 weeks time
  around September 5th (Friday). After branching, we can focus on
 stabilizing
  for the release and hopefully have an RC in about 2 weeks post that. I
  would like to volunteer myself for the duties of the release manager for
  this version if the community agrees.
 
  Thanks
  Vikram.
 




[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]

2014-08-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104403#comment-14104403
 ] 

Hive QA commented on HIVE-7767:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662979/HIVE-7767.2-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5978 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/68/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/68/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-68/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662979

 hive.optimize.union.remove does not work properly [Spark Branch]
 

 Key: HIVE-7767
 URL: https://issues.apache.org/jira/browse/HIVE-7767
 Project: Hive
  Issue Type: Sub-task
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch


 Turning on the hive.optimize.union.remove property generates wrong union all 
 results. 
 For Example:
 {noformat}
 create table inputTbl1(key string, val string) stored as textfile;
 load data local inpath '../../data/files/T1.txt' into table inputTbl1;
 SELECT *
 FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, count(1) as values from inputTbl1 group by key
 ) a;  
 {noformat}
 when the hive.optimize.union.remove is turned on, the query result is like: 
 {noformat}
 1 1
 2 1
 3 1
 7 1
 8 2
 {noformat}
 when the hive.optimize.union.remove is turned off, the query result is like: 
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}
 The expected query result is:
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}
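
Until the Spark-branch fix lands, a trivial session-level workaround (noted here as a 
sketch only, at the cost of losing the optimization) is simply to keep the rewrite 
disabled:

{noformat}
set hive.optimize.union.remove=false;
{noformat}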



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Mail bounces from ebuddy.com

2014-08-20 Thread Nick Dimiduk
Not quite taken care of. I'm still getting spam about these addresses.


On Mon, Aug 18, 2014 at 9:18 AM, Lars Francke lars.fran...@gmail.com
wrote:

 Thanks Alan and Ashutosh for taking care of this!


 On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Thanks, Alan for the hint. I just unsubscribed those two email addresses
  from ebuddy.
 
 
  On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com
 wrote:
 
   Anyone who is an admin on the list (I don't who the admins are) can do
   this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org
 where
   USERNAME is the name of the bouncing user (see
   http://untroubled.org/ezmlm/ezman/ezman1.html )
  
   Alan.
  
  
  
 Thejas Nair the...@hortonworks.com
August 17, 2014 at 17:02
   I don't know how to do this.
  
   Carl, Ashutosh,
   Do you guys know how to remove these two invalid emails from the
 mailing
   list ?
  
  
 Lars Francke lars.fran...@gmail.com
August 17, 2014 at 15:41
   Hmm great, I see others mentioning this as well. I'm happy to contact
  INFRA
   but I'm not sure if they are even needed or if someone from the Hive
 team
   can do this?
  
  
   On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz 
 leftylever...@gmail.com
   leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:43
   (Excuse the spam.) Actually I'm getting two bounces per message, but
  gmail
   concatenates them so I didn't notice the second one.
  
   -- Lefty
  
  
   On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz 
 leftylever...@gmail.com
   leftylever...@gmail.com
  
 Lefty Leverenz leftylever...@gmail.com
August 7, 2014 at 18:36
   Curious, I've only been getting one bounce per message. Anyway thanks
 for
   bringing this up.
  
   -- Lefty
  
  
  
 Lars Francke lars.fran...@gmail.com
August 7, 2014 at 4:38
   Hi,
  
   every time I send a mail to dev@ I get two bounce mails from two
 people
  at
   ebuddy.com. I don't want to post the E-Mail addresses publicly but I
 can
   send them on if needed (and it can be triggered easily by just replying
  to
   this mail I guess).
  
   Could we maybe remove them from the list?
  
   Cheers,
   Lars
  
  
   --
   Sent with Postbox http://www.getpostbox.com
  
  
 



[jira] [Commented] (HIVE-7767) hive.optimize.union.remove does not work properly [Spark Branch]

2014-08-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104421#comment-14104421
 ] 

Brock Noland commented on HIVE-7767:


Hi [~nyang],

Thank you very much for your work on this! The patch looks great!

I did notice that there are a couple of tests where the results differ from 
mapreduce (outside the query plan). I used the following command to compare all 
of the generated files:

{noformat}
git status | awk '/new file:/ {print $NF}' | xargs -I {} sh -c 'echo {}; diff 
-y -W 150 {} $(echo {} | perl -pe s@/spark@@g)' | less
{noformat}

It showed that at least two of the tests produce different results: 
union_remove_10 and union_remove_22.

Could you take a look?
Thanks!

 hive.optimize.union.remove does not work properly [Spark Branch]
 

 Key: HIVE-7767
 URL: https://issues.apache.org/jira/browse/HIVE-7767
 Project: Hive
  Issue Type: Sub-task
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7767.1-spark.patch, HIVE-7767.2-spark.patch


 Turning on the hive.optimize.union.remove property generates wrong union all 
 results. 
 For Example:
 {noformat}
 create table inputTbl1(key string, val string) stored as textfile;
 load data local inpath '../../data/files/T1.txt' into table inputTbl1;
 SELECT *
 FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, count(1) as values from inputTbl1 group by key
 ) a;  
 {noformat}
 when the hive.optimize.union.remove is turned on, the query result is like: 
 {noformat}
 1 1
 2 1
 3 1
 7 1
 8 2
 {noformat}
 when the hive.optimize.union.remove is turned off, the query result is like: 
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}
 The expected query result is:
 {noformat}
 7 1
 2 1
 8 2
 3 1
 1 1
 7 1
 2 1
 8 2
 3 1
 1 1
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-20 Thread Andrew Mains (JIRA)
Andrew Mains created HIVE-7805:
--

 Summary: Support running multiple scans in hbase-handler
 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains


Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
This can be less efficient than running multiple disjoint scans in certain 
cases, particularly when using a composite row key. For instance, given a row 
key schema of:

{code}
struct<bucket int, time timestamp>
{code}

if one wants to push down the predicate:

{code}
bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
{code}

it's much more efficient to run a scan for each bucket over the time range 
(particularly if there's a large amount of data per day). With a single scan, 
the MR job has to process the data for all time for buckets in between 1 and 
100.

Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
one or more scans in order to take advantage of this fact.
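
As a rough illustration of the idea (a hypothetical sketch, not the attached patch; 
it assumes row keys serialized as a 4-byte int bucket followed by an 8-byte 
big-endian timestamp), the key factory could emit one Scan per bucket over the 
requested time range:

{code}
// Hypothetical sketch of per-bucket scan decomposition (not the HIVE-7805 patch).
// Assumed row key layout: 4-byte int bucket followed by an 8-byte long timestamp.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketScanSketch {
  static List<Scan> buildScans(int[] buckets, long startTs, long endTs) {
    List<Scan> scans = new ArrayList<Scan>();
    for (int bucket : buckets) {
      byte[] startRow = Bytes.add(Bytes.toBytes(bucket), Bytes.toBytes(startTs));
      byte[] stopRow  = Bytes.add(Bytes.toBytes(bucket), Bytes.toBytes(endTs));
      // The stop row is exclusive, matching "timestamp < endTs" within this bucket only.
      scans.add(new Scan(startRow, stopRow));
    }
    return scans;
  }
}
{code}

A single scan over the same predicate would instead have to cover every row from 
bucket 1 through bucket 100, including all the buckets in between.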




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-20 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Attachment: HIVE-7805.patch

This patch changes HiveHBaseTableInputFormat to extend 
MultiTableInputFormatBase, and allows HBaseKeyFactory implementations to push a 
List<HBaseScanRange>, instead of just a single HBaseScanRange.

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
 Attachments: HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 struct<bucket int, time timestamp>
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
 one or more scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-20 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Assignee: Andrew Mains
  Status: Patch Available  (was: Open)

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 struct<bucket int, time timestamp>
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
 one or more scans in order to take advantage of this fact.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats

2014-08-20 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7420:
-

Attachment: HIVE-7420-without-HIVE-7457.4.patch
HIVE-7420.4.patch

 Parameterize tests for HCatalog Pig interfaces for testing against all 
 storage formats
 --

 Key: HIVE-7420
 URL: https://issues.apache.org/jira/browse/HIVE-7420
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7420-without-HIVE-7457.2.patch, 
 HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420-without-HIVE-7457.4.patch, 
 HIVE-7420.1.patch, HIVE-7420.2.patch, HIVE-7420.3.patch, HIVE-7420.4.patch


 Currently, HCatalog tests only test against RCFile with a few testing against 
 ORC. The tests should be covering other Hive storage formats as well.
 HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with 
 all Hive storage formats and with that patch, all test suites built on 
 HCatMapReduceTest are running and passing against Sequence File, Text, and 
 ORC in addition to RCFile.
 Similar changes should be made to make the tests for HCatLoader and 
 HCatStorer generic so that they can be run against all Hive storage formats.
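
As a sketch of the general approach (hypothetical class name and format list, not the 
code in the attached patches; the real tests are driven by StorageFormats and 
HCatMapReduceTest), a JUnit Parameterized runner can drive the same test body once per 
storage format:

{code}
// Hypothetical sketch only; illustrates parameterizing a test over storage formats.
import static org.junit.Assert.assertNotNull;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class HCatPigStorageFormatTest {
  private final String storageFormat;

  public HCatPigStorageFormatTest(String storageFormat) {
    this.storageFormat = storageFormat;
  }

  @Parameters
  public static Collection<Object[]> formats() {
    // One test run per storage format.
    return Arrays.asList(new Object[][] {
        {"RCFILE"}, {"SEQUENCEFILE"}, {"TEXTFILE"}, {"ORC"}
    });
  }

  @Test
  public void testRoundTrip() {
    // A real test would create a table "STORED AS <storageFormat>", write rows
    // with HCatStorer, and read them back with HCatLoader.
    assertNotNull(storageFormat);
  }
}
{code}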



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7806) insert overwrite local directory doesn't complain if it can't actually write the data

2014-08-20 Thread Carter Shanklin (JIRA)
Carter Shanklin created HIVE-7806:
-

 Summary: insert overwrite local directory doesn't complain if it 
can't actually write the data
 Key: HIVE-7806
 URL: https://issues.apache.org/jira/browse/HIVE-7806
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Carter Shanklin
Priority: Minor


I tried exporting data to a directory that didn't exist and could not be 
created by my user.

Hive reported success. It would be better if it reported failure here.

{code}
Time taken: 0.397 seconds
hive> insert overwrite local directory '/home/hue/staging' row format delimited 
fields terminated by ',' select * from store_sales;
Query ID = hue_20140815141414_e4f0d70e-416e-4268-98ee-e5cc8f16ffaa
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1408132753408_0001, Tracking URL = 
http://sandbox.hortonworks.com:8088/proxy/application_1408132753408_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1408132753408_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-08-15 14:14:47,272 Stage-1 map = 0%,  reduce = 0%
2014-08-15 14:14:55,021 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.98 
sec
MapReduce Total cumulative CPU time: 980 msec
Ended Job = job_1408132753408_0001
Copying data to local directory /home/hue/staging
Copying data to local directory /home/hue/staging
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.98 sec   HDFS Read: 327 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 980 msec
OK
Time taken: 25.903 seconds
{code}

... Meanwhile, in another shell ...

{code}
[hue@sandbox home]$ ls -l /home/hue/staging
ls: cannot access /home/hue/staging: No such file or directory
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 23797: HIVE-7457: Minor HCatalog Pig Adapter test clean up.

2014-08-20 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23797/
---

(Updated Aug. 20, 2014, 8:04 p.m.)


Review request for hive.


Changes
---

Address code review comments.


Summary (updated)
-

HIVE-7457: Minor HCatalog Pig Adapter test clean up.


Bugs: HIVE-7420
https://issues.apache.org/jira/browse/HIVE-7420


Repository: hive-git


Description (updated)
---

HIVE-7420: Parameterize tests for HCatalog Pig interfaces for testing against 
all storage formats.


Diffs (updated)
-

  hcatalog/hcatalog-pig-adapter/pom.xml 
4d2ca519d413b7de0a6a8b50f9a099c3539fc432 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/MockLoader.java
 c87b95a00af03d2531eb8bbdda4e307c3aac1fe2 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestE2EScenarios.java
 a4b55c8463b3563f1e602ae2d0809dd318bcfa7f 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 82fc8a9391667138780be8796931793661f61ebb 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoaderComplexSchema.java
 eadbf20afc525dd9f33e9e7fb2a5d5cb89907d7e 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorer.java
 fcfc6428e7db80b8bfe0ce10e37d7b0ee6e58e20 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerMulti.java
 76080f7635548ed9af114c823180d8da9ea8f6c2 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerWrapper.java
 7f0bca763eb07db3822c6d6028357e81278803c9 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoader.java
 82eb0d72b4f885184c094113f775415c06bdce98 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoaderComplexSchema.java
 05387711289279cab743f51aee791069609b904a 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatPigStorer.java
 a9b452101c15fb7a3f0d8d0339f7d0ad97383441 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatStorer.java
 1084092828a9ac5e37f5b50b9c6bbd03f70b48fd 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestPigHCatUtil.java
 a8ce61aaad42b03e4de346530d0724f3d69776b9 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestUtil.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/StorageFormats.java 
19fdeb5ed3dba7a3bcba71fb285d92d3f6aabea9 

Diff: https://reviews.apache.org/r/23797/diff/


Testing
---


Thanks,

David Chen


