[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993805#comment-15993805
 ] 

Sahil Takiar commented on HIVE-15396:
-

I think 3.0 is good for this. Thanks for all the help reviewing this patch!

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.0.0, 2.3.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch, HIVE-15396.8.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:1> describe formatted hdfs_1;
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> | col_name                      | data_type                                                  | comment                     |
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                                  | comment                     |
> |                               | NULL                                                       | NULL                        |
> | col                           | int                                                        |                             |
> |                               | NULL                                                       | NULL                        |
> | # Detailed Table Information  | NULL                                                       | NULL                        |
> | Database:                     | default                                                    | NULL                        |
> | Owner:                        | anonymous                                                  | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                               | NULL                        |
> | LastAccessTime:               | UNKNOWN                                                    | NULL                        |
> | Retention:                    | 0                                                          | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                                     | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                              | NULL                        |
> | Table Parameters:             | NULL                                                       | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                                      | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                                   | 0                           |
> |                               | numRows                                                    | 0                           |
> |                               | rawDataSize                                                | 0                           |
> |                               | totalSize                                                  | 0                           |
> |                               | transient_lastDdlTime                                      | 1490231359                  |
> |                               | NULL                                                       | NULL                        |
> | # Storage Information         | NULL                                                       | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                   | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                         | NULL                        |
> | Num Buckets:                  | -1                                                         | NULL                        |
> | Bucket Columns:               | []                                                         | NULL                        |
> | Sort Columns:                 | []                                                         | NULL                        |

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993793#comment-15993793
 ] 

Pengcheng Xiong commented on HIVE-15396:


Do you want this in branch-2 or 2.3? In my opinion this is not a bug but an 
improvement, so I think 3.0 is enough.


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993789#comment-15993789
 ] 

Pengcheng Xiong commented on HIVE-15396:


pushed to master. thanks for your hard work!


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993784#comment-15993784
 ] 

Pengcheng Xiong commented on HIVE-15396:


good thanks!


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993772#comment-15993772
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] test failures are unrelated:

* HIVE-16569 - TestAccumuloCliDriver.testCliDriver[accumulo_index]
* HIVE-15169 - TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993736#comment-15993736
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866017/HIVE-15396.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10634 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5002/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5002/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5002/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866017 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-02 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993385#comment-15993385
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], the last patch was submitted for a QA run almost a month ago. Could you 
resubmit it for another QA run? Thanks.


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991473#comment-15991473
 ] 

Pengcheng Xiong commented on HIVE-15396:


+1


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-05-01 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991263#comment-15991263
 ] 

Pengcheng Xiong commented on HIVE-15396:


LGTM, can you double check my last comment? Thanks.


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982368#comment-15982368
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] created an RB: https://reviews.apache.org/r/58691/


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-24 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982333#comment-15982333
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], could you create a review request? Thanks.


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981840#comment-15981840
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] wanted to see if we can still get this patch in. Let me know what you 
think of the most recent patch. To summarize:

* The patch adds basic stats collection for tables with a {{LOCATION}} 
specified, but only if the specified location is empty and the table is not an 
external table
* This should be useful when running on blobstores such as S3, where users 
commonly specify an explicit {{LOCATION}} clause
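The condition summarized above fits in a small predicate. The Java below is a minimal sketch and not the code from the HIVE-15396 patch. It makes these assumptions:

* The class `BasicStatsPolicy` and its methods are hypothetical names.
* Table properties are passed in as plain booleans rather than metastore objects.
* The emptiness check uses `java.nio.file` on a local directory. It stands in for the Hadoop `FileSystem` call that HDFS or S3 locations would need.
* Treating a missing directory as empty is also an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of the rule discussed in HIVE-15396; not Hive's actual implementation.
final class BasicStatsPolicy {

    private BasicStatsPolicy() {}

    /**
     * Decides whether a newly created table's zero-valued basic stats
     * (numFiles=0, numRows=0, rawDataSize=0, totalSize=0) can be marked accurate.
     */
    static boolean canMarkBasicStatsAccurate(boolean isExternal,
                                             boolean locationSpecified,
                                             boolean locationIsEmpty) {
        if (isExternal) {
            // External tables may point at data Hive did not write, so zeros could be wrong.
            return false;
        }
        if (!locationSpecified) {
            // Default warehouse location: the issue's example shows stats already marked accurate here.
            return true;
        }
        // Explicit LOCATION on a managed table: zeros are only correct if nothing is there yet.
        return locationIsEmpty;
    }

    /** Local-filesystem stand-in for the emptiness check; a missing directory counts as empty (assumption). */
    static boolean isEmptyDirectory(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return true;
        }
        // Files.list is lazy; close the stream to release the directory handle.
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }
}
```

Under these assumptions, `canMarkBasicStatsAccurate(false, true, isEmptyDirectory(dir))` returns `true` for a freshly created, empty directory. It returns `false` once a data file exists there. Keeping the decision separate from the filesystem probe makes the rule easy to test without a cluster.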

Thanks for spending the time to look at this!


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954040#comment-15954040
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] updated the patch. All stats are collected if the location is specified, 
the location is empty, and the table is not external. Does that sound reasonable to you?

The main reason I think this is important is that it will be common for tables 
stored on S3 to specify a {{LOCATION}}. 


[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952393#comment-15952393
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12861605/HIVE-15396.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10564 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=141)
org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormatPruning (batchId=255)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4510/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4510/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4510/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12861605 - PreCommit-HIVE-Build

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952009#comment-15952009
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12861515/HIVE-15396.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10544 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] (batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[default_file_format] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=141)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[external1] (batchId=86)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4502/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4502/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4502/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12861515 - PreCommit-HIVE-Build

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, HIVE-15396.6.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950991#comment-15950991
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12861385/HIVE-15396.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10544 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[default_file_format] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[deleteAnalyze] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_overwrite_directory2] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[deleteAnalyze] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[insert_overwrite_directory2] (batchId=163)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[external1] (batchId=86)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4490/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4490/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4490/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12861385 - PreCommit-HIVE-Build

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949511#comment-15949511
 ] 

Sahil Takiar commented on HIVE-15396:
-

Good point. How about the approach in my 3rd patch? It checks whether the data 
location is empty. If it is empty, all stats are collected; if it isn't, only 
basic stats are added. I'll remove the check for {{isExternal()}}.

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949480#comment-15949480
 ] 

Pengcheng Xiong commented on HIVE-15396:


In your 4th patch, you removed the check on the location. What if there is data 
in the location? Is it correct to set basic stats as true and #row=0?
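To make the concern concrete, here is a hedged sketch of deriving table parameters from files that already sit under `LOCATION`. It reuses the parameter names shown in the `describe formatted` output for this issue (`numFiles`, `totalSize`, `numRows`, `rawDataSize`), but the class and method (`LocationStats.basicStatsFor`) are hypothetical, and `java.nio.file` stands in for HDFS or S3. The point it illustrates: a file listing gives the file count and total size, while the row count is only known without a scan when there are no files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch: derive table parameters from files already under LOCATION.
// java.nio.file stands in for HDFS/S3; the method name is illustrative only.
class LocationStats {

    static Map<String, String> basicStatsFor(Path location) throws IOException {
        List<Path> files = new ArrayList<>();
        if (Files.isDirectory(location)) {
            try (Stream<Path> entries = Files.list(location)) {
                entries.filter(Files::isRegularFile).forEach(files::add);
            }
        }
        long totalSize = 0;
        for (Path f : files) {
            totalSize += Files.size(f);
        }
        Map<String, String> params = new LinkedHashMap<>();
        // File count and total size come from the listing alone.
        params.put("numFiles", Integer.toString(files.size()));
        params.put("totalSize", Long.toString(totalSize));
        if (files.isEmpty()) {
            // No data files: zero rows is known, not guessed.
            params.put("numRows", "0");
            params.put("rawDataSize", "0");
        }
        // Otherwise numRows/rawDataSize stay unset: they need a scan of the data,
        // so writing 0 would record wrong values as accurate.
        return params;
    }
}
```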

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949476#comment-15949476
 ] 

Sahil Takiar commented on HIVE-15396:
-

{quote}
In this jira, I assume you would like to set basic stats as true and set #row 
to 0 for CREATING tables with LOCATION specified, right?
{quote}

[~pxiong] Yes, that is correct.

{quote}
Hive can always collect stats after you run analyze table for any table.
{quote}

Sure, but the point is to auto-gather the stats rather than forcing users to 
run {{ANALYZE TABLE table COMPUTE STATISTICS}}.

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949470#comment-15949470
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], I am sorry, but I do not get your point. In this jira, I assume you 
would like to set basic stats as true and set #row to 0 for CREATING tables 
with LOCATION specified, right? Hive can always collect stats for any table 
after you run analyze table.

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949301#comment-15949301
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] I changed the approach a bit and removed the check for 
{{getLocation() == null}} completely, so basic stats will be collected even 
when {{LOCATION}} is specified. I think this is more consistent with the other 
stats that are collected. For example, row- and column-based stats aren't 
restricted to certain table types (e.g. stats from {{ANALYZE TABLE table 
COMPUTE STATISTICS}}).

I also don't think a check for whether the {{LOCATION}} exists is necessary. 
Hive will compute basic stats over the existing data when the table is 
created.

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948291#comment-15948291
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12861129/HIVE-15396.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10540 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[drop_with_concurrency] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[default_file_format] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[deleteAnalyze] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_overwrite_directory2] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_noscan_2] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[external_table_with_space_in_location_path] (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[deleteAnalyze] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[external_table_with_space_in_location_path] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[insert_overwrite_directory2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[root_dir_external_table] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=94)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[root_dir_external_table] (batchId=84)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[stats_noscan_2] (batchId=111)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4456/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4456/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4456/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12861129 - PreCommit-HIVE-Build

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947016#comment-15947016
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12860988/HIVE-15396.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10518 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[drop_with_concurrency] (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[default_file_format] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[deleteAnalyze] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic1] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl] (batchId=73)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_storage_queries] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[deleteAnalyze] (batchId=145)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4438/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4438/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4438/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12860988 - PreCommit-HIVE-Build

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946219#comment-15946219
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12860930/HIVE-15396.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 817 failed/errored test(s), 6297 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key2] (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key] (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_joins] (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown] (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert] (batchId=222)
org.apache.hadoop.hive.cli.TestBeeLineDriver.org.apache.hadoop.hive.cli.TestBeeLineDriver (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_hdfs_to_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_local] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_warehouse] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_local_to_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore_nonpart] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_local] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse_nonpart] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_local_to_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=27)

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-23 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939454#comment-15939454
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] can't we take the location, create a {{FileSystem}} object, and then 
run {{fs.exists()}}? If the location exists, don't set up stats; if it doesn't 
exist, set up full stats.

There is no guarantee that other processes won't write data into the 
location, but then again there is no guarantee that other processes won't write 
into {{hive.metastore.warehouse.dir}} either.
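A minimal sketch of the existence check proposed above, assuming a local path. In Hive the call would be Hadoop's {{FileSystem.exists(Path)}} on the table location; this example substitutes {{java.nio.file.Files.exists}} so it runs without a Hadoop dependency. The class and method names are hypothetical, not Hive APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of the "fs.exists()" proposal. java.nio.file stands in for
 * Hadoop's FileSystem so the example only touches the local disk.
 */
final class ExistsCheckSketch {

    /** Initialize basic stats only when the table LOCATION does not exist yet. */
    static boolean shouldInitBasicStats(Path location) {
        // A location that does not exist cannot already hold data, so zeroed stats are accurate.
        return !Files.exists(location);
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("hdfs_1");
        Path missing = existing.resolve("not_created_yet");
        System.out.println(shouldInitBasicStats(existing)); // false: directory exists
        System.out.println(shouldInitBasicStats(missing));  // true: nothing at that path
        Files.delete(existing);
    }
}
```

Note that this rule is stricter than an emptiness check: an existing but empty directory would still be treated as untrusted and get no stats.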

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-23 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939414#comment-15939414
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], the problem then is: how do you guarantee that the location is empty? 
What about other file systems?

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-23 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939402#comment-15939402
 ] 

Sahil Takiar commented on HIVE-15396:
-

Thanks [~pxiong] for taking a look! I notice this behavior even when the 
specified location is empty. What if I updated the patch so all stats are 
collected only if the target location is empty? The use case is when running 
Hive-on-S3. It's common practice to create managed Hive tables with a specified 
location - e.g. {{CREATE TABLE s3_table (col int) LOCATION 
's3a://[bucket-name]/s3_table/'}}

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-23 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938944#comment-15938944
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], thanks for your patch. However, this is by design. If the location 
of the table is specified, we should not trust the basic stats unless the 
location is empty.
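The rule stated above (trust basic stats for an explicit LOCATION only when it is empty) can be sketched as follows. This is a hypothetical helper, not Hive code: {{java.nio.file}} stands in for Hadoop's {{FileSystem}}, and the zero-valued parameters mirror the {{numFiles}}, {{numRows}}, {{rawDataSize}}, {{totalSize}} and {{COLUMN_STATS_ACCURATE}} entries in the issue's {{describe formatted}} output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: zeroed basic stats are trusted for a table with an explicit
 * LOCATION only if that location is missing or is an empty directory.
 */
final class EmptyLocationStats {

    /** True if the location does not exist or is a directory with no entries. */
    static boolean isEmptyLocation(Path location) throws IOException {
        if (!Files.exists(location)) {
            return true;
        }
        if (!Files.isDirectory(location)) {
            return false; // a plain file at the table path is data that was never counted
        }
        try (Stream<Path> entries = Files.list(location)) {
            return !entries.findAny().isPresent();
        }
    }

    /** Basic-stats table parameters for a freshly created, empty table. */
    static Map<String, String> zeroBasicStats() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("COLUMN_STATS_ACCURATE", "{\"BASIC_STATS\":\"true\"}");
        params.put("numFiles", "0");
        params.put("numRows", "0");
        params.put("rawDataSize", "0");
        params.put("totalSize", "0");
        return params;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("s3_table");
        System.out.println(isEmptyLocation(dir)); // true: empty directory
        Path file = Files.createFile(dir.resolve("000000_0"));
        System.out.println(isEmptyLocation(dir)); // false: directory holds a data file
        Map<String, String> params = isEmptyLocation(dir) ? zeroBasicStats() : Collections.emptyMap();
        System.out.println(params.isEmpty());     // true: stats left unset
        Files.delete(file);
        Files.delete(dir);
    }
}
```

The check only describes the location at table-creation time; as noted in this thread, nothing prevents another process from writing there afterwards.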

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937901#comment-15937901
 ] 

Hive QA commented on HIVE-15396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12860057/HIVE-15396.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10510 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table] (batchId=234)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore] (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[default_file_format] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[deleteAnalyze] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic1] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl] (batchId=72)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_storage_queries] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[deleteAnalyze] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=94)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4308/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4308/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4308/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12860057 - PreCommit-HIVE-Build

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-03-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937547#comment-15937547
 ] 

Sahil Takiar commented on HIVE-15396:
-

The attached patch is pretty simple. It changes some logic in {{CreateTableDesc}} 
where stats entries for a table are initialized to 0. Originally, the logic 
would only initialize basic stats for managed tables without a {{LOCATION}} 
specified. This patch changes that logic so that stats are collected for all 
managed tables, and basic stats are now skipped only for {{EXTERNAL}} tables (I 
believe that may have been the original intention?). Once stats are properly 
initialized for a table, their collection proceeds successfully.

I expect there will be some qtest failures, so I'll fix those before adding 
any additional tests.

[~pxiong] I believe this patch modifies some of the code from HIVE-13341 - 
could you take a look at this patch and let me know what you think?
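A hypothetical model of the {{CreateTableDesc}} decision described in this comment, assuming (as the issue title implies) that the pre-patch logic initialized basic stats only for managed tables without a {{LOCATION}}. The class and method names are illustrative; this is not Hive's actual implementation.

```java
/**
 * Hypothetical model of when zeroed basic stats are written at CREATE TABLE time,
 * before and after the change proposed in HIVE-15396.1.patch.
 */
final class StatsInitDecision {

    /** Pre-patch behaviour: only managed tables without an explicit LOCATION. */
    static boolean initBefore(boolean isExternal, boolean hasLocation) {
        return !isExternal && !hasLocation;
    }

    /** Proposed behaviour: every managed table; EXTERNAL tables are skipped. LOCATION no longer matters. */
    static boolean initAfter(boolean isExternal, boolean hasLocation) {
        return !isExternal;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean external : flags) {
            for (boolean location : flags) {
                System.out.printf("external=%-5b location=%-5b before=%-5b after=%b%n",
                        external, location,
                        initBefore(external, location), initAfter(external, location));
            }
        }
    }
}
```

The only row that changes is a managed table with a {{LOCATION}}. Pengcheng Xiong's objection in this thread is that {{initAfter}} would then also mark stats as accurate for a location that already contains data.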
