[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063263#comment-16063263
 ] 

Thejas M Nair commented on HIVE-16938:
--

+1

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats 
> first-class like column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-24 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062067#comment-16062067
 ] 

Gunther Hagleitner commented on HIVE-16938:
---

Test failures are unrelated.

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats 
> first-class like column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061816#comment-16061816
 ] 

Hive QA commented on HIVE-16938:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874334/HIVE-16938.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5763/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874334 - PreCommit-HIVE-Build

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats 
> first-class like column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061627#comment-16061627
 ] 

Gunther Hagleitner commented on HIVE-16938:
---

[~cartershanklin] I've followed your lead and created views for table and 
partition stats. Wondering if we should add another view that's a union of the 
two and does something like: ", , , ", otherwise you will still have to join with 
tbls and partitions to get your stats.

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats 
> first-class like column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061628#comment-16061628
 ] 

Gunther Hagleitner commented on HIVE-16938:
---

[~thejas] can you review? Pretty straight forward addition to simplify stats.

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to not have users provide workarounds and make table stats 
> first-class like column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)