[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063263#comment-16063263 ] Thejas M Nair commented on HIVE-16938: -- +1 > INFORMATION_SCHEMA usability: difficult to access # of table records > > > Key: HIVE-16938 > URL: https://issues.apache.org/jira/browse/HIVE-16938 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Gunther Hagleitner > Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch > > > HIVE-1010 adds an information schema to Hive, also taking the opportunity to > expose some non-standard but valuable things like statistics in a SYS table. > One common thing users want to know is the number of rows in tables, system > wide. > This information is in the table_params table but the structure of this table > makes it quite inconvenient to access since it is essentially a table of > key-value pairs. More table stats are likely to be added over time, > especially because of ACID. It would be a lot better if this were a first > class table. > For what it's worth I deal with the current table by pivoting it into > something easier to deal with as follows: > {code} > create view table_stats as > select > tbl_id, > max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as > COLUMN_STATS_ACCURATE, > max(case param_key when 'numFiles' then param_value end) as numFiles, > max(case param_key when 'numRows' then param_value end) as numRows, > max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, > max(case param_key when 'totalSize' then param_value end) as totalSize, > max(case param_key when 'transient_lastDdlTime' then param_value end) as > transient_lastDdlTime > from table_params group by tbl_id; > {code} > It would be better to not have users provide workarounds and make table stats > first-class like column stats currently are. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062067#comment-16062067 ] Gunther Hagleitner commented on HIVE-16938: --- Test failures are unrelated. > INFORMATION_SCHEMA usability: difficult to access # of table records > > > Key: HIVE-16938 > URL: https://issues.apache.org/jira/browse/HIVE-16938 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Gunther Hagleitner > Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch > > > HIVE-1010 adds an information schema to Hive, also taking the opportunity to > expose some non-standard but valuable things like statistics in a SYS table. > One common thing users want to know is the number of rows in tables, system > wide. > This information is in the table_params table but the structure of this table > makes it quite inconvenient to access since it is essentially a table of > key-value pairs. More table stats are likely to be added over time, > especially because of ACID. It would be a lot better if this were a first > class table. > For what it's worth I deal with the current table by pivoting it into > something easier to deal with as follows: > {code} > create view table_stats as > select > tbl_id, > max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as > COLUMN_STATS_ACCURATE, > max(case param_key when 'numFiles' then param_value end) as numFiles, > max(case param_key when 'numRows' then param_value end) as numRows, > max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, > max(case param_key when 'totalSize' then param_value end) as totalSize, > max(case param_key when 'transient_lastDdlTime' then param_value end) as > transient_lastDdlTime > from table_params group by tbl_id; > {code} > It would be better to not have users provide workarounds and make table stats > first-class like column stats currently are. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061816#comment-16061816 ] Hive QA commented on HIVE-16938: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12874334/HIVE-16938.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10845 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=146) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] (batchId=233) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=233) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=233) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] (batchId=125) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication (batchId=217) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication (batchId=217) org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5763/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12874334 - PreCommit-HIVE-Build > INFORMATION_SCHEMA usability: difficult to access # of table records > > > Key: HIVE-16938 > URL: https://issues.apache.org/jira/browse/HIVE-16938 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Gunther Hagleitner > Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch > > > HIVE-1010 adds an information schema to Hive, also taking the opportunity to > expose some non-standard but valuable things like statistics in a SYS table. > One common thing users want to know is the number of rows in tables, system > wide. > This information is in the table_params table but the structure of this table > makes it quite inconvenient to access since it is essentially a table of > key-value pairs. More table stats are likely to be added over time, > especially because of ACID. It would be a lot better if this were a first > class table. > For what it's worth I deal with the current table by pivoting it into > something easier to deal with as follows: > {code} > create view table_stats as > select > tbl_id, > max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as > COLUMN_STATS_ACCURATE, > max(case param_key when 'numFiles' then param_value end) as numFiles, > max(case param_key when 'numRows' then param_value end) as numRows, > max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, > max(case param_key when 'totalSize' then param_value end) as totalSize, > max(case param_key when 'transient_lastDdlTime' then param_value end) as > transient_lastDdlTime > from table_params group by tbl_id; > {code} > It would be better to not have users provide workarounds and make table stats > first-class like column stats currently are. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061627#comment-16061627 ] Gunther Hagleitner commented on HIVE-16938: --- [~cartershanklin] I've followed your lead and created views for table and partition stats. Wondering if we should add another view that's a union of the two and does something like: ", , , ", otherwise you will still have to join with tbls and partitions to get your stats. > INFORMATION_SCHEMA usability: difficult to access # of table records > > > Key: HIVE-16938 > URL: https://issues.apache.org/jira/browse/HIVE-16938 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Gunther Hagleitner > Attachments: HIVE-16938.1.patch > > > HIVE-1010 adds an information schema to Hive, also taking the opportunity to > expose some non-standard but valuable things like statistics in a SYS table. > One common thing users want to know is the number of rows in tables, system > wide. > This information is in the table_params table but the structure of this table > makes it quite inconvenient to access since it is essentially a table of > key-value pairs. More table stats are likely to be added over time, > especially because of ACID. It would be a lot better if this were a first > class table. > For what it's worth I deal with the current table by pivoting it into > something easier to deal with as follows: > {code} > create view table_stats as > select > tbl_id, > max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as > COLUMN_STATS_ACCURATE, > max(case param_key when 'numFiles' then param_value end) as numFiles, > max(case param_key when 'numRows' then param_value end) as numRows, > max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, > max(case param_key when 'totalSize' then param_value end) as totalSize, > max(case param_key when 'transient_lastDdlTime' then param_value end) as > transient_lastDdlTime > from table_params group by tbl_id; > {code} > It would be better to not have users provide workarounds and make table stats > first-class like column stats currently are. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records
[ https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061628#comment-16061628 ] Gunther Hagleitner commented on HIVE-16938: --- [~thejas] can you review? Pretty straight forward addition to simplify stats. > INFORMATION_SCHEMA usability: difficult to access # of table records > > > Key: HIVE-16938 > URL: https://issues.apache.org/jira/browse/HIVE-16938 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Gunther Hagleitner > Attachments: HIVE-16938.1.patch > > > HIVE-1010 adds an information schema to Hive, also taking the opportunity to > expose some non-standard but valuable things like statistics in a SYS table. > One common thing users want to know is the number of rows in tables, system > wide. > This information is in the table_params table but the structure of this table > makes it quite inconvenient to access since it is essentially a table of > key-value pairs. More table stats are likely to be added over time, > especially because of ACID. It would be a lot better if this were a first > class table. > For what it's worth I deal with the current table by pivoting it into > something easier to deal with as follows: > {code} > create view table_stats as > select > tbl_id, > max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as > COLUMN_STATS_ACCURATE, > max(case param_key when 'numFiles' then param_value end) as numFiles, > max(case param_key when 'numRows' then param_value end) as numRows, > max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, > max(case param_key when 'totalSize' then param_value end) as totalSize, > max(case param_key when 'transient_lastDdlTime' then param_value end) as > transient_lastDdlTime > from table_params group by tbl_id; > {code} > It would be better to not have users provide workarounds and make table stats > first-class like column stats currently are. -- This message was sent by Atlassian JIRA (v6.4.14#64029)