[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061816#comment-16061816
 ] 

Hive QA commented on HIVE-16938:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874334/HIVE-16938.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5763/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5763/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874334 - PreCommit-HIVE-Build

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, 
> system-wide.
> This information is in the table_params table, but the structure of this table 
> makes it quite inconvenient to access, since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a 
> first-class table.
> For what it's worth, I deal with the current table by pivoting it into 
> something easier to work with, as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to make users rely on workarounds, and instead make 
> table stats first-class, as column stats currently are.
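> With that view in place, system-wide row counts can be read off directly. For 
> illustration only (this assumes the SYS.TBLS mirror table that HIVE-1010 
> exposes, with tbl_id and tbl_name columns, and casts numRows because the stats 
> are currently strings):
> {code}
> select t.tbl_name, cast(s.numRows as bigint) as num_rows
> from sys.tbls t
> join table_stats s on s.tbl_id = t.tbl_id
> order by num_rows desc;
> {code}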



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061801#comment-16061801
 ] 

Hive QA commented on HIVE-16177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874345/HIVE-16177.16.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10849 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_buckets] 
(batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[zero_rows_blobstore]
 (batchId=241)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5762/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5762/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5762/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874345 - PreCommit-HIVE-Build

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 000001_0 and 000001_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and numbers rows in each bucket from 0, thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N files, 
> but this is just a prerequisite. The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> 

[jira] [Updated] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16938:
--
Status: Patch Available  (was: Open)

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, 
> system-wide.
> This information is in the table_params table, but the structure of this table 
> makes it quite inconvenient to access, since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a 
> first-class table.
> For what it's worth, I deal with the current table by pivoting it into 
> something easier to work with, as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to make users rely on workarounds, and instead make 
> table stats first-class, as column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16937:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Fix For: 3.0.0
>
> Attachments: HIVE-16937.1.patch, HIVE-16937.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try to find the column(s) with the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's likely not what 
> you really want.
> It would be better to use numeric types where appropriate, such as for all the 
> numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.
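> Until the types are fixed, a workaround sketch is to cast explicitly so the 
> comparison is numeric rather than lexicographic:
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   cast(num_distincts as bigint) =
>     ( select max(cast(num_distincts as bigint)) from sys.tab_col_stats );
> {code}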



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061794#comment-16061794
 ] 

Gunther Hagleitner commented on HIVE-16937:
---

Test failures are unrelated. Ran them locally.

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch, HIVE-16937.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try to find the column(s) with the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's likely not what 
> you really want.
> It would be better to use numeric types where appropriate, such as for all the 
> numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061784#comment-16061784
 ] 

Hive QA commented on HIVE-16954:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874344/HIVE-16954-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10600 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5761/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5761/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5761/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874344 - PreCommit-HIVE-Build

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061766#comment-16061766
 ] 

Hive QA commented on HIVE-16937:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874330/HIVE-16937.2.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation2 (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5760/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5760/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5760/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874330 - PreCommit-HIVE-Build

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch, HIVE-16937.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try to find the column(s) with the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's likely not what 
> you really want.
> It would be better to use numeric types where appropriate, such as for all the 
> numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.

[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-23 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061759#comment-16061759
 ] 

Nemon Lou commented on HIVE-16839:
--

Our system does not have concurrency support enabled. 
When users accidentally submit a drop partition and a modification of the same 
partition concurrently, we end up with an uncommitted transaction.
With PostgreSQL as the backend, this leaves a connection stuck in the "idle in 
transaction" state.
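
For reference, such stuck sessions show up on the PostgreSQL side in the 
standard pg_stat_activity view; a diagnostic sketch:
{noformat}
select pid, state, query from pg_stat_activity where state = 'idle in transaction';
{noformat}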

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> Open one Beeline session and run these two SQL statements many times:
> {noformat} 2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> At the same time, open another Beeline session and run this SQL statement many times:
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> 

[jira] [Comment Edited] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061752#comment-16061752
 ] 

Sergey Shelukhin edited comment on HIVE-16954 at 6/24/17 2:31 AM:
--

Ran a few queries and I don't see IO time increasing. Will test further next 
week.


was (Author: sershe):
Ran a few queries and I don't see IO time slowdown. Will test further next week.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061753#comment-16061753
 ] 

Sergey Shelukhin commented on HIVE-16954:
-

[~prasanth_j] [~gopalv] do you want to review some time? :) 
I made the patch for branch-2 for now; I will forward-port it to master.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061752#comment-16061752
 ] 

Sergey Shelukhin commented on HIVE-16954:
-

Ran a few queries and I don't see IO time slowdown. Will test further next week.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-06-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.16.patch

Patch 16 addresses the RB comments.

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 000001_0 and 000001_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and numbers rows in each bucket from 0, thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N files, 
> but this is just a prerequisite. The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/base_-9223372036854775808/bucket_00001
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because the compactor doesn't handle copy_N files either (it skips them)
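> For illustration, a quick diagnostic (not part of the patch) that surfaces the 
> duplicate IDs after conversion:
> {noformat}
> select ROW__ID.transactionid, ROW__ID.bucketid, ROW__ID.rowid, count(*)
> from T
> group by ROW__ID.transactionid, ROW__ID.bucketid, ROW__ID.rowid
> having count(*) > 1
> {noformat}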



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061742#comment-16061742
 ] 

Hive QA commented on HIVE-16926:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874326/HIVE-16926.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 190 failed/errored test(s), 9479 tests 
executed
*Failed tests:*
{noformat}
TestAcidTableSerializer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestAvroHCatLoader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestAvroHCatStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestBeeLineExceptionHandling - did not produce a TEST-*.xml file (likely timed 
out) (batchId=176)
TestBeeLineHistory - did not produce a TEST-*.xml file (likely timed out) 
(batchId=176)
TestBeelineArgParsing - did not produce a TEST-*.xml file (likely timed out) 
(batchId=176)
TestBucketIdResolverImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestCLIAuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=227)
TestCliDriverMethods - did not produce a TEST-*.xml file (likely timed out) 
(batchId=175)
TestClientCommandHookFactory - did not produce a TEST-*.xml file (likely timed 
out) (batchId=176)
TestCommands - did not produce a TEST-*.xml file (likely timed out) 
(batchId=178)
TestCookieSigner - did not produce a TEST-*.xml file (likely timed out) 
(batchId=194)
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=231)
TestDefaultHCatRecord - did not produce a TEST-*.xml file (likely timed out) 
(batchId=189)
TestDelimitedInputWriter - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestE2EScenarios - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestEximReplicationTasks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=178)
TestGroupingValidator - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHCatClient - did not produce a TEST-*.xml file (likely timed out) 
(batchId=178)
TestHCatClientNotification - did not produce a TEST-*.xml file (likely timed 
out) (batchId=231)
TestHCatDynamicPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=185)
TestHCatExternalDynamicPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=187)
TestHCatExternalNonPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=188)
TestHCatExternalPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=184)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed 
out) (batchId=231)
TestHCatHiveThriftCompatibility - did not produce a TEST-*.xml file (likely 
timed out) (batchId=231)
TestHCatInputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=188)
TestHCatInputFormatMethods - did not produce a TEST-*.xml file (likely timed 
out) (batchId=188)
TestHCatLoaderComplexSchema - did not produce a TEST-*.xml file (likely timed 
out) (batchId=181)
TestHCatLoaderEncryption - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestHCatLoaderStorer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestHCatMultiOutputFormat - did not produce a TEST-*.xml file (likely timed 
out) (batchId=188)
TestHCatMutableDynamicPartitioned - did not produce a TEST-*.xml file (likely 
timed out) (batchId=182)
TestHCatMutableNonPartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=188)
TestHCatMutablePartitioned - did not produce a TEST-*.xml file (likely timed 
out) (batchId=186)
TestHCatNonPartitioned - did not produce a TEST-*.xml file (likely timed out) 
(batchId=183)
TestHCatOutputFormat - did not produce a TEST-*.xml file (likely timed out) 
(batchId=188)
TestHCatPartitionPublish - did not produce a TEST-*.xml file (likely timed out) 
(batchId=183)
TestHCatPartitioned - did not produce a TEST-*.xml file (likely timed out) 
(batchId=183)
TestHCatSchema - did not produce a TEST-*.xml file (likely timed out) 
(batchId=189)
TestHCatSchemaUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=189)
TestHCatStorerMulti - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestHCatStorerWrapper - did not produce a TEST-*.xml file (likely timed out) 
(batchId=181)
TestHeartbeatTimerTask - did not produce a TEST-*.xml file (likely timed out) 
(batchId=190)
TestHiveCli - did not produce a TEST-*.xml file (likely timed out) (batchId=176)
TestHiveClientCache - did not produce a TEST-*.xml file (likely timed out) 
(batchId=188)
TestHiveSchemaTool - did not produce a TEST-*.xml file (likely timed out) 
(batchId=176)
TestInputJobInfo - did not produce a TEST-*.xml file (likely timed out)

[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061729#comment-16061729
 ] 

Ashutosh Chauhan commented on HIVE-16934:
-

+1

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.03.patch, HIVE-16934.patch
>
>
> Add a rule to simplify COUNT aggregation function if possible, removing 
> expressions that cannot be nullable from its parameters.
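> For example (illustrative only): when x comes from a column that is provably 
> non-null, the two forms below are equivalent, and the second avoids evaluating 
> x per row:
> {code}
> select count(x) from t;
> select count(*) from t;
> {code}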



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16954:

Status: Patch Available  (was: Open)

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16954:

Attachment: HIVE-16954-branch-2.patch

The patch. I've run a few queries with it. Need to perf test the normal case to 
see if there's too much overhead.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16954) LLAP IO: better debugging

2017-06-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16954:
---


> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16942) INFORMATION_SCHEMA: schematool for setting it up is not idempotent

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061684#comment-16061684
 ] 

Gunther Hagleitner commented on HIVE-16942:
---

The patch fixes the problem described, but doesn't make the script fully 
idempotent. I'm not sure that's desirable, because a non-empty SYS db could 
indicate a problem (an existing db). It's fine to drop an empty db, though.

[~thejas] review please?

> INFORMATION_SCHEMA: schematool for setting it up is not idempotent
> --
>
> Key: HIVE-16942
> URL: https://issues.apache.org/jira/browse/HIVE-16942
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16942.1.patch
>
>
> If you run schematool to set up information schema, but the SYS database 
> already exists, here's what happens:
> {code}
> [vagrant@trunk apache-hive-3.0.0-SNAPSHOT-bin]$ schematool -metaDbType mysql 
> -dbType hive -initSchema -url jdbc:hive2://localhost:10000/default -driver 
> org.apache.hive.jdbc.HiveDriver
> Metastore connection URL:  jdbc:hive2://localhost:10000/default
> Metastore Connection Driver :  org.apache.hive.jdbc.HiveDriver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.0.0
> Initialization script hive-schema-3.0.0.hive.sql
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Database SYS already exists
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
> {code}
> Why is this a problem you ask?
> If you run schematool without hive.metastore.db.type set (or set to the wrong 
> thing), it will create the sys database but fail to create any of the tables 
> within it. If you go and fix hive.metastore.db.type and re-run you'll get 
> this failure until you drop the SYS database (which must be done as the hive 
> user).
> Can the init script use "create database if not exists sys" rather than just 
> "create database sys"?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16942) INFORMATION_SCHEMA: schematool for setting it up is not idempotent

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16942:
--
Attachment: HIVE-16942.1.patch

> INFORMATION_SCHEMA: schematool for setting it up is not idempotent
> --
>
> Key: HIVE-16942
> URL: https://issues.apache.org/jira/browse/HIVE-16942
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16942.1.patch
>
>
> If you run schematool to set up information schema, but the SYS database 
> already exists, here's what happens:
> {code}
> [vagrant@trunk apache-hive-3.0.0-SNAPSHOT-bin]$ schematool -metaDbType mysql 
> -dbType hive -initSchema -url jdbc:hive2://localhost:10000/default -driver 
> org.apache.hive.jdbc.HiveDriver
> Metastore connection URL:  jdbc:hive2://localhost:10000/default
> Metastore Connection Driver :  org.apache.hive.jdbc.HiveDriver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.0.0
> Initialization script hive-schema-3.0.0.hive.sql
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Database SYS already exists
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
> {code}
> Why is this a problem you ask?
> If you run schematool without hive.metastore.db.type set (or set to the wrong 
> thing), it will create the sys database but fail to create any of the tables 
> within it. If you go and fix hive.metastore.db.type and re-run you'll get 
> this failure until you drop the SYS database (which must be done as the hive 
> user).
> Can the init script use "create database if not exists sys" rather than just 
> "create database sys"?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-23 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061678#comment-16061678
 ] 

Vihang Karajgaonkar commented on HIVE-16839:


Hi [~nemon], what is the value of {{hive.support.concurrency}} on your system? I 
think this issue is related to concurrency rather than to unbalanced calls to 
open/commit transaction. When concurrency is turned off, both sessions are free 
to proceed without acquiring any ZK locks, hence the exception on one of the 
sessions: the trace shows that it is trying to get the StorageDescriptor of the 
partition while the other session has already dropped the partition. When you 
turn concurrency on, the drop partition session will wait until it acquires the 
lock and then proceed as expected.
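
For illustration, enabling concurrency with ZooKeeper-based locking looks 
roughly like this (a sketch using the standard Hive settings; the quorum value 
is a placeholder):
{noformat}
set hive.support.concurrency=true;
set hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager;
set hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181;
{noformat}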

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> Open one Beeline session and run these two SQL statements many times:
> {noformat} 2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> At the same time, open another Beeline session and run this SQL statement many times:
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   

[jira] [Updated] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16938:
--
Attachment: HIVE-16938.2.patch

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, 
> system-wide.
> This information is in the table_params table, but the structure of this table 
> makes it quite inconvenient to access, since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a 
> first-class table.
> For what it's worth, I deal with the current table by pivoting it into 
> something easier to work with, as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to make users rely on workarounds, and instead make 
> table stats first-class, as column stats currently are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061675#comment-16061675
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874324/HIVE-16832.11.patch

{color:green}SUCCESS:{color} +1 due to 17 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 10857 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_explode2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_noalias] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_stack] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=98)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true] 
(batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[lateral_view_explode2]
 (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5758/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5758/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 31 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874324 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16942) INFORMATION_SCHEMA: schematool for setting it up is not idempotent

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-16942:
-

Assignee: Gunther Hagleitner

> INFORMATION_SCHEMA: schematool for setting it up is not idempotent
> --
>
> Key: HIVE-16942
> URL: https://issues.apache.org/jira/browse/HIVE-16942
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
>
> If you run schematool to set up information schema, but the SYS database 
> already exists, here's what happens:
> {code}
> [vagrant@trunk apache-hive-3.0.0-SNAPSHOT-bin]$ schematool -metaDbType mysql 
> -dbType hive -initSchema -url jdbc:hive2://localhost:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> Metastore connection URL:  jdbc:hive2://localhost:1/default
> Metastore Connection Driver :  org.apache.hive.jdbc.HiveDriver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.0.0
> Initialization script hive-schema-3.0.0.hive.sql
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Database SYS already exists
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
> {code}
> Why is this a problem, you ask?
> If you run schematool without hive.metastore.db.type set (or set to the wrong 
> thing), it will create the sys database but fail to create any of the tables 
> within it. If you go and fix hive.metastore.db.type and re-run you'll get 
> this failure until you drop the SYS database (which must be done as the hive 
> user).
> Can the init script use "create database if not exists sys" rather than just 
> "create database sys"?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16937:
--
Attachment: HIVE-16937.2.patch

Forgot to update jdbc_handler.q - patch .2 has that.

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch, HIVE-16937.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting so it's not likely what 
> you really want.
> It would be better to use numeric types where appropriate such as all the 
> numbers in tab_col_stats, and most likely bigints should be used for stats 
> like # rows, etc.
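> Until the types are fixed, a workaround sketch that forces a numeric comparison 
> by casting (assuming num_distincts always holds a parseable integer):
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   cast(num_distincts as bigint) =
>     ( select max(cast(num_distincts as bigint)) from sys.tab_col_stats );
> {code}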



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061634#comment-16061634
 ] 

Eugene Koifman commented on HIVE-14988:
---

I left some comments on RB.
I think this needs more tests.  For example, does Compaction do the right thing?
Are expected files created on disk?  

TestDbTxnManager2 has examples of how to simulate concurrency w/o having 
multiple threads, by using start transaction/commit.

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch, HIVE-14988.04.patch, HIVE-14988.05.patch
>
>
> Insert overwrite operation on transactional table will currently raise an 
> error.
> This can/should be supported



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061629#comment-16061629
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874324/HIVE-16832.11.patch

{color:green}SUCCESS:{color} +1 due to 17 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 10857 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_explode2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_noalias] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_stack] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=98)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true] 
(batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[lateral_view_explode2]
 (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMultiInsert (batchId=269)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5757/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5757/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5757/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874324 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, 

[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061627#comment-16061627
 ] 

Gunther Hagleitner commented on HIVE-16938:
---

[~cartershanklin] I've followed your lead and created views for table and 
partition stats. Wondering if we should add another view that's a union of the 
two and exposes something like: ", , , "; otherwise you will still have to join 
with tbls and partitions to get your stats.
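For illustration, a hypothetical sketch of such a union view (all names below 
are illustrative, assuming a partition_stats view analogous to table_stats):
{code}
create view all_stats as
select 'TABLE' as stat_level, tbl_id as id, numRows, totalSize from table_stats
union all
select 'PARTITION' as stat_level, part_id as id, numRows, totalSize from partition_stats;
{code}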

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to make table stats first-class, like column stats currently 
> are, rather than requiring users to build workarounds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061628#comment-16061628
 ] 

Gunther Hagleitner commented on HIVE-16938:
---

[~thejas] can you review? Pretty straightforward addition to simplify stats.

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to make table stats first-class, like column stats currently 
> are, rather than requiring users to build workarounds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16938:
--
Attachment: HIVE-16938.1.patch

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better to make table stats first-class, like column stats currently 
> are, rather than requiring users to build workarounds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Status: Patch Available  (was: Open)

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.2.patch

- Umbilical token should be the same for all fragments of the same request
- Minor restructuring of retried requests
- Minor renaming


> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6133) Support partial partition exchange

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061590#comment-16061590
 ] 

Hive QA commented on HIVE-6133:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638030/HIVE-6133.1.patch.txt

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5756/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5756/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5756/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-23 22:24:21.227
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5756/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-23 22:24:21.229
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at bc510f6 HIVE-16751: Support different types for grouping columns 
in GroupBy Druid queries (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at bc510f6 HIVE-16751: Support different types for grouping columns 
in GroupBy Druid queries (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-23 22:24:22.426
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java:120
error: metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java: 
patch does not apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java:1431
error: metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java: 
patch does not apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:641
error: ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12638030 - PreCommit-HIVE-Build

> Support partial partition exchange
> --
>
> Key: HIVE-6133
> URL: https://issues.apache.org/jira/browse/HIVE-6133
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-6133.1.patch.txt
>
>
> Current alter exchange coerces the source and destination tables to have the 
> same partition columns. But when the source table has a subset of the partition 
> columns and the provided partition spec supplements them to form a complete 
> partition spec, this restriction is unnecessary.
> For example, exchanging a whole table into a partition
> {noformat}
> CREATE TABLE exchange_part_test1 (f1 string) PARTITIONED BY (ds STRING);
> CREATE TABLE exchange_part_test2 (f1 string);
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2;
> {noformat}
> or exchanging partial partitions into a parent partition
> {noformat}
> CREATE TABLE exchange_part_test1 (f1 string) PARTITIONED BY (ds STRING, hr 
> STRING);
> CREATE TABLE exchange_part_test2 (f1 string) PARTITIONED BY (hr STRING);
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2;
> {noformat}
> should be possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061587#comment-16061587
 ] 

Hive QA commented on HIVE-16935:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874309/HIVE-16935.1.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10847 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5755/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5755/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5755/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874309 - PreCommit-HIVE-Build

> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in Beeline and in Impala.
> The same code fails in CliDriver 
>  
> h2. Background
> Hive deals with sql comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients such as Hue or Jdbc do not strip comments before sending text.
> Some tests such as TestCliDriver strip comments before running tests.
> When Hive gets a command the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle the command. In the bug case 
> the correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory and so the command is treated as sql. Hive’s sql 
> parser understands and ignores comments, but it does not understand the set 
> commands usually handled by SetProcessor and so we get the ParseException 
> shown above.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.11.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061516#comment-16061516
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874305/HIVE-13567.18.patch

{color:green}SUCCESS:{color} +1 due to 19 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 765 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore]
 (batchId=241)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join14] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join17] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19_inclause] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join1] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join2] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join3] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join4] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join5] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join6] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join7] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join8] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join9] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] 
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket1] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket2] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket3] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join17] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby2_map_multi_distinct]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_groupby3_noskew_multi_distinct]
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_pruner_multiple_children]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnarserde_create_shortcut]
 

[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.10.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6133) Support partial partition exchange

2017-06-23 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061472#comment-16061472
 ] 

Naveen Gangam commented on HIVE-6133:
-

[~navis] Should we revisit this improvement? From this feature documentation 
page at https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition,
{{When the command is executed, the source table's partition folder in HDFS 
will be renamed to move it to the destination table's partition folder.}}

When exchanging a partition into an already partitioned table, what will the 
order of the partition keys be for the destination table? What will happen to 
the existing partitions of the source table, which only had a single partition 
key when it was created? Thanks

> Support partial partition exchange
> --
>
> Key: HIVE-6133
> URL: https://issues.apache.org/jira/browse/HIVE-6133
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-6133.1.patch.txt
>
>
> Current alter exchange coerces the source and destination tables to have the 
> same partition columns. But when the source table has a subset of the partition 
> columns and the provided partition spec supplements them to form a complete 
> partition spec, this restriction is unnecessary.
> For example, exchanging a whole table into a partition
> {noformat}
> CREATE TABLE exchange_part_test1 (f1 string) PARTITIONED BY (ds STRING);
> CREATE TABLE exchange_part_test2 (f1 string);
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2;
> {noformat}
> or exchanging partial partitions into a parent partition
> {noformat}
> CREATE TABLE exchange_part_test1 (f1 string) PARTITIONED BY (ds STRING, hr 
> STRING);
> CREATE TABLE exchange_part_test2 (f1 string) PARTITIONED BY (hr STRING);
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2;
> {noformat}
> should be possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16951) ACID: Compactor doesn't close JobClient

2017-06-23 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta reassigned HIVE-16951:
---

Assignee: Vaibhav Gumashta

> ACID: Compactor doesn't close JobClient
> ---
>
> Key: HIVE-16951
> URL: https://issues.apache.org/jira/browse/HIVE-16951
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> When a compaction job is launched, we create a new JobClient every time we run 
> the MR job:
> {code}
>   private void launchCompactionJob(JobConf job, Path baseDir, CompactionType 
> compactionType,
>StringableList dirsToSearch,
>List parsedDeltas,
>int curDirNumber, int obsoleteDirNumber, 
> HiveConf hiveConf,
>TxnStore txnHandler, long id, String 
> jobName) throws IOException {
> job.setBoolean(IS_MAJOR, compactionType == CompactionType.MAJOR);
> if(dirsToSearch == null) {
>   dirsToSearch = new StringableList();
> }
> StringableList deltaDirs = new StringableList();
> long minTxn = Long.MAX_VALUE;
> long maxTxn = Long.MIN_VALUE;
> for (AcidUtils.ParsedDelta delta : parsedDeltas) {
>   LOG.debug("Adding delta " + delta.getPath() + " to directories to 
> search");
>   dirsToSearch.add(delta.getPath());
>   deltaDirs.add(delta.getPath());
>   minTxn = Math.min(minTxn, delta.getMinTransaction());
>   maxTxn = Math.max(maxTxn, delta.getMaxTransaction());
> }
> if (baseDir != null) job.set(BASE_DIR, baseDir.toString());
> job.set(DELTA_DIRS, deltaDirs.toString());
> job.set(DIRS_TO_SEARCH, dirsToSearch.toString());
> job.setLong(MIN_TXN, minTxn);
> job.setLong(MAX_TXN, maxTxn);
> if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
>   mrJob = job;
> }
> LOG.info("Submitting " + compactionType + " compaction job '" +
>   job.getJobName() + "' to " + job.getQueueName() + " queue.  " +
>   "(current delta dirs count=" + curDirNumber +
>   ", obsolete delta dirs count=" + obsoleteDirNumber + ". TxnIdRange[" + 
> minTxn + "," + maxTxn + "]");
> RunningJob rj = new JobClient(job).submitJob(job);
> LOG.info("Submitted compaction job '" + job.getJobName() + "' with 
> jobID=" + rj.getID() + " compaction ID=" + id);
> txnHandler.setHadoopJobId(rj.getID().toString(), id);
> rj.waitForCompletion();
> if (!rj.isSuccessful()) {
>   throw new IOException(compactionType == CompactionType.MAJOR ? "Major" 
> : "Minor" +
>   " compactor job failed for " + jobName + "! Hadoop JobId: " + 
> rj.getID() );
> }
>   }
> {code}
> We should close the JobClient to release resources (cached FS objects etc).
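> A minimal sketch of one possible fix, closing the client in a finally block 
> (illustrative only, not the actual patch):
> {code}
> JobClient jobClient = new JobClient(job);
> try {
>   RunningJob rj = jobClient.submitJob(job);
>   txnHandler.setHadoopJobId(rj.getID().toString(), id);
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException("Compaction job failed for " + jobName +
>         "! Hadoop JobId: " + rj.getID());
>   }
> } finally {
>   // release cached FileSystem objects and other client resources
>   jobClient.close();
> }
> {code}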



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061462#comment-16061462
 ] 

Hive QA commented on HIVE-16937:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874300/HIVE-16937.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[jdbc_handler]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5753/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5753/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5753/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874300 - PreCommit-HIVE-Build

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting so it's not likely what 
> you really want.

[jira] [Updated] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-23 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16935:
--
Status: Patch Available  (was: Open)

> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in Beeline and in Impala.
> The same code fails in CliDriver 
>  
> h2. Background
> Hive deals with sql comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients such as Hue or Jdbc do not strip comments before sending text.
> Some tests such as TestCliDriver strip comments before running tests.
> When Hive gets a command the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle the command. In the bug case 
> the correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory and so the command is treated as sql. Hive’s sql 
> parser understands and ignores comments, but it does not understand the set 
> commands usually handled by SetProcessor and so we get the ParseException 
> shown above.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-23 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061452#comment-16061452
 ] 

Andrew Sherman commented on HIVE-16935:
---

h2. Description of Changes

We strip SQL comments from a command string. The stripped command is used to 
determine which CommandProcessor will execute the command. If the 
CommandProcessorFactory does not select a special CommandProcessor, then we 
execute the original unstripped command so that the SQL parser can remove the 
comments itself.
Move BeeLine's comment stripping code to HiveStringUtils and change BeeLine to 
call it from there.
Add a better test with separate tokens for "set role" in 
TestCommandProcessorFactory.
Add a test case for comment removal in set_processor_namespaces.q using an 
indented comment, since unindented comments are removed by the test driver.
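A minimal sketch of the stripping logic (hypothetical and simplified; the real 
code lives in HiveStringUtils per the change above, and this version ignores 
multi-line constructs):
{code}
// Returns the line with any "--" comment removed, unless the "--"
// occurs inside a single- or double-quoted string.
static String removeLineComment(String line) {
  boolean inSingle = false, inDouble = false;
  for (int i = 0; i < line.length(); i++) {
    char c = line.charAt(i);
    if (c == '\'' && !inDouble) {
      inSingle = !inSingle;
    } else if (c == '"' && !inSingle) {
      inDouble = !inDouble;
    } else if (c == '-' && !inSingle && !inDouble
        && i + 1 < line.length() && line.charAt(i + 1) == '-') {
      return line.substring(0, i);
    }
  }
  return line;
}
{code}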



> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in Beeline and in Impala.
> The same code fails in CliDriver 
>  
> h2. Background
> Hive deals with sql comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients such as Hue or Jdbc do not strip comments before sending text.
> Some tests such as TestCliDriver strip comments before running tests.
> When Hive gets a command the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle the command. In the bug case 
> the correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory and so the command is treated as sql. Hive’s sql 
> parser understands and ignores comments, but it does not understand the set 
> commands usually handled by SetProcessor and so we get the ParseException 
> shown above.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-23 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16935:
--
Attachment: HIVE-16935.1.patch

> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in Beeline and in Impala.
> The same code fails in CliDriver 
>  
> h2. Background
> Hive deals with sql comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients such as Hue or Jdbc do not strip comments before sending text.
> Some tests such as TestCliDriver strip comments before running tests.
> When Hive gets a command the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle the command. In the bug case 
> the correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory and so the command is treated as sql. Hive’s sql 
> parser understands and ignores comments, but it does not understand the set 
> commands usually handled by SetProcessor and so we get the ParseException 
> shown above.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-23 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061436#comment-16061436
 ] 

Chao Sun commented on HIVE-11297:
-

Thanks [~kellyzly]! +1 on the latest patch.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, HIVE-11297.8.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to scan the 
> table multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061409#comment-16061409
 ] 

Hive QA commented on HIVE-16934:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874292/HIVE-16934.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5752/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5752/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5752/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874292 - PreCommit-HIVE-Build

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.03.patch, HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.
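
As an illustrative example of the intended rewrite (hypothetical table {{t}}, not taken from the patch):

{code}
-- COUNT over an expression that can never be null...
select count(1) from t;
-- ...counts exactly the rows, so the rule can rewrite it to the cheaper form:
select count(*) from t;
{code}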



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061405#comment-16061405
 ] 

Jason Dere commented on HIVE-16937:
---

+1 pending test results

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's not likely 
> what you really want.
> It would be better to use numeric types where appropriate, such as for all 
> the numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.
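
Until the types are fixed, one possible workaround (an untested sketch) is to cast before comparing, so the maximum is computed numerically:

{code}
select
  db_name, table_name, column_name
from
  sys.tab_col_stats
where
  cast(num_distincts as bigint) =
    ( select max(cast(num_distincts as bigint)) from sys.tab_col_stats );
{code}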



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. 
> This requires updating the golden files.
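
For context, the default being flipped is presumably the column-stats autogather flag from phase 1; a sketch of enabling it manually (property name assumed, verify against your build):

{code}
set hive.stats.column.autogather=true;
{code}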



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-23 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.18.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. 
> This requires updating the golden files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-16938:
-

Assignee: Gunther Hagleitner

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth I deal with the current table by pivoting it into 
> something easier to deal with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to require users to build workarounds, and instead to 
> make table stats first-class, as column stats currently are.
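
With such a view in place, a system-wide row-count query becomes a simple join; a sketch, assuming the SYS db also mirrors the metastore TBLS table (with tbl_id and tbl_name columns; verify locally):

{code}
select
  t.tbl_name,
  cast(s.numRows as bigint) as num_rows
from sys.tbls t
join table_stats s on (s.tbl_id = t.tbl_id)
order by num_rows desc;
{code}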



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061376#comment-16061376
 ] 

Gunther Hagleitner commented on HIVE-16937:
---

Interesting. Turns out that the JDBC handler was happily ignoring all types, 
returning string for everything. [~jdere] you looked at the handler before, 
could you review the fix? cc [~thejas]. With the patch you get this:

{noformat}
POSTHOOK: query: describe sys.tab_col_stats
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: sys@tab_col_stats
cs_id                   bigint    from deserializer
db_name                 string    from deserializer
table_name              string    from deserializer
column_name             string    from deserializer
column_type             string    from deserializer
tbl_id                  bigint    from deserializer
long_low_value          bigint    from deserializer
long_high_value         bigint    from deserializer
double_high_value       double    from deserializer
double_low_value        double    from deserializer
big_decimal_low_value   string    from deserializer
big_decimal_high_value  string    from deserializer
num_nulls               bigint    from deserializer
num_distincts           bigint    from deserializer
avg_col_len             double    from deserializer
max_col_len             bigint    from deserializer
num_trues               bigint    from deserializer
num_falses              bigint    from deserializer
last_analyzed           bigint    from deserializer
{noformat}

> INFORMATION_SCHEMA usability: everything is currently a string
> 

[jira] [Updated] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16937:
--
Status: Patch Available  (was: Open)

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's not likely 
> what you really want.
> It would be better to use numeric types where appropriate, such as for all 
> the numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-16937:
-

Assignee: Gunther Hagleitner

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's not likely 
> what you really want.
> It would be better to use numeric types where appropriate, such as for all 
> the numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16937:
--
Attachment: HIVE-16937.1.patch

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is based on string sorting, so it's not likely 
> what you really want.
> It would be better to use numeric types where appropriate, such as for all 
> the numbers in tab_col_stats; most likely bigints should be used for stats 
> like # rows, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16949:
---

Assignee: Sahil Takiar

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>Assignee: Sahil Takiar
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  that is not shut down when its threads complete. This leads to a leak of 
> threads for each query that uses more than one partition; they are not 
> removed automatically. When queries spanning multiple partitions are made, 
> the number of threads increases and is never reduced. On my machine 
> hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shut down.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}, 
> which prevents the thread pool from being spawned 
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
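
A minimal sketch of the suggested shape of the fix (illustrative names only, not the actual Utilities code):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class InputPathsSketch {
  // Run the listing tasks on a pool and guarantee the pool is shut down,
  // even if a task fails, so its worker threads can exit.
  public static List<String> listWithPool(List<Callable<String>> tasks,
      int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<String> results = new ArrayList<>();
      for (Future<String> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
{code}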



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061366#comment-16061366
 ] 

Sahil Takiar commented on HIVE-16949:
-

Thanks for reporting this [~birger], I will post a fix soon.

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  that is not shut down when its threads complete. This leads to a leak of 
> threads for each query that uses more than one partition; they are not 
> removed automatically. When queries spanning multiple partitions are made, 
> the number of threads increases and is never reduced. On my machine 
> hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shut down.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}, 
> which prevents the thread pool from being spawned 
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16890) org.apache.hadoop.hive.serde2.io.HiveVarcharWritable - Adds Superfluous Wrapper

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061359#comment-16061359
 ] 

Hive QA commented on HIVE-16890:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872885/HIVE-16890.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5751/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5751/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5751/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872885 - PreCommit-HIVE-Build

> org.apache.hadoop.hive.serde2.io.HiveVarcharWritable - Adds Superfluous 
> Wrapper
> ---
>
> Key: HIVE-16890
> URL: https://issues.apache.org/jira/browse/HIVE-16890
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16890.1.patch
>
>
> Class {{org.apache.hadoop.hive.serde2.io.HiveVarcharWritable}} creates a 
> superfluous wrapper and then immediately unwraps it.  Don't bother wrapping 
> in this scenario.
> {code}
>   public void set(HiveVarchar val, int len) {
>     set(val.getValue(), len);
>   }
> 
>   public void set(String val, int maxLength) {
>     value.set(HiveBaseChar.enforceMaxLength(val, maxLength));
>   }
> 
>   public HiveVarchar getHiveVarchar() {
>     return new HiveVarchar(value.toString(), -1);
>   }
> 
>   // enforceMaxLength() calls getHiveVarchar(), which creates a new
>   // HiveVarchar wrapping a String; that object is passed to
>   // set(HiveVarchar val, int len), which immediately pulls the String
>   // back out.
>   public void enforceMaxLength(int maxLength) {
>     // Might be possible to truncate the existing Text value, for now just
>     // do something simple.
>     if (value.getLength() > maxLength && getCharacterLength() > maxLength)
>       set(getHiveVarchar(), maxLength);
>   }
> {code}
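
A sketch of the unwrapped variant (illustrative only; the actual patch may differ; {{value}}, {{set}}, and {{getCharacterLength}} are the existing members shown above):

{code}
// Sketch: enforce the max length directly on the underlying Text value,
// skipping the HiveVarchar round trip entirely.
public void enforceMaxLength(int maxLength) {
  if (value.getLength() > maxLength && getCharacterLength() > maxLength) {
    // set(String, int) already applies HiveBaseChar.enforceMaxLength().
    set(value.toString(), maxLength);
  }
}
{code}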



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061317#comment-16061317
 ] 

Vihang Karajgaonkar commented on HIVE-16949:


[~stakiar]

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  that is not shut down when its threads complete. This leads to a leak of 
> threads for each query that uses more than one partition; they are not 
> removed automatically. When queries spanning multiple partitions are made, 
> the number of threads increases and is never reduced. On my machine 
> hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shut down.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}, 
> which prevents the thread pool from being spawned 
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16934:
---
Attachment: HIVE-16934.03.patch

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.03.patch, HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061297#comment-16061297
 ] 

Jesus Camacho Rodriguez commented on HIVE-16934:


[~ashutoshc], regenerated the last q file changes and created an RB at: 
https://reviews.apache.org/r/60395/

I read your mind :)

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061294#comment-16061294
 ] 

Ashutosh Chauhan commented on HIVE-16934:
-

Can you create an RB for this? I want to take a closer look at some of the plan changes.

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16856) Allow For Customization Of Buffer Size In MapJoinTableContainerSerDe

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061283#comment-16061283
 ] 

Hive QA commented on HIVE-16856:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871986/HIVE-16856.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5750/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5750/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5750/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12871986 - PreCommit-HIVE-Build

> Allow For Customization Of Buffer Size In MapJoinTableContainerSerDe
> 
>
> Key: HIVE-16856
> URL: https://issues.apache.org/jira/browse/HIVE-16856
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16856.1.patch
>
>
> MapJoinTableContainerSerDe currently hard-codes its buffer sizes to 4K. If we 
> remove the explicit buffer size, the call falls back to the configured buffer 
> size (defaulting to 4K):
> {code}
> public FSDataInputStream open(Path f) throws IOException {
>   return open(f, getConf().getInt("io.file.buffer.size", 4096));
> }
> {code}
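
In other words, the change amounts to dropping the second argument; a sketch, assuming a {{FileSystem fs}} and a {{Path path}} are in scope:

{code}
// Before: an explicit 4K buffer overrides any io.file.buffer.size setting.
FSDataInputStream before = fs.open(path, 4096);

// After: FileSystem#open(Path) falls back to io.file.buffer.size (or 4K).
FSDataInputStream after = fs.open(path);
{code}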



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061281#comment-16061281
 ] 

Jason Dere commented on HIVE-16947:
---

Can you add a test for this?

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16947.1.patch
>
>
> Typically a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch 
> can serve more than one filter, and the cycle detection logic currently only 
> handles the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061205#comment-16061205
 ] 

Hive QA commented on HIVE-16934:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874270/HIVE-16934.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=241)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reduce_deduplicate_extended2]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_in_having] 
(batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_views]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
 (batchId=151)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query83] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=127)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5749/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5749/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5749/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874270 - PreCommit-HIVE-Build

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16856) Allow For Customization Of Buffer Size In MapJoinTableContainerSerDe

2017-06-23 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16856:
---
Status: Patch Available  (was: Open)

> Allow For Customization Of Buffer Size In MapJoinTableContainerSerDe
> 
>
> Key: HIVE-16856
> URL: https://issues.apache.org/jira/browse/HIVE-16856
> Project: Hive
>  Issue Type: Improvement
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16856.1.patch
>
>
> MapJoinTableContainerSerDe currently hard-codes its buffer sizes to 4K. If we 
> remove the explicit buffer size, the call falls back to the configured buffer 
> size (defaulting to 4K):
> {code}
> public FSDataInputStream open(Path f) throws IOException {
>   return open(f, getConf().getInt("io.file.buffer.size", 4096));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16890) org.apache.hadoop.hive.serde2.io.HiveVarcharWritable - Adds Superfluous Wrapper

2017-06-23 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16890:
---
Status: Patch Available  (was: Open)

> org.apache.hadoop.hive.serde2.io.HiveVarcharWritable - Adds Superfluous 
> Wrapper
> ---
>
> Key: HIVE-16890
> URL: https://issues.apache.org/jira/browse/HIVE-16890
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16890.1.patch
>
>
> Class {{org.apache.hadoop.hive.serde2.io.HiveVarcharWritable}} creates a 
> superfluous wrapper and then immediately unwraps it.  Don't bother wrapping 
> in this scenario.
> {code}
>   public void set(HiveVarchar val, int len) {
>     set(val.getValue(), len);
>   }
> 
>   public void set(String val, int maxLength) {
>     value.set(HiveBaseChar.enforceMaxLength(val, maxLength));
>   }
> 
>   public HiveVarchar getHiveVarchar() {
>     return new HiveVarchar(value.toString(), -1);
>   }
> 
>   // enforceMaxLength() calls getHiveVarchar(), which creates a new
>   // HiveVarchar wrapping a String; that object is passed to
>   // set(HiveVarchar val, int len), which immediately pulls the String
>   // back out.
>   public void enforceMaxLength(int maxLength) {
>     // Might be possible to truncate the existing Text value, for now just
>     // do something simple.
>     if (value.getLength() > maxLength && getCharacterLength() > maxLength)
>       set(getHiveVarchar(), maxLength);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061145#comment-16061145
 ] 

Jesus Camacho Rodriguez commented on HIVE-16888:


[~rusanu], I have pushed HIVE-16751 to master.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16751:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we were assuming that all group by 
> columns in a Druid query were of STRING type; however, this will no longer 
> be true (the result of EXTRACT is an INT and the result of FLOOR is a 
> TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.
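
For illustration, the kind of type dispatch DruidSerDe would need (a hypothetical sketch, not the actual patch):

{code}
// Hypothetical sketch: convert a Druid group-by value according to the Hive
// column type instead of assuming everything is a string.
public final class DruidValueSketch {
  static Object convertGroupByValue(Object value, String hiveType) {
    if (value == null) {
      return null;
    }
    switch (hiveType) {
      case "int":        // e.g. the result of EXTRACT
        return Integer.valueOf(value.toString());
      case "timestamp":  // e.g. the result of FLOOR
        return java.sql.Timestamp.valueOf(value.toString());
      default:           // previous assumption: treat the value as a string
        return value.toString();
    }
  }
}
{code}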



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16934) Transform COUNT(x) into COUNT() when x is not nullable

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16934:
---
Attachment: HIVE-16934.02.patch

> Transform COUNT(x) into COUNT() when x is not nullable
> --
>
> Key: HIVE-16934
> URL: https://issues.apache.org/jira/browse/HIVE-16934
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16934.01.patch, HIVE-16934.02.patch, 
> HIVE-16934.patch
>
>
> Add a rule to simplify the COUNT aggregation function when possible, removing 
> expressions that cannot be null from its parameters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061089#comment-16061089
 ] 

Ashutosh Chauhan commented on HIVE-16751:
-

+1

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we were assuming that all group by 
> columns in a Druid query were of STRING type; however, this will no longer 
> be true (the result of EXTRACT is an INT and the result of FLOOR is a 
> TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061062#comment-16061062
 ] 

Jesus Camacho Rodriguez commented on HIVE-16751:


[~ashutoshc], could you review it? Thanks

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we were assuming that all group by 
> columns in a Druid query were of STRING type; however, this will no longer 
> be true (the result of EXTRACT is an INT and the result of FLOOR is a 
> TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061060#comment-16061060
 ] 

Jesus Camacho Rodriguez commented on HIVE-16888:


Btw, {{tez_smb_main}} seems to be failing for the last x runs, so maybe it is 
not related to this patch.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061059#comment-16061059
 ] 

Hive QA commented on HIVE-16751:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12869703/HIVE-16751.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.druid.TestDruidSerDe.testDruidDeserializer (batchId=246)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5748/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5748/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5748/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12869703 - PreCommit-HIVE-Build

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we were assuming that all group by 
> columns in a Druid query were of STRING type; however, this will no longer 
> be true (the result of EXTRACT is an INT and the result of FLOOR is a 
> TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061042#comment-16061042
 ] 

Jesus Camacho Rodriguez commented on HIVE-16888:


No worries, I will push it shortly, waiting for QA.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061039#comment-16061039
 ] 

Remus Rusanu commented on HIVE-16888:
-

So should I include HIVE-16751? Or will it be committed separately?

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061035#comment-16061035
 ] 

Jesus Camacho Rodriguez commented on HIVE-16888:


Great! Sorry I did not mention it; I completely forgot about that one.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060948#comment-16060948
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates the 
expected output, not sure if from HIVE-16751 or from HIVE-16888:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}
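
For context on the diff above: the expected query JSON changes from Druid's plain-string dimension shorthand to the object form of the dimension spec. The object form is also what makes extraction-function dimensions (the EXTRACT/FLOOR case from HIVE-16751) expressible at all; a hedged illustration of such a spec, with a made-up output name and format, is:

{noformat}
{"type":"extraction","dimension":"__time","outputName":"floor_day",
 "extractionFn":{"type":"timeFormat","format":"yyyy-MM-dd"}}
{noformat}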

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060948#comment-16060948
 ] 

Remus Rusanu edited comment on HIVE-16888 at 6/23/17 2:13 PM:
--

[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates the 
expected output, similar to the other HIVE-16888 Druid changes:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}


was (Author: rusanu):
[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates the 
expected output, not sure if from HIVE-16751 or from HIVE-16888:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16751:
---
Status: Patch Available  (was: Open)

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we were assuming that all group by 
> columns in a Druid query were of STRING type; however, this will no longer be 
> true (the result of EXTRACT is an INT and the result of FLOOR a TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060907#comment-16060907
 ] 

Jesus Camacho Rodriguez commented on HIVE-16888:


Some changes look fine, e.g., more computation pushed to Druid.

About {{druid_basic2}}, could you let me know if the patch in HIVE-16751 fixes 
the issue? It might be related to that.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-23 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060835#comment-16060835
 ] 

Rui Li commented on HIVE-16840:
---

Hi [~xuefuz], I doubt Spark can take much advantage of it, because the reducer 
fetches all the data anyway. But I agree a limit with a large number is rare, 
so it's OK to leave it aside for the moment.
Another thing I'm not sure about: Hive should already have pushed the limit 
down upstream of the shuffle. Looking at the RS code, it uses a TopN hash to 
track the top N keys in the input. Ideally, each RS will output only N records. 
I tried some simple queries to verify how this saves shuffled data.
[~kellyzly], do you know why it's not working as expected in your case?
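
For readers following along, a minimal sketch of the top-N idea described above, assuming nothing about Hive's actual TopNHash internals: each RS keeps a bounded max-heap of the N smallest keys seen so far and forwards a row only while its key can still be in the top N.

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

public class TopNFilter {
  private final int n;
  // Max-heap over the current candidate keys, so the largest is on top.
  private final PriorityQueue<Long> heap;

  TopNFilter(int n) {
    this.n = n;
    this.heap = new PriorityQueue<>(n, Comparator.reverseOrder());
  }

  /** Returns true if the row with this sort key should be forwarded to shuffle. */
  boolean offer(long key) {
    if (heap.size() < n) {
      heap.add(key);
      return true;
    }
    if (key < heap.peek()) { // beats the current worst candidate
      heap.poll();
      heap.add(key);
      return true;
    }
    return false;            // cannot be in the top N, skip shuffling it
  }

  public static void main(String[] args) {
    TopNFilter f = new TopNFilter(2);
    for (long k : new long[] {5, 1, 7, 0, 9}) {
      System.out.println(k + " -> " + f.offer(k));
    }
  }
}
{code}

Early rows later displaced from the heap have already been forwarded, so an RS can emit somewhat more than N rows, but the shuffled volume still stays far below the full input.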

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The reason the script hung is that we use only 1 task to implement the 
> sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 10 is 1. It is an order by limit case, so we use 1 
> task to ensure correctness. But the performance is poor.
> The reason why we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
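
The linked check boils down to pinning the parallelism to 1 whenever the sink implements a total order. A hedged, illustrative sketch (the names are invented; this is not the actual SetSparkReducerParallelism code):

{code}
public class ReducerParallelismSketch {
  // Illustrative only: a total sort (order by, optionally with limit) is pinned
  // to a single reducer so that one task produces one globally sorted run.
  static int chooseParallelism(boolean isTotalOrder, int estimatedReducers) {
    if (isTotalOrder) {
      return 1; // correctness first: a single globally sorted output
    }
    return Math.max(1, estimatedReducers);
  }

  public static void main(String[] args) {
    System.out.println(chooseParallelism(true, 1009));  // -> 1 (the sort reducer above)
    System.out.println(chooseParallelism(false, 1009)); // -> 1009
  }
}
{code}

One design alternative for such cases is sampling-based range partitioning (as Spark's own sortByKey does), which lets several tasks each sort a disjoint key range; that is a possible direction, not something the current code does.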



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
automatically. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
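
A minimal, self-contained sketch of the proposed try/finally fix, assuming an ExecutorService-based pool like the one described above (the listing callable and all names here are illustrative, not the actual Utilities code):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GetInputPathsSketch {
  static List<String> listPaths(List<String> partitions, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String p : partitions) {
        futures.add(pool.submit(() -> "listed:" + p)); // stand-in for the FS listing work
      }
      List<String> result = new ArrayList<>();
      for (Future<String> f : futures) {
        result.add(f.get());
      }
      return result;
    } finally {
      pool.shutdown(); // worker threads are released on success and on exception alike
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(listPaths(Arrays.asList("ds=2008-04-08", "ds=2008-04-09"), 2));
  }
}
{code}

The same try/finally pattern would cover the Get-Input-Summary pool as well.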

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query that uses more than 1 partition. They are not 
> removed automatically. When queries spanning multiple partitions are made, the 
> number of threads increases and is never reduced. On my machine hiveserver2 
> starts to get slower and slower once 10k threads are reached.
> Thread pools only shut down automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not 

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query that uses more than 1 partition. They are not 
> removed by the GC. When queries spanning multiple partitions are made, the 
> number of threads increases and is never reduced. On my machine hiveserver2 
> starts to get slower and slower 

[jira] [Commented] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060824#comment-16060824
 ] 

Hive QA commented on HIVE-16892:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874234/HIVE-16892.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_11_managed_external]
 (batchId=66)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_export]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_12_nonnative_export]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testViewsReplication 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5747/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5747/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5747/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874234 - PreCommit-HIVE-Build

> Move creation of _files from ReplCopyTask to analysis phase for bootstrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16892.1.patch, HIVE-16892.2.patch, 
> HIVE-16892.3.patch
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables; this can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent the creation of a huge number of these tasks in memory 
> before handing them to the execution engine. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16924) Support distinct in presence Gby

2017-06-23 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16924:
---

Assignee: Remus Rusanu

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Remus Rusanu
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'
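
Until this lands, a common workaround is to move the aggregate into a subquery so the DISTINCT sits in its own query block. A sketch against the e011_01 table above (note that when the DISTINCT list includes all the group-by keys, the DISTINCT is logically a no-op, which is part of why the restriction is merely annoying):

{code}
-- Workaround sketch: push the aggregate into a subquery so the outer
-- DISTINCT no longer shares a query block with the GROUP BY.
select distinct c1, cnt
from (
  select c1, count(*) as cnt
  from e011_01
  group by c1
) t;
{code}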



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shut down automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query that uses more than 1 

[jira] [Comment Edited] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060804#comment-16060804
 ] 

Remus Rusanu edited comment on HIVE-16888 at 6/23/17 12:22 PM:
---

{{druid_basic2}} fails with an exception:

{noformat}
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: Created Table Plan for druid_table_1 TS[0]
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
exec.FunctionRegistry: Method didn't match: passed = [string] accepted = 
[timestamp] method = public org.apache.hadoop.hive.serde2.io.TimestampWritable 
org.apache.hadoop.hive.ql.udf.UDFDateFloor.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable)
2017-06-23T05:18:31,796 ERROR [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments 
'extract': No matching method for class 
org.apache.hadoop.hive.ql.udf.UDFDateFloorDay with (string). Possible choices: 
_FUNC_(timestamp)  
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1363)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:229)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:176)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11746)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11701)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11669)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3325)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3305)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9695)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10652)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10530)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:433)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11269)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:294)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:169)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}
on this plan:
{noformat}
2017-06-23T05:18:31,790 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
translator.PlanModifierForASTConv: Final plan after modifier
 HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first], fetch=[10])
  HiveProject(robot=[$1], __time=[$0])
HiveFilter(condition=[BETWEEN(false, FLOOR_DAY($0, FLAG(DAY)), 
CAST(1999-11-01 08:00:00):TIMESTAMP(9), CAST(1999-11-10 
08:00:00):TIMESTAMP(9))])
  DruidQuery(table=[[default.druid_table_1]], 
intervals=[[1900-01-01T00:00:00.000/3000-01-01T00:00:00.000]], groups=[{0, 1}], 
aggs=[[]])
{noformat}


was (Author: rusanu):
{{druid_basic2}} fails with an exception:


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060804#comment-16060804
 ] 

Remus Rusanu commented on HIVE-16888:
-

{{druid_basic2}} fails with an exception:

{noformat}
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: Created Table Plan for druid_table_1 TS[0]
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
exec.FunctionRegistry: Method didn't match: passed = [string] accepted = 
[timestamp] method = public org.apache.hadoop.hive.serde2.io.TimestampWritable 
org.apache.hadoop.hive.ql.udf.UDFDateFloor.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable)
2017-06-23T05:18:31,796 ERROR [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments 
'extract': No matching method for class 
org.apache.hadoop.hive.ql.udf.UDFDateFloorDay with (string). Possible choices: 
_FUNC_(timestamp)  
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1363)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:229)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:176)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11746)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11701)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11669)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3325)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3305)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9695)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10652)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10530)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:433)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11269)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:294)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:169)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060801#comment-16060801
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] can you tell whether the druid* ptest diffs are OK in the 
latest test run?

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads for each query that uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made, the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made, the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads for each query that uses more than 1 partition. They are not 
> removed by the GC. When queries spanning multiple partitions are made, the 
> 

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made, the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe which was part of HIVE-15881 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made, the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shut down upon completion of its threads. This leads to a leak 
> of threads. They are not removed by the GC. When queries spanning multiple 
> partitions are made, the number of threads increases and is never reduced. On 
> my machine hiveserver2 starts to get slower and slower once 10k threads are 
> reached.
> Thread pools should be 

[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060794#comment-16060794
 ] 

Remus Rusanu commented on HIVE-16888:
-

MiniLlapLocal/tez_smb_join fails with exception:
{noformat}
2017-06-23T05:07:55,395  INFO 
[TezTaskEventRouter{attempt_1498219646456_0001_38_01_03_0}] 
impl.LlapRecordReader: Received fragment id: 1498219646456_0001_38_01_03_0
2017-06-23T05:07:55,390  WARN 
[TezTaskEventRouter{attempt_1498219646456_0001_38_01_01_0}] 
runtime.LogicalIOProcessorRuntimeTask: Failed to handle event
java.lang.RuntimeException: java.io.IOException: java.io.IOException: 
java.io.IOException: cannot find dir = 
file:/Users/rrusanu/hive/itests/qtest/target/localfs/warehouse/tab/ds=2008-04-08/01_0
 in pathToPartitionInfo: 
[file:/Users/rrusanu/hive/itests/qtest/target/localfs/warehouse/tab_part/ds=2008-04-08]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.(MRReaderMapred.java:76) 
~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:195)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154) 
~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35) 
[tez-common-0.8.4.jar:0.8.4]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
{noformat}
I suspect this is caused by time offsets.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe which was part of HIVE-15881 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shut down upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made, the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 7f1c29ebe, which was part of HIVE-15881, [introduced a thread pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109] which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
> [after the pool has completed its 
> 

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].




> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
> This prevents the thread pool from being spawned 
> 

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].



  was:
The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.

Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].




> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>
> The commit 7f1c29ebe, which was part of HIVE-15881, introduced a thread pool which is not shut down upon completion of its threads. This leads to a leak of threads; they are not removed by the GC. When queries spanning multiple partitions are made, the number of threads increases and is never reduced. On my machine, hiveserver2 starts to get slower and slower once 10k threads are reached.
> Thread pools should be [shut down automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]. I am not sure why this is not the case. I would add a _pool.shutdown()_ just [after the pool has completed its work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137] to make sure the threads are really shut down. This, however, would only fix normal operation; there are other exit points, namely through exceptions, which would still lead to the same leak of threads.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. This prevents the thread pool from being spawned [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060772#comment-16060772
 ] 

Hive QA commented on HIVE-16888:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874230/HIVE-16888.05.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 105 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=229)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=229)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[mapjoin2] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[udf_unix_timestamp] 
(batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_on_constant] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join17] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_cross_product_check_2]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby2_map_multi_distinct]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_groupby3_noskew_multi_distinct]
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join0] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_date] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_4] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_date]
 (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_union] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_ppr_multi_distinct]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_3] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_filters_overlap] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde1] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_create_rewrite_multi_db]
 (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_char] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[outer_join_ppr] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_date] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_varchar] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_check] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_type_in_plan] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_union] (batchId=45)

[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060707#comment-16060707
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874192/HIVE-16832.09.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 51 failed/errored test(s), 10858 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_explode2] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_noalias] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_7] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_8] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_9] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_acid_no_masking] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_stack] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lateral_view]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=98)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udf_assert_true] 
(batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[lateral_view_explode2]
 (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
 (batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testNewBaseAndDelta 
(batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta
 (batchId=262)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta
 (batchId=262)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.hcatalog.streaming.mutate.TestMutations.testMulti (batchId=190)

[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for bootstrap replication

2017-06-23 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16892:
---
Attachment: HIVE-16892.3.patch

fixing regressions

> Move creation of _files from ReplCopyTask to analysis phase for bootstrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
> Attachments: HIVE-16892.1.patch, HIVE-16892.2.patch, 
> HIVE-16892.3.patch
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables. This can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This prevents the creation of a huge number of these tasks in memory 
> before handing them to the execution engine. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.05.patch

Patch 05 uses 1.13.0-RC0

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060632#comment-16060632
 ] 

Hive QA commented on HIVE-11297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874190/HIVE-11297.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5743/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5743/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5743/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874190 - PreCommit-HIVE-Build

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, 
> HIVE-11297.6.patch, HIVE-11297.7.patch, HIVE-11297.8.patch, hive-site.xml
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to scan 
> the table multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16929) User-defined UDF functions can be registered as invariant functions

2017-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060591#comment-16060591
 ] 

Hive QA commented on HIVE-16929:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874187/HIVE-16929.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union24] 
(batchId=125)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5742/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5742/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5742/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874187 - PreCommit-HIVE-Build

> User-defined UDF functions can be registered as invariant functions
> ---
>
> Key: HIVE-16929
> URL: https://issues.apache.org/jira/browse/HIVE-16929
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Attachments: HIVE-16929.1.patch, HIVE-16929.2.patch
>
>
> Add a configuration item "hive.aux.udf.package.name.list" in hive-site.xml. 
> Hive then scans the jar packages in the $HIVE_HOME/auxlib/ directory and 
> registers the UDF classes found under the configured package names as 
> constant (built-in) functions.
> For example:
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf,com.test.udf</value>
> </property>
> {code}
> Instructions:
>    1. Upload your jar file to $HIVE_HOME/auxlib.
>    2. Configure the package that contains your UDF classes in the following 
> configuration parameter (a minimal example UDF is sketched below):
> {code:xml}
> <property>
>   <name>hive.aux.udf.package.name.list</name>
>   <value>com.sample.udf</value>
> </property>
> {code}
>    3. Place the configuration item in the hive-site.xml file.
>    4. Restart the Hive service for the change to take effect.
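
A minimal sketch of such a UDF, for illustration only: the class name, package, and jar placement are assumptions matching the example configuration above, not code from the patch. It uses the classic {{org.apache.hadoop.hive.ql.exec.UDF}} API, whose {{evaluate}} methods Hive resolves by reflection.

{code:java}
// Hypothetical UDF under the configured package com.sample.udf.
// Bundle it into a jar and drop the jar into $HIVE_HOME/auxlib.
package com.sample.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class ToUpper extends UDF {
  // Hive calls evaluate() reflectively for old-style UDFs.
  public Text evaluate(Text input) {
    return input == null ? null : new Text(input.toString().toUpperCase());
  }
}
{code}

If the scan works as described, the function would afterwards be callable like a built-in, e.g. {{SELECT toupper(name) FROM t;}} (the exact function name depends on how the patch derives it from the class).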



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

