[jira] [Assigned] (HIVE-21907) Add new method to enable/disable LlapNode

2019-06-24 Thread Peter Vary (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-21907:
-

Assignee: Peter Vary

> Add new method to enable/disable LlapNode
> -
>
> Key: HIVE-21907
> URL: https://issues.apache.org/jira/browse/HIVE-21907
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> Add a new method to the LlapManagementProtocol API which can disable an LLAP node.
> It would be even better if we could dynamically set the number of executors 
> and the size of the wait queue. This way we can disable the node by setting 
> both to 0.
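A minimal sketch of what such a capacity-setting call could look like (the interface, method, and client names below are illustrative assumptions, not the actual LlapManagementProtocol API):

{code:java}
// Hypothetical capacity-management call: shrinking the daemon to zero executors
// and a zero-length wait queue effectively disables it; restoring positive
// values re-enables it.
public interface LlapDaemonCapacityManagement {
  void setCapacity(int numExecutors, int waitQueueSize);
}

// Disable a node:   client.setCapacity(0, 0);
// Re-enable a node: client.setCapacity(defaultExecutors, defaultWaitQueueSize);
{code}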



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21907) Add a new LlapDaemon Management API method to set the daemon capacity

2019-06-24 Thread Peter Vary (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-21907:
--
Summary: Add a new LlapDaemon Management API method to set the daemon 
capacity  (was: Add new method to enable/disable LlapNode)

> Add a new LlapDaemon Management API method to set the daemon capacity
> -
>
> Key: HIVE-21907
> URL: https://issues.apache.org/jira/browse/HIVE-21907
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> Add a new method to the LlapManagementProtocol API which can disable an LLAP node.
> It would be even better if we could dynamically set the number of executors 
> and the size of the wait queue. This way we can disable the node by setting 
> both to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21905) Generics improvement around the FetchOperator class

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871187#comment-16871187
 ] 

Hive QA commented on HIVE-21905:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972700/HIVE-21905.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16339 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17706/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17706/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17706/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972700 - PreCommit-HIVE-Build

> Generics improvement around the FetchOperator class
> ---
>
> Key: HIVE-21905
> URL: https://issues.apache.org/jira/browse/HIVE-21905
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ivan Suller
>Assignee: Ivan Suller
>Priority: Minor
> Attachments: HIVE-21905.1.patch, HIVE-21905.1.patch, 
> HIVE-21905.2.patch
>
>
> In and around the org.apache.hadoop.hive.ql.exec.FetchOperator class the 
> generics are handled poorly. Lots of declarations are missing type parameters, 
> which creates a lot of noise in the IDE and makes it hard to be sure of the 
> correctness of the code.
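The kind of cleanup being described, as a generic illustration (a raw-type example only, not code taken from FetchOperator itself):

{code:java}
import java.util.ArrayList;
import java.util.List;

class RawTypeExample {
  // Raw type: the compiler warns and callers only ever see Object.
  List rawRows = new ArrayList();

  // Parameterized type: the element type is explicit and the IDE stays quiet.
  List<Object[]> typedRows = new ArrayList<>();
}
{code}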



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to bigint at the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Description: 
The return type of the ceil, ceiling and floor SQL functions is bigint, and this 
leads to overflow:
{code:java}
hive> select version(), ceil(1.2345678901234e+200), 
ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
OK
4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
{code}
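The three 9223372036854775807 values equal Long.MAX_VALUE, which is what Java's narrowing double-to-long conversion yields for any larger value; a minimal stand-alone illustration of that conversion (plain Java, not Hive code):

{code:java}
public class NarrowingOverflow {
  public static void main(String[] args) {
    double big = 1.2345678901234e+200;
    // Narrowing double-to-long conversion saturates instead of wrapping around.
    System.out.println((long) Math.ceil(big));          // 9223372036854775807
    System.out.println((long) big == Long.MAX_VALUE);   // true
  }
}
{code}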
The explain returned:
{code:java}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: _dummy_table |
| Row Limit Per Split: 1 |
| Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: '4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132' (type: string), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 1 Data size: 164 Basic stats: COMPLETE Column stats: COMPLETE |
| ListSink |
| |
+----------------------------------------------------+
{code}
Meanwhile, other SQL engines evaluate the same expressions without overflow.

*PostgreSQL:*
{code:java}
postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
 version | ceil | ceiling | floor
---------+------+---------+-------
 PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
(1 row)
{code}
*MySQL:*
{code:java}
mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
+-----------+----------------------------+-------------------------------+-----------------------------+
| version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
+-----------+----------------------------+-------------------------------+-----------------------------+
| 5.7.26 | 12345678901234000... |

[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to bigint at the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Description: 
The return type of the ceil, ceiling and floor SQL functions is bigint, and this 
leads to overflow:
{code:java}
hive> select version(), ceil(1.2345678901234e+200), 
ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
OK
4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
{code}
The explain returned:
{code}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: _dummy_table |
| Row Limit Per Split: 1 |
| Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: '4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132' (type: string), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 1 Data size: 164 Basic stats: COMPLETE Column stats: COMPLETE |
| ListSink |
| |
+----------------------------------------------------+
{code}
Meanwhile, other SQL engines evaluate the same expressions without overflow.

*PostgreSQL:*
{code:java}
postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
 version | ceil | ceiling | floor
---------+------+---------+-------
 PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
(1 row)
{code}
*MySQL:*
{code:java}
mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
+-----------+----------------------------+-------------------------------+-----------------------------+
| version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
+-----------+----------------------------+-------------------------------+-----------------------------+
| 5.7.26 | 12345678901234000... |

[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to bigint at the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Description: 
The return type of the ceil, ceiling and floor SQL functions is bigint, and this 
leads to overflow:
{code:java}
hive> select version(), ceil(1.2345678901234e+200), 
ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
OK
4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
{code}
The explain returned:


{code}
expressions: '4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132' (type: string), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint), 9223372036854775807L (type: bigint)
{code}

Meanwhile, other SQL engines evaluate the same expressions without overflow.

*PostgreSQL:*
{code:java}
postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
 version | ceil | ceiling | floor
---------+------+---------+-------
 PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
(1 row)
{code}
*MySQL:*
{code:java}
mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
+-----------+----------------------------+-------------------------------+-----------------------------+
| version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
+-----------+----------------------------+-------------------------------+-----------------------------+
| 5.7.26 | 12345678901234000... | 12345678901234000... | 12345678901234000... |

[jira] [Updated] (HIVE-21905) Generics improvement around the FetchOperator class

2019-06-24 Thread Ivan Suller (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Suller updated HIVE-21905:
---
Attachment: HIVE-21905.2.patch

> Generics improvement around the FetchOperator class
> ---
>
> Key: HIVE-21905
> URL: https://issues.apache.org/jira/browse/HIVE-21905
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ivan Suller
>Assignee: Ivan Suller
>Priority: Minor
> Attachments: HIVE-21905.1.patch, HIVE-21905.1.patch, 
> HIVE-21905.2.patch
>
>
> In and around the org.apache.hadoop.hive.ql.exec.FetchOperator class the 
> generics are handled poorly. Lots of declarations are missing type parameters, 
> which creates a lot of noise in the IDE and makes it hard to be sure of the 
> correctness of the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to bigint at the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Summary: Avoid overflow as a result of casting to bigint at the "ceil", 
"ceiling" and "floor" SQL functions  (was: Avoid overflow as a result of 
casting to long at the "ceil", "ceiling" and "floor" SQL functions)

> Avoid overflow as a result of casting to bigint at the "ceil", "ceiling" and 
> "floor" SQL functions
> --
>
> Key: HIVE-21916
> URL: https://issues.apache.org/jira/browse/HIVE-21916
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> The return type of the ceil, ceiling and floor SQL functions is long, and this 
> leads to overflow:
> {code}
> hive> select version(), ceil(1.2345678901234e+200), 
> ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> OK
> 4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
> {code}
>  
> Meanwhile, other SQL engines evaluate the same expressions without overflow.
> *PostgreSQL:*
> {code}
> postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
>  version | ceil | ceiling | floor
> ---------+------+---------+-------
>  PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
> (1 row)
> {code}
> *MySQL:*
> {code}
> mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | 5.7.26 | 12345678901234000... |
> 

[jira] [Commented] (HIVE-18735) Create table like loses transactional attribute

2019-06-24 Thread Laszlo Pinter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871185#comment-16871185
 ] 

Laszlo Pinter commented on HIVE-18735:
--

[~ekoifman] I provided a patch for this issue. I saw that HIVE-18983 tried to 
address this problem as well, by copying all table properties. That patch is 
still in progress, and based on the review comments it is not likely to be 
merged in. I solved the current issue by copying over just 3 properties 
(is_transactional, transactional_properties, bucketing_version), which were 
missing from the target table. I would appreciate it if you could review my 
patch. Thanks
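A rough sketch of that approach (property names taken from the comment above; the class, method, and surrounding code are illustrative assumptions, not the actual patch):

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class AcidPropsCopier {
  // Only the ACID-related properties mentioned above are carried over.
  private static final List<String> PROPS_TO_COPY = Arrays.asList(
      "is_transactional", "transactional_properties", "bucketing_version");

  // Copy each listed property from the source table's parameters to the
  // target's, but only if the target does not already define it.
  static void copyMissingAcidProps(Map<String, String> source, Map<String, String> target) {
    for (String key : PROPS_TO_COPY) {
      String value = source.get(key);
      if (value != null && !target.containsKey(key)) {
        target.put(key, value);
      }
    }
  }
}
{code}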

> Create table like loses transactional attribute
> ---
>
> Key: HIVE-18735
> URL: https://issues.apache.org/jira/browse/HIVE-18735
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-18735.01.patch
>
>
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1;
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>  
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518813536099/warehouse/t'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1518813564')
> {noformat}
> Specifying props explicitly does work 
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1 TBLPROPERTIES ('transactional'='true');
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518814098564/warehouse/t'
> TBLPROPERTIES (
>   'transactional'='true',
>   'transactional_properties'='default',
>   'transient_lastDdlTime'='1518814111')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Attachment: HIVE-21915.01.patch
Status: Patch Available  (was: Open)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: HIVE-21915.01.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  
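As a stand-alone illustration of why two vertices sharing one task tmp path lose data (plain Java file I/O; the directory and file names here are made up and are not Hive's actual layout):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TmpPathCollision {
  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("union_demo");
    Path sharedTaskFile = tmp.resolve("000000_0");   // same task path for both "vertices"

    // The branch with tag = 1 writes first, then the branch with tag = 2
    // writes to the very same path...
    Files.write(sharedTaskFile, "rows with tag = 1".getBytes());
    Files.write(sharedTaskFile, "rows with tag = 2".getBytes());

    // ...so only the tag = 2 rows survive, mirroring the data loss described above.
    System.out.println(new String(Files.readAllBytes(sharedTaskFile)));
  }
}
{code}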



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Attachment: (was: HIVE-21915.01.patch)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: HIVE-21915.01.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Status: Open  (was: Patch Available)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: HIVE-21915.01.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Attachment: (was: hive-21915-2019-06-24.patch)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: HIVE-21915.01.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Attachment: HIVE-21915.01.patch

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: HIVE-21915.01.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18735) Create table like loses transactional attribute

2019-06-24 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-18735:
-
Attachment: HIVE-18735.01.patch

> Create table like loses transactional attribute
> ---
>
> Key: HIVE-18735
> URL: https://issues.apache.org/jira/browse/HIVE-18735
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Laszlo Pinter
>Priority: Major
> Attachments: HIVE-18735.01.patch
>
>
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1;
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>  
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518813536099/warehouse/t'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1518813564')
> {noformat}
> Specifying props explicitly does work 
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1 TBLPROPERTIES ('transactional'='true');
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518814098564/warehouse/t'
> TBLPROPERTIES (
>   'transactional'='true',
>   'transactional_properties'='default',
>   'transient_lastDdlTime'='1518814111')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18735) Create table like loses transactional attribute

2019-06-24 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter updated HIVE-18735:
-
Target Version/s: 4.0.0  (was: 3.0.0)

> Create table like loses transactional attribute
> ---
>
> Key: HIVE-18735
> URL: https://issues.apache.org/jira/browse/HIVE-18735
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Laszlo Pinter
>Priority: Major
>
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1;
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>  
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518813536099/warehouse/t'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1518813564')
> {noformat}
> Specifying props explicitly does work 
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1 TBLPROPERTIES ('transactional'='true');
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518814098564/warehouse/t'
> TBLPROPERTIES (
>   'transactional'='true',
>   'transactional_properties'='default',
>   'transient_lastDdlTime'='1518814111')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-18735) Create table like loses transactional attribute

2019-06-24 Thread Laszlo Pinter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Pinter reassigned HIVE-18735:


Assignee: Laszlo Pinter

> Create table like loses transactional attribute
> ---
>
> Key: HIVE-18735
> URL: https://issues.apache.org/jira/browse/HIVE-18735
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Eugene Koifman
>Assignee: Laszlo Pinter
>Priority: Major
>
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1;
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>  
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518813536099/warehouse/t'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1518813564')
> {noformat}
> Specifying props explicitly does work 
> {noformat}
> create table T1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true');
> create table T like T1 TBLPROPERTIES ('transactional'='true');
> show create table T ;
> CREATE TABLE `T`(
>   `a` int,
>   `b` int)
> CLUSTERED BY (
>   a)
> INTO 2 BUCKETS
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   
> 'file:/Users/ekoifman/IdeaProjects/hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands-1518814098564/warehouse/t'
> TBLPROPERTIES (
>   'transactional'='true',
>   'transactional_properties'='default',
>   'transient_lastDdlTime'='1518814111')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to long at the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Summary: Avoid overflow as a result of casting to long at the "ceil", 
"ceiling" and "floor" SQL functions  (was: Avoid overflow as a result of 
casting to long for the "ceil", "ceiling" and "floor" SQL functions)

> Avoid overflow as a result of casting to long at the "ceil", "ceiling" and 
> "floor" SQL functions
> 
>
> Key: HIVE-21916
> URL: https://issues.apache.org/jira/browse/HIVE-21916
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> The return type of the ceil, ceiling and floor SQL functions is long, and this 
> leads to overflow:
> {code}
> hive> select version(), ceil(1.2345678901234e+200), 
> ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> OK
> 4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
> {code}
>  
> Meanwhile, other SQL engines evaluate the same expressions without overflow.
> *PostgreSQL:*
> {code}
> postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
>  version | ceil | ceiling | floor
> ---------+------+---------+-------
>  PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
> (1 row)
> {code}
> *MySQL:*
> {code}
> mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | 5.7.26 | 12345678901234000... | 12345678901234000... |
> 

[jira] [Updated] (HIVE-21916) Avoid overflow as a result of casting to long for the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Summary: Avoid overflow as a result of casting to long for the "ceil", 
"ceiling" and "floor" SQL functions  (was: Avoid overflow because of casting in 
case of the "ceil", "ceiling" and "floor" SQL functions)

> Avoid overflow as a result of casting to long for the "ceil", "ceiling" and 
> "floor" SQL functions
> -
>
> Key: HIVE-21916
> URL: https://issues.apache.org/jira/browse/HIVE-21916
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> The return type of the ceil, ceiling and floor SQL functions is long, and this 
> leads to overflow:
> {code}
> hive> select version(), ceil(1.2345678901234e+200), 
> ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> OK
> 4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
> {code}
>  
> Meanwhile, other SQL engines evaluate the same expressions without overflow.
> *PostgreSQL:*
> {code}
> postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
>  version | ceil | ceiling | floor
> ---------+------+---------+-------
>  PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
> (1 row)
> {code}
> *MySQL:*
> {code}
> mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
> +-----------+----------------------------+-------------------------------+-----------------------------+
> | 5.7.26 | 12345678901234000... | 12345678901234000... |
> 

[jira] [Updated] (HIVE-21916) Avoid overflow because of casting in case of the "ceil", "ceiling" and "floor" SQL functions

2019-06-24 Thread Attila Zsolt Piros (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros updated HIVE-21916:
--
Description: 
The return type of the ceil, ceiling and floor SQL functions is long, and this 
leads to overflow:
{code}
hive> select version(), ceil(1.2345678901234e+200), 
ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
OK
4.0.0-SNAPSHOT r11f78562ab36333cc1d0a3f6051d9846c9c92132   9223372036854775807   9223372036854775807   9223372036854775807
{code}
 
Meanwhile, other SQL engines evaluate the same expressions without overflow.

*PostgreSQL:*
{code}
postgres=# select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
 version | ceil | ceiling | floor
---------+------+---------+-------
 PostgreSQL 11.3 (Debian 11.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 12345678901234...000 | 1234567890123400...0 | 12345678901234...000
(1 row)
{code}

*MySQL:*
{code}
mysql> select version(), ceil(1.2345678901234e+200), ceiling(1.2345678901234e+200), floor(1.2345678901234e+200);
+-----------+----------------------------+-------------------------------+-----------------------------+
| version() | ceil(1.2345678901234e+200) | ceiling(1.2345678901234e+200) | floor(1.2345678901234e+200) |
+-----------+----------------------------+-------------------------------+-----------------------------+
| 5.7.26 | 12345678901234000... | 12345678901234000... | 12345678901234000... |

[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Status: Patch Available  (was: Open)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Component/s: Query Processor

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true

2019-06-24 Thread philipse (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871122#comment-16871122
 ] 

philipse commented on HIVE-5888:


Hi, I still encounter this issue on Hive 1.1. After 
set hive.auto.convert.join = false; the result shows up correctly; otherwise no 
result appears. Is there any new workaround for it now?

> group by after join operation product no result when  hive.optimize.skewjoin 
> = true 
> 
>
> Key: HIVE-5888
> URL: https://issues.apache.org/jira/browse/HIVE-5888
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0
>Reporter: cyril liao
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Labels:   (was: patch)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Labels: patch  (was: pull-request-available)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>  Labels: patch
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871119#comment-16871119
 ] 

Wei Zhang commented on HIVE-21915:
--

I just added a patch for this issue. Could anybody help review the code?

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-21915:
-
Attachment: hive-21915-2019-06-24.patch

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: hive-21915-2019-06-24.patch
>
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With above HQL, we are expecting that rows with both tag = 2 and tag = 1 
> appear. In our case however, all the rows with tag = 1 are lost.
> Dig deeper we can find that the generated two maps have identical task tmp 
> paths. And that results from when UDTF is present, the FileSinkOperator will 
> be processed twice generating the tmp path in 
> GenTezUtils.removeUnionOperators();
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21905) Generics improvement around the FetchOperator class

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871115#comment-16871115
 ] 

Hive QA commented on HIVE-21905:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
10s{color} | {color:blue} ql in master has 2254 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 1 new + 152 unchanged - 1 
fixed = 153 total (was 153) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
15s{color} | {color:green} ql generated 0 new + 2253 unchanged - 1 fixed = 2253 
total (was 2254) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17706/dev-support/hive-personality.sh
 |
| git revision | master / 11f7856 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17706/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17706/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Generics improvement around the FetchOperator class
> ---
>
> Key: HIVE-21905
> URL: https://issues.apache.org/jira/browse/HIVE-21905
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ivan Suller
>Assignee: Ivan Suller
>Priority: Minor
> Attachments: HIVE-21905.1.patch, HIVE-21905.1.patch
>
>
> In and around the org.apache.hadoop.hive.ql.exec.FetchOperator class the 
> generics are handled poorly. Lots of declarations are missing type parameters, 
> which creates a lot of noise in the IDE and makes it hard to be sure of the 
> correctness of the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871097#comment-16871097
 ] 

Hive QA commented on HIVE-21225:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972698/HIVE-21225.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 64 failed/errored test(s), 16339 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_nullscan] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_stats5] (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_nonpart] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_part2] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_part] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_sizebug] 
(batchId=89)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] 
(batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mm_exim] 
(batchId=186)
org.apache.hadoop.hive.ql.TestTxnCommands.testMmExim (batchId=341)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion1 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion2 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion3 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnCommands2.testOriginalFileReaderWhenNonAcidConvertedToAcid
 (batchId=322)
org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned (batchId=322)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion1
 (batchId=336)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion2
 (batchId=336)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion3
 (batchId=336)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOriginalFileReaderWhenNonAcidConvertedToAcid
 (batchId=336)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.updateDeletePartitioned
 (batchId=336)
org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testMmExim
 (batchId=322)
org.apache.hadoop.hive.ql.TestTxnExIm.testImport (batchId=322)
org.apache.hadoop.hive.ql.TestTxnExIm.testImportNoTarget (batchId=322)
org.apache.hadoop.hive.ql.TestTxnExIm.testMM (batchId=322)
org.apache.hadoop.hive.ql.TestTxnExIm.testMMCreate (batchId=322)
org.apache.hadoop.hive.ql.TestTxnLoadData.loadData (batchId=298)
org.apache.hadoop.hive.ql.TestTxnLoadData.loadDataNonAcid2AcidConversion 
(batchId=298)
org.apache.hadoop.hive.ql.TestTxnLoadData.loadDataUpdate (batchId=298)
org.apache.hadoop.hive.ql.TestTxnLoadData.testMultiStatement (batchId=298)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testCompactStatsGather (batchId=322)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testEmptyCompactionResult 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testToAcidConversionMultiBucket 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testCompactStatsGather 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testEmptyCompactionResult 
(batchId=322)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testToAcidConversionMultiBucket
 (batchId=322)
org.apache.hadoop.hive.ql.io.TestAcidUtils.testBaseDeltas (batchId=310)
org.apache.hadoop.hive.ql.io.TestAcidUtils.testBaseWithDeleteDeltas 
(batchId=310)
org.apache.hadoop.hive.ql.io.TestAcidUtils.testObsoleteOriginals (batchId=310)
org.apache.hadoop.hive.ql.io.TestAcidUtils.testOriginal (batchId=310)
org.apache.hadoop.hive.ql.io.TestAcidUtils.testOriginalDeltas (batchId=310)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testACIDReaderFooterSerialize
 (batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testACIDReaderFooterSerializeWithDeltas
 (batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testACIDReaderNoFooterSerialize
 (batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testACIDReaderNoFooterSerializeWithDeltas
 (batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testEmptyFile 
(batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testSplitGenReadOps 
(batchId=313)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testSplitGenReadOpsLocalCache
 (batchId=313)

[jira] [Assigned] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang reassigned HIVE-21915:


Assignee: Wei Zhang  (was: Ning Zhang)

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>  Labels: pull-request-available
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can see that the two generated map vertices have identical 
> task tmp paths. That happens because, when a UDTF is present, the 
> FileSinkOperator is processed twice while generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-24 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21880:
--
Attachment: HIVE-21880.01.patch
Status: Patch Available  (was: Open)

The code in getNextNotification() just checks whether the next event has the 
expected event id. This check may fail when there are multiple events with the 
same event id or when event ids are missing. When the test fails, it fails 
because there multiple events with the same event id.

We use the Derby database as the backing db for the metastore. Derby doesn't 
lock the row being selected with a FOR UPDATE clause. Both addNotificationLog() 
and addNotificationEvent() rely on that row locking to generate monotonically 
increasing, sequential event ids. Since the row is not locked, we could fetch 
the same event id multiple times and then increment it to the same value 
multiple times, which can cause the event ids to progress in an unreliable 
manner. So for Derby we lock the NOTIFICATION_SEQUENCE table instead of using 
FOR UPDATE.

Note: TxnHandler uses a different approach to simulate the effect of FOR 
UPDATE on Derby; it uses a JVM-wide mutex for that. TxnHandler is not always 
available, especially when there are no ACID tables involved, so we would need 
to move that mutex out of TxnHandler to a place common to 
DbNotificationListener and TxnHandler (e.g. SQLGenerater), and we would also 
have to take care of the mutex's reentrant behaviour. Furthermore, such a 
mutex wouldn't work when metastores are running in separate JVMs.

Since the test named in the subject is flaky, I have added another test which 
reliably reproduces this behaviour.
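
A minimal sketch of the Derby-specific locking described above (this is not the 
attached HIVE-21880.01.patch; the helper class is hypothetical, and the 
NOTIFICATION_SEQUENCE / NEXT_EVENT_ID names follow the usual metastore schema):
{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NextEventIdSketch {
  /** Allocates the next notification event id inside a single transaction. */
  static long allocateNextEventId(Connection conn, boolean isDerby) throws SQLException {
    boolean oldAutoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false); // the lock must be held for the whole read-increment cycle
    try (Statement stmt = conn.createStatement()) {
      String select;
      if (isDerby) {
        // Derby does not row-lock on SELECT ... FOR UPDATE the way other DBs do,
        // so take an exclusive table lock instead (released at commit/rollback).
        stmt.execute("LOCK TABLE NOTIFICATION_SEQUENCE IN EXCLUSIVE MODE");
        select = "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE";
      } else {
        select = "SELECT NEXT_EVENT_ID FROM NOTIFICATION_SEQUENCE FOR UPDATE";
      }
      long next;
      try (ResultSet rs = stmt.executeQuery(select)) {
        if (!rs.next()) {
          throw new SQLException("NOTIFICATION_SEQUENCE is not initialized");
        }
        next = rs.getLong(1);
      }
      stmt.executeUpdate("UPDATE NOTIFICATION_SEQUENCE SET NEXT_EVENT_ID = " + (next + 1));
      conn.commit();
      return next;
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(oldAutoCommit);
    }
  }
}
{code}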

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21880.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> 

[jira] [Updated] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21880:
--
Labels: pull-request-available  (was: )

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> 

[jira] [Work logged] (HIVE-21880) Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.

2019-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21880?focusedWorklogId=265661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-265661
 ]

ASF GitHub Bot logged work on HIVE-21880:
-

Author: ASF GitHub Bot
Created on: 24/Jun/19 11:12
Start Date: 24/Jun/19 11:12
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #684: 
HIVE-21880 : Enable flaky test 
TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
URL: https://github.com/apache/hive/pull/684
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 265661)
Time Spent: 10m
Remaining Estimate: 0h

> Enable flaky test 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites.
> ---
>
> Key: HIVE-21880
> URL: https://issues.apache.org/jira/browse/HIVE-21880
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Need to enable 
> TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites,
>  which is disabled as it is flaky and randomly failing with the below error.
> {code}
> Error Message
> Notification events are missing in the meta store.
> Stacktrace
> java.lang.IllegalStateException: Notification events are missing in the meta 
> store.
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getNextNotification(HiveMetaStoreClient.java:3246)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy58.getNextNotification(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getNextNotificationEvents(EventUtils.java:107)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.fetchNextBatch(EventUtils.java:159)
>   at 
> org.apache.hadoop.hive.ql.metadata.events.EventUtils$NotificationEventIterator.hasNext(EventUtils.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:231)
>   at 
> org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:121)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2709)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2361)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2028)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1788)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1782)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.run(WarehouseInstance.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:282)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:265)
>   at 
> org.apache.hadoop.hive.ql.parse.WarehouseInstance.dump(WarehouseInstance.java:289)
>   at 
> org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites(TestReplicationScenariosAcidTablesBootstrap.java:328)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871068#comment-16871068
 ] 

Hive QA commented on HIVE-21225:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
14s{color} | {color:blue} ql in master has 2254 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
44s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 16 new + 169 unchanged - 1 
fixed = 185 total (was 170) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 26 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
27s{color} | {color:red} ql generated 1 new + 2254 unchanged - 0 fixed = 2255 
total (was 2254) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unread field:AcidUtils.java:[line 1398] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17705/dev-support/hive-personality.sh
 |
| git revision | master / 11f7856 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17705/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17705/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17705/yetus/new-findbugs-ql.html
 |
| modules | C: ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17705/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: HIVE-21225.1.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS API which could be 
> answered by making a single 

[jira] [Updated] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21914:
--
Attachment: HIVE-21914.03.patch

> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch, 
> HIVE-21914.03.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask and 
> FunctionWork, even though they belong to the DDL framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21914:
--
Attachment: (was: HIVE-21914.03.patch)

> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch, 
> HIVE-21914.03.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask and 
> FunctionWork, even though they belong to the DDL framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871048#comment-16871048
 ] 

Hive QA commented on HIVE-21914:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972697/HIVE-21914.03.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 16335 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.mapreduce.TestHCatPartitioned.testHCatPartitionedTable[7]
 (batchId=211)
org.apache.hive.service.cli.thrift.TestThriftHttpCLIServiceFeatures.org.apache.hive.service.cli.thrift.TestThriftHttpCLIServiceFeatures
 (batchId=270)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/17704/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17704/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17704/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972697 - PreCommit-HIVE-Build

> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch, 
> HIVE-21914.03.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask and 
> FunctionWork, even though they belong to the DDL framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss

2019-06-24 Thread Wei Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang reassigned HIVE-21915:


Assignee: Ning Zhang

> Hive with TEZ UNION ALL and UDTF results in data loss
> -
>
> Key: HIVE-21915
> URL: https://issues.apache.org/jira/browse/HIVE-21915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Wei Zhang
>Assignee: Ning Zhang
>Priority: Major
>  Labels: pull-request-available
>
> The HQL syntax is like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL, we expect rows with both tag = 2 and tag = 1 to appear. 
> In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we can find that the two generated maps have identical task 
> tmp paths. This happens because, when a UDTF is present, the FileSinkOperator 
> is processed twice when generating the tmp path in 
> GenTezUtils.removeUnionOperators().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21846) Create a thread in TezAM which periodically fetches LlapDaemon metrics

2019-06-24 Thread Antal Sinkovits (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871031#comment-16871031
 ] 

Antal Sinkovits commented on HIVE-21846:


[~pvary] [~odraese] could you please review the changes.

> Create a thread in TezAM which periodically fetches LlapDaemon metrics
> --
>
> Key: HIVE-21846
> URL: https://issues.apache.org/jira/browse/HIVE-21846
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, Tez
>Reporter: Peter Vary
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21846.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LlapTaskSchedulerService should start a thread which periodically fetches the 
> LlapDaemon metrics and stores them in the NodeInfo object.
> This should be just the first implementation - later we should find a way 
> where we do not need NxM requests between N TezAM and M LlapDaemon
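
An illustrative-only sketch of the polling thread described above (the names 
LlapMetricsCollector, LlapManagementClient#getDaemonMetrics and 
NodeInfo#updateMetrics are placeholders, not the real LLAP/Tez APIs):
{code:java}
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Placeholder for the management-protocol client used to query a daemon. */
interface LlapManagementClient {
  Map<String, Long> getDaemonMetrics(String host) throws Exception;
}

/** Placeholder for the scheduler's per-node bookkeeping object. */
interface NodeInfo {
  void updateMetrics(Map<String, Long> metrics);
}

class LlapMetricsCollector implements Runnable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Map<String, NodeInfo> nodes; // daemon host -> node state
  private final LlapManagementClient client;

  LlapMetricsCollector(Map<String, NodeInfo> nodes, LlapManagementClient client) {
    this.nodes = nodes;
    this.client = client;
  }

  void start(long periodMs) {
    scheduler.scheduleAtFixedRate(this, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  @Override
  public void run() {
    // One poll cycle: ask every known daemon for its metrics and cache the
    // answer on the corresponding NodeInfo for later scheduling decisions.
    for (Map.Entry<String, NodeInfo> entry : nodes.entrySet()) {
      try {
        entry.getValue().updateMetrics(client.getDaemonMetrics(entry.getKey()));
      } catch (Exception e) {
        // A single unreachable daemon should not abort the whole cycle.
      }
    }
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
{code}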



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21846) Create a thread in TezAM which periodically fetches LlapDaemon metrics

2019-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21846?focusedWorklogId=265635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-265635
 ]

ASF GitHub Bot logged work on HIVE-21846:
-

Author: ASF GitHub Bot
Created on: 24/Jun/19 09:51
Start Date: 24/Jun/19 09:51
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on pull request #683: HIVE-21846: 
Create a thread in TezAM which periodically fetches LlapDaemon metrics
URL: https://github.com/apache/hive/pull/683
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 265635)
Time Spent: 10m
Remaining Estimate: 0h

> Create a thread in TezAM which periodically fetches LlapDaemon metrics
> --
>
> Key: HIVE-21846
> URL: https://issues.apache.org/jira/browse/HIVE-21846
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, Tez
>Reporter: Peter Vary
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21846.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LlapTaskSchedulerService should start a thread which periodically fetches the 
> LlapDaemon metrics and stores them in the NodeInfo object.
> This should be just the first implementation - later we should find a way 
> where we do not need NxM requests between N TezAM and M LlapDaemon



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21846) Create a thread in TezAM which periodically fetches LlapDaemon metrics

2019-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21846:
--
Labels: pull-request-available  (was: )

> Create a thread in TezAM which periodically fetches LlapDaemon metrics
> --
>
> Key: HIVE-21846
> URL: https://issues.apache.org/jira/browse/HIVE-21846
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, Tez
>Reporter: Peter Vary
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21846.01.patch
>
>
> LlapTaskSchedulerService should start a thread which periodically fetches the 
> LlapDaemon metrics and stores them in the NodeInfo object.
> This should be just the first implementation - later we should find a way 
> where we do not need NxM requests between N TezAM and M LlapDaemon



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871016#comment-16871016
 ] 

Hive QA commented on HIVE-21914:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
9s{color} | {color:blue} ql in master has 2254 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
45s{color} | {color:blue} llap-server in master has 82 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 1 new + 326 unchanged - 17 
fixed = 327 total (was 343) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
26s{color} | {color:red} ql generated 1 new + 2253 unchanged - 1 fixed = 2254 
total (was 2254) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Should org.apache.hadoop.hive.ql.parse.HiveParser$DFA238 be a _static_ 
inner class?  At HiveParser.java:inner class?  At HiveParser.java:[lines 
48391-48404] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-17704/dev-support/hive-personality.sh
 |
| git revision | master / 11f7856 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17704/yetus/diff-checkstyle-ql.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17704/yetus/new-findbugs-ql.html
 |
| modules | C: ql llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-17704/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch, 
> HIVE-21914.03.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask, and 
> FunctionWork while 

[jira] [Updated] (HIVE-21846) Create a thread in TezAM which periodically fetches LlapDaemon metrics

2019-06-24 Thread Antal Sinkovits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-21846:
---
Attachment: HIVE-21846.01.patch

> Create a thread in TezAM which periodically fetches LlapDaemon metrics
> --
>
> Key: HIVE-21846
> URL: https://issues.apache.org/jira/browse/HIVE-21846
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, Tez
>Reporter: Peter Vary
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-21846.01.patch
>
>
> LlapTaskSchedulerService should start a thread which periodically fetches the 
> LlapDaemon metrics and stores them in the NodeInfo object.
> This should be just the first implementation - later we should find a way 
> where we do not need NxM requests between N TezAM and M LlapDaemon



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21846) Create a thread in TezAM which periodically fetches LlapDaemon metrics

2019-06-24 Thread Antal Sinkovits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-21846:
---
Status: Patch Available  (was: Open)

> Create a thread in TezAM which periodically fetches LlapDaemon metrics
> --
>
> Key: HIVE-21846
> URL: https://issues.apache.org/jira/browse/HIVE-21846
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, Tez
>Reporter: Peter Vary
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-21846.01.patch
>
>
> LlapTaskSchedulerService should start a thread which periodically fetches the 
> LlapDaemon metrics and stores them in the NodeInfo object.
> This should be just the first implementation - later we should find a way 
> where we do not need NxM requests between N TezAM and M LlapDaemon



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21905) Generics improvement around the FetchOperator class

2019-06-24 Thread Ivan Suller (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Suller updated HIVE-21905:
---
Attachment: HIVE-21905.1.patch

> Generics improvement around the FetchOperator class
> ---
>
> Key: HIVE-21905
> URL: https://issues.apache.org/jira/browse/HIVE-21905
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ivan Suller
>Assignee: Ivan Suller
>Priority: Minor
> Attachments: HIVE-21905.1.patch, HIVE-21905.1.patch
>
>
> In and around the org.apache.hadoop.hive.ql.exec.FetchOperator class, the 
> generics are handled poorly. Lots of declarations are missing generics, 
> which creates lots of noise in the IDE and makes it hard to be sure of the 
> correctness of the code.
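
An illustrative-only example of the kind of cleanup this describes (the field 
names below are made up for the example and do not come from FetchOperator 
itself):
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.serde2.Deserializer;

class RawVsGenericExample {
  // Before: a raw type compiles, but readers need casts and the compiler can
  // only emit rawtypes/unchecked warnings instead of real type checks.
  @SuppressWarnings({"rawtypes", "unchecked"})
  List rawDeserializers = new ArrayList();

  // After: the element type is explicit, the casts and IDE noise go away, and
  // the compiler can verify what is actually put into the list.
  List<Deserializer> typedDeserializers = new ArrayList<>();
}
{code}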



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-06-24 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-21225:

Attachment: (was: HIVE-21225.3.patch)

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: HIVE-21225.1.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS API which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.
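
A minimal sketch of the caching idea described above, assuming the Hadoop 
FileSystem API (this is not Hive's actual AcidUtils code; DirSnapshot and its 
methods are made up for illustration):
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class DirSnapshot {
  // child directory (e.g. delta_x_y or base_x) -> files found directly under it
  private final Map<Path, List<LocatedFileStatus>> filesByDir = new HashMap<>();

  /** One recursive listing per partition directory; everything else is answered from memory. */
  static DirSnapshot of(FileSystem fs, Path partitionDir) throws IOException {
    DirSnapshot snapshot = new DirSnapshot();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(partitionDir, true);
    while (it.hasNext()) {
      LocatedFileStatus file = it.next();
      snapshot.filesByDir
          .computeIfAbsent(file.getPath().getParent(), p -> new ArrayList<>())
          .add(file);
    }
    return snapshot;
  }

  /** E.g. for isRawFormat()/isValidBase() checks without extra NameNode round trips. */
  List<LocatedFileStatus> filesUnder(Path dir) {
    return filesByDir.getOrDefault(dir, new ArrayList<>());
  }
}
{code}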



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-06-24 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-21225:

Attachment: HIVE-21225.3.patch

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: HIVE-21225.1.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS API which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21225) ACID: getAcidState() should cache a recursive dir listing locally

2019-06-24 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-21225:

Status: Patch Available  (was: Open)

> ACID: getAcidState() should cache a recursive dir listing locally
> -
>
> Key: HIVE-21225
> URL: https://issues.apache.org/jira/browse/HIVE-21225
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: HIVE-21225.1.patch, HIVE-21225.2.patch, 
> HIVE-21225.3.patch, async-pid-44-2.svg
>
>
> Currently getAcidState() makes 3 calls into the FS API which could be 
> answered by making a single recursive listDir call and reusing the same data 
> to check for isRawFormat() and isValidBase().
> All delta operations for a single partition can go against a single listed 
> directory snapshot instead of interacting with the NameNode or ObjectStore 
> within the inner loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21914:
--
Attachment: HIVE-21914.03.patch

> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch, 
> HIVE-21914.03.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask and 
> FunctionWork, even though they belong to the DDL framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21914) Move Function and Macro related DDL operations into the DDL framework

2019-06-24 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870945#comment-16870945
 ] 

Zoltan Haindrich commented on HIVE-21914:
-

+1

> Move Function and Macro related DDL operations into the DDL framework
> -
>
> Key: HIVE-21914
> URL: https://issues.apache.org/jira/browse/HIVE-21914
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21914.01.patch, HIVE-21914.02.patch
>
>
> Some Function and Macro related operations are handled by FunctionTask and 
> FunctionWork, even though they belong to the DDL framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21896) SHOW FUNCTIONS / SHOW FUNCTIONS LIKE - clarify

2019-06-24 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870916#comment-16870916
 ] 

Zoltan Haindrich commented on HIVE-21896:
-

+1

> SHOW FUNCTIONS / SHOW FUNCTIONS LIKE - clarify
> --
>
> Key: HIVE-21896
> URL: https://issues.apache.org/jira/browse/HIVE-21896
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: incompatibleChange, todoc4.0
> Fix For: 4.0.0
>
> Attachments: HIVE-21896.01.patch, HIVE-21896.02.patch
>
>
> According to 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions]
>  the currently available functions can be listed like this:
> {code:java}
> SHOW FUNCTIONS <pattern>;{code}
> If the user executes this command, they will get the correct list of 
> functions, but they will also see this on the standard output:
> {code:java}
> SHOW FUNCTIONS is deprecated, please use SHOW FUNCTIONS LIKE instead.{code}
> If the user uses the
> {code:java}
> SHOW FUNCTIONS LIKE <pattern>;{code}
> command, then they will receive the exact same result (though through 
> different code paths). The only difference is that one can get all the 
> function names with "SHOW FUNCTIONS;", while "SHOW FUNCTIONS LIKE;" returns 
> an exception, so in this case the pattern is mandatory.
> So there should be a decision on whether we still accept "SHOW FUNCTIONS" 
> without the "LIKE". My suggestion is to accept it only if there is no 
> pattern, so "SHOW FUNCTIONS;" is ok without a deprecation message, but "SHOW 
> FUNCTIONS <pattern>" should throw an exception.
> Whatever we decide, we should document it appropriately.
> cc [~krishahn]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

