date:20170213

[jira] [Updated] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-13 Thread Rui Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15859:
--
Status: Patch Available  (was: Open)

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>Assignee: Rui Li
> Attachments: HIVE-15859.1.patch
>
>
> Hive on Spark, failed with error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> application log shows the driver commanded a shutdown with some unknown 
> reason, but hive's log shows Driver could not get RPC header( Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in 
> stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 
> 3.0 (TID 2417)
>

[jira] [Updated] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-13 Thread Rui Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15859:
--
Attachment: HIVE-15859.1.patch

Patch v1 based on the Livy PR.
[~KaiXu], could you test if the patch fixes your problem? Thanks.

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>Assignee: Rui Li
> Attachments: HIVE-15859.1.patch
>
>
> Hive on Spark, failed with error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> application log shows the driver commanded a shutdown with some unknown 
> reason, but hive's log shows Driver could not get RPC header( Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in 
> stage 3.0 (TID 2515)
>

[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865281#comment-15865281
 ] 

Harsh J commented on HIVE-15908:


(H/T to Lingesh Radhakrishnan for the discovery)

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader used inside class OperationLog$LogFile class 
> reads line-by-line on its input stream, for any lines available from the OS's 
> file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used sets PrintStream's 
> {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory as the buffer before 
> flushing the writes to disk, causing a slowness in the logs streamed back to 
> the client. Every line must be ideally flushed entirely as-its-written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Affects Version/s: 0.13.0

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader used inside class OperationLog$LogFile class 
> reads line-by-line on its input stream, for any lines available from the OS's 
> file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used sets PrintStream's 
> {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory as the buffer before 
> flushing the writes to disk, causing a slowness in the logs streamed back to 
> the client. Every line must be ideally flushed entirely as-its-written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-02-13 Thread Simanchal Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-15229:
-
Attachment: HIVE-15229.3.patch

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch
>
>
> In Teradata 'like any' and 'like all' operators are mostly used when we are 
> matching a text field with numbers of patterns.
> 'like any' and 'like all' operator are equivalents of multiple like operator 
> like example below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Now a days so many data warehouse projects are being migrated from Teradata 
> to Hive.
> Always Data engineer and Business analyst are searching for these two 
> operator.
> If we introduce these two operator in hive then so many scripts will be 
> migrated smoothly instead of converting these operators to multiple like 
> operators.
> Result:
> 1. 'LIKE ANY' operator return true if a text(column value) matches to any 
> pattern.
> 2. 'LIKE ALL' operator return true if a text(column value) matches to all 
> patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' returns NULL not only if the expression on the 
> left hand side is NULL, but also if one of the pattern in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-02-13 Thread Simanchal Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-15229:
-
Status: Patch Available  (was: Open)

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch, 
> HIVE-15229.3.patch
>
>
> In Teradata 'like any' and 'like all' operators are mostly used when we are 
> matching a text field with numbers of patterns.
> 'like any' and 'like all' operator are equivalents of multiple like operator 
> like example below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Now a days so many data warehouse projects are being migrated from Teradata 
> to Hive.
> Always Data engineer and Business analyst are searching for these two 
> operator.
> If we introduce these two operator in hive then so many scripts will be 
> migrated smoothly instead of converting these operators to multiple like 
> operators.
> Result:
> 1. 'LIKE ANY' operator return true if a text(column value) matches to any 
> pattern.
> 2. 'LIKE ALL' operator return true if a text(column value) matches to all 
> patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' returns NULL not only if the expression on the 
> left hand side is NULL, but also if one of the pattern in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-02-13 Thread Simanchal Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simanchal Das updated HIVE-15229:
-
Status: Open  (was: Patch Available)

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch
>
>
> In Teradata 'like any' and 'like all' operators are mostly used when we are 
> matching a text field with numbers of patterns.
> 'like any' and 'like all' operator are equivalents of multiple like operator 
> like example below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Now a days so many data warehouse projects are being migrated from Teradata 
> to Hive.
> Always Data engineer and Business analyst are searching for these two 
> operator.
> If we introduce these two operator in hive then so many scripts will be 
> migrated smoothly instead of converting these operators to multiple like 
> operators.
> Result:
> 1. 'LIKE ANY' operator return true if a text(column value) matches to any 
> pattern.
> 2. 'LIKE ALL' operator return true if a text(column value) matches to all 
> patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' returns NULL not only if the expression on the 
> left hand side is NULL, but also if one of the pattern in the list is NULL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Status: Patch Available  (was: Open)

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader used inside class OperationLog$LogFile class 
> reads line-by-line on its input stream, for any lines available from the OS's 
> file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used sets PrintStream's 
> {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory as the buffer before 
> flushing the writes to disk, causing a slowness in the logs streamed back to 
> the client. Every line must be ideally flushed entirely as-its-written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HIVE-15908:
---
Attachment: HIVE-15908.000.patch

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader used inside class OperationLog$LogFile class 
> reads line-by-line on its input stream, for any lines available from the OS's 
> file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used sets PrintStream's 
> {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory as the buffer before 
> flushing the writes to disk, causing a slowness in the logs streamed back to 
> the client. Every line must be ideally flushed entirely as-its-written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HIVE-15908:
--


> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
>
> The HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader used inside class OperationLog$LogFile class 
> reads line-by-line on its input stream, for any lines available from the OS's 
> file input perspective.
> The writer inside the same class uses PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used sets PrintStream's 
> {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory as the buffer before 
> flushing the writes to disk, causing a slowness in the logs streamed back to 
> the client. Every line must be ideally flushed entirely as-its-written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-13 Thread Rui Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-15859:
-

Assignee: Rui Li

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>Assignee: Rui Li
>
> Hive on Spark, failed with error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> application log shows the driver commanded a shutdown with some unknown 
> reason, but hive's log shows Driver could not get RPC header( Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in 
> stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 
> 3.0 (TID 2417)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed

[jira] [Commented] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865249#comment-15865249
 ] 

Hive QA commented on HIVE-15906:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852496/HIVE-15906.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10238 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3530/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3530/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3530/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852496 - PreCommit-HIVE-Build

> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15906.1.patch, HIVE-15906.2.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-13 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865239#comment-15865239
 ] 

Rui Li edited comment on HIVE-15859 at 2/14/17 7:11 AM:


Hi [~vanzin], thanks for providing the Livy PR. That's exactly the same issue I 
meant in my last comment. I'm wondering, if the message order gets messed like 
the PR mentioned:
* Send call header
* Send reply header
* Send reply payload
* Send call payload

It means the receiver will receive two successive message headers. And it 
should happen before we hit the issue here. Not sure why it doesn't cause any 
trouble.
Anyway I think Hive needs the same fix, and we can test if it fixes this one.


was (Author: lirui):
Hi [~vanzin], thanks for providing the Livy PR. That's exactly the same issue I 
meant in my last comment. I'm wondering, if the message order gets messed like 
the PR mentioned:
* Send call header
* Send reply header
* Send reply payload
* Send call payload
It means the receiver will receive two successive message headers. And it 
should happen before we hit the issue here. Not sure why it doesn't cause any 
trouble.
Anyway I think Hive needs the same fix, and we can test if it fixes this one.

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>
> Hive on Spark, failed with error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> application log shows the driver commanded a shutdown with some unknown 
> reason, but hive's log shows Driver could not get RPC header( Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager:

[jira] [Commented] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

2017-02-13 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865239#comment-15865239
 ] 

Rui Li commented on HIVE-15859:
---

Hi [~vanzin], thanks for providing the Livy PR. That's exactly the same issue I 
meant in my last comment. I'm wondering, if the message order gets messed like 
the PR mentioned:
* Send call header
* Send reply header
* Send reply payload
* Send call payload
It means the receiver will receive two successive message headers. And it 
should happen before we hit the issue here. Not sure why it doesn't cause any 
trouble.
Anyway I think Hive needs the same fix, and we can test if it fixes this one.

> Hive client side shows Spark Driver disconnected while Spark Driver side 
> could not get RPC header 
> --
>
> Key: HIVE-15859
> URL: https://issues.apache.org/jira/browse/HIVE-15859
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>Reporter: KaiXu
>
> Hive on Spark, failed with error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 
> Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 
> 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
> channel is closed.)'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> application log shows the driver commanded a shutdown with some unknown 
> reason, but hive's log shows Driver could not get RPC header( Expected RPC 
> header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in 
> stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver 
> commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = 
> hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
> (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
> 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in 
> stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
> Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
> /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
>

[jira] [Commented] (HIVE-15847) In Progress update refreshes seem slow

2017-02-13 Thread anishek (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865203#comment-15865203
 ] 

anishek commented on HIVE-15847:


when the server logs are turned to info tthe progress bar output is distorted, 
this should be fixed to allow end user to better read the details, this will 
most probably require merging the query log request / response mechanism to be 
similar to getProgressUpdate

> In Progress update refreshes seem slow
> --
>
> Key: HIVE-15847
> URL: https://issues.apache.org/jira/browse/HIVE-15847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> After HIVE-15473, the refresh rates for in place progress bar seems to be 
> slow on hive cli. 
> As pointed out by [~prasanth_j] 
> {quote}
> The refresh rate is slow. Following video will show it
> before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
> after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15847) In Progress update refreshes seem slow

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-15847:
--

Assignee: anishek

> In Progress update refreshes seem slow
> --
>
> Key: HIVE-15847
> URL: https://issues.apache.org/jira/browse/HIVE-15847
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> After HIVE-15473, the refresh rates for in place progress bar seems to be 
> slow on hive cli. 
> As pointed out by [~prasanth_j] 
> {quote}
> The refresh rate is slow. Following video will show it
> before patch: https://asciinema.org/a/2fgcncxg5gjavcpxt6lfb8jg9
> after patch: https://asciinema.org/a/2tht5jf6l9b2dc3ylt5gtztqg
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865168#comment-15865168
 ] 

Roshan Naik commented on HIVE-15691:


ok,. Flume is not yet switched over to 2.x Hive.


> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865164#comment-15865164
 ] 

Hive QA commented on HIVE-15898:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852490/HIVE-15898.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10225 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd]
 (batchId=139)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=128)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3529/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3529/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3529/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852490 - PreCommit-HIVE-Build

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.6.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865137#comment-15865137
 ] 

Chao Sun edited comment on HIVE-15489 at 2/14/17 6:10 AM:
--

Test failures seem unrelated. Attaching the same patch v6 to confirm.


was (Author: csun):
Test failures seem unrelated. Attaching the same patch v4 to confirm.

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.4.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.6.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.6.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.6.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.6.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.4.patch

Test failures seem unrelated. Attaching the same patch v4 to confirm.

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.6.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.6.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.6.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.5.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.6.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.4.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.5.patch, HIVE-15489.6.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15796) HoS: poor reducer parallelism when operator stats are not accurate

2017-02-13 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15796:

Attachment: HIVE-15796.3.patch

Attaching new patch with default set to true.

> HoS: poor reducer parallelism when operator stats are not accurate
> --
>
> Key: HIVE-15796
> URL: https://issues.apache.org/jira/browse/HIVE-15796
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15796.1.patch, HIVE-15796.2.patch, 
> HIVE-15796.3.patch, HIVE-15796.wip.1.patch, HIVE-15796.wip.2.patch, 
> HIVE-15796.wip.patch
>
>
> In HoS we use currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes will generate 
> extremely poor reducer parallelism, and cause HoS query to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865114#comment-15865114
 ] 

Matt McCline commented on HIVE-14016:
-

vector_empty_where.q and vectorization_15.q are related.
Unclear what happened with TestSparkCliDriver

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15892) Vectorization: Fast Hash tables need to do bounds checking during expand

2017-02-13 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15892:

Status: Patch Available  (was: Open)

> Vectorization: Fast Hash tables need to do bounds checking during expand
> 
>
> Key: HIVE-15892
> URL: https://issues.apache.org/jira/browse/HIVE-15892
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15892.01.patch
>
>
> VectorMapJoinFastLongHashTable line 165 gets NegativeArraySizeException:
> {code}
> long[] newSlotPairs = new long[newSlotPairArraySize];
> {code}
> We need to add a size check... Java math for this wrapped around to negative:
> {code}
> int newSlotPairArraySize = newLogicalHashBucketCount * 2;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15892) Vectorization: Fast Hash tables need to do bounds checking during expand

2017-02-13 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15892:

Attachment: HIVE-15892.01.patch

> Vectorization: Fast Hash tables need to do bounds checking during expand
> 
>
> Key: HIVE-15892
> URL: https://issues.apache.org/jira/browse/HIVE-15892
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15892.01.patch
>
>
> VectorMapJoinFastLongHashTable line 165 gets NegativeArraySizeException:
> {code}
> long[] newSlotPairs = new long[newSlotPairArraySize];
> {code}
> We need to add a size check... Java math for this wrapped around to negative:
> {code}
> int newSlotPairArraySize = newLogicalHashBucketCount * 2;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (HIVE-15892) Vectorization: Fast Hash tables need to do bounds checking during expand

2017-02-13 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865101#comment-15865101
 ] 

Matt McCline edited comment on HIVE-15892 at 2/14/17 5:40 AM:
--

New error message needs work -- it doesn't prescribe a solution.

Add test?

Also add code for BytesBytesMultiHashMap, too?

Files of interest: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTable.java
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastBytesHashTable.java
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastLongHashTable.java


was (Author: mmccline):
New error message needs work -- it doesn't prescribe a solution.

Add test?

Also add code for BytesBytesMultiHashMap, too?

> Vectorization: Fast Hash tables need to do bounds checking during expand
> 
>
> Key: HIVE-15892
> URL: https://issues.apache.org/jira/browse/HIVE-15892
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15892.01.patch
>
>
> VectorMapJoinFastLongHashTable line 165 gets NegativeArraySizeException:
> {code}
> long[] newSlotPairs = new long[newSlotPairArraySize];
> {code}
> We need to add a size check... Java math for this wrapped around to negative:
> {code}
> int newSlotPairArraySize = newLogicalHashBucketCount * 2;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865108#comment-15865108
 ] 

Thejas M Nair commented on HIVE-15889:
--

Why are we having to use reflection here ? 
(I could be missing something that should be obvious, looks like was introduced 
long ago in HIVE-8393)


> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.2.0
>
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865106#comment-15865106
 ] 

Hive QA commented on HIVE-14016:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852487/HIVE-14016.06.patch

{color:green}SUCCESS:{color} +1 due to 22 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_empty_where] 
(batchId=22)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] 
(batchId=122)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite 
(batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3528/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3528/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3528/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852487 - PreCommit-HIVE-Build

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15906:
---
Attachment: HIVE-15906.2.patch

> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15906.1.patch, HIVE-15906.2.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15892) Vectorization: Fast Hash tables need to do bounds checking during expand

2017-02-13 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865101#comment-15865101
 ] 

Matt McCline commented on HIVE-15892:
-

New error message needs work -- it doesn't prescribe a solution.

Add test?

Also add code for BytesBytesMultiHashMap, too?

> Vectorization: Fast Hash tables need to do bounds checking during expand
> 
>
> Key: HIVE-15892
> URL: https://issues.apache.org/jira/browse/HIVE-15892
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> VectorMapJoinFastLongHashTable line 165 gets NegativeArraySizeException:
> {code}
> long[] newSlotPairs = new long[newSlotPairArraySize];
> {code}
> We need to add a size check... Java math for this wrapped around to negative:
> {code}
> int newSlotPairArraySize = newLogicalHashBucketCount * 2;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865093#comment-15865093
 ] 

Thejas M Nair commented on HIVE-15906:
--

+1 
Pending tests


> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15906.1.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15906:
---
Fix Version/s: 2.2.0
Affects Version/s: 2.2.0
   Status: Patch Available  (was: Open)

> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-15906.1.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865087#comment-15865087
 ] 

ASF GitHub Bot commented on HIVE-15906:
---

GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/146

HIVE-15906 : hrift code regeneration to include new protocol version



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-15906

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #146


commit 80c1b962e782dc52ad8e71741571acb8d9944268
Author: Anishek Agarwal 
Date:   2017-02-14T05:22:15Z

HIVE-15906 : hrift code regeneration to include new protocol version




> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Attachments: HIVE-15906.1.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15906:
---
Attachment: HIVE-15906.1.patch

> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
> Attachments: HIVE-15906.1.patch
>
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15906) thrift code regeneration to include new protocol version

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15906:
---
Summary: thrift code regeneration to include new protocol version  (was: 
thrift code regeneration after change in HIVE-15473)

> thrift code regeneration to include new protocol version
> 
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865077#comment-15865077
 ] 

Eugene Koifman commented on HIVE-15691:
---

Actually, it's the version of c'tor that takes a StreamingConnection parameter 
is the new/better version.
Note that in other Writers the c'tors w/o connection parameter are all 
deprecated

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15906) thrift code regeneration after change in HIVE-15473

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-15906:
---
Priority: Critical  (was: Blocker)

> thrift code regeneration after change in HIVE-15473
> ---
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Critical
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-02-13 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Attachment: HIVE-15898.02.patch

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15906) thrift code regeneration after change in HIVE-15473

2017-02-13 Thread anishek (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-15906:
--


> thrift code regeneration after change in HIVE-15473
> ---
>
> Key: HIVE-15906
> URL: https://issues.apache.org/jira/browse/HIVE-15906
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Blocker
>
> HIVE-15473  changed the protocol version in thrift file. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15900) Beeline prints tez job progress in stdout instead of stderr

2017-02-13 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15900:
-
Summary: Beeline prints tez job progress in stdout instead of stderr  (was: 
Tez job progress included in stdout instead of stderr)

> Beeline prints tez job progress in stdout instead of stderr
> ---
>
> Key: HIVE-15900
> URL: https://issues.apache.org/jira/browse/HIVE-15900
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0
>Reporter: Aswathy Chellammal Sreekumar
> Attachments: std_out
>
>
> Tez job progress messages are getting updated to stdout instead of stderr
> Attaching output file for below command, with the tez job status printed
> $HIVE_HOME/bin/beeline -n  -p  -u " --outputformat=tsv -e "analyze table studenttab10k compute statistics;" > 
> stdout



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Attachment: HIVE-14016.06.patch

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Status: Patch Available  (was: In Progress)

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch, 
> HIVE-14016.06.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865031#comment-15865031
 ] 

Matt McCline commented on HIVE-14016:
-

Got rid of grouping sets EXPLAIN display.

Query result differences for vectorization_15.q.out and 
vectorization_div0.q.out ???

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-14016) Vectorization: Add support for Grouping Sets

2017-02-13 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14016:

Status: In Progress  (was: Patch Available)

> Vectorization: Add support for Grouping Sets
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14016.01.patch, HIVE-14016.02.patch, 
> HIVE-14016.03.patch, HIVE-14016.04.patch, HIVE-14016.05.patch
>
>
> Rollup and Cube queries are not vectorized today due to the miss of 
> grouping-sets inside vector group by.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single row writer into a multiple row writer.
> The corresponding non-vec loop is as follows
> {code}
>   if (groupingSetsPresent) {
> Object[] newKeysArray = newKeys.getKeyArray();
> Object[] cloneNewKeysArray = new Object[newKeysArray.length];
> for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>   cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
> }
> for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); 
> groupingSetPos++) {
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
> newKeysArray[keyPos] = null;
>   }
>   FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>   // Some keys need to be left to null corresponding to that grouping 
> set.
>   for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
> keyPos = bitset.nextSetBit(keyPos+1)) {
> newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>   }
>   newKeysArray[groupingSetsPosition] = 
> newKeysGroupingSets[groupingSetPos];
>   processKey(row, rowInspector);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15905) Inefficient plan for correlated subqueries

2017-02-13 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15905:
---
Attachment: HIVE-15905.1.patch

> Inefficient plan for correlated subqueries
> --
>
> Key: HIVE-15905
> URL: https://issues.apache.org/jira/browse/HIVE-15905
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15905.1.patch
>
>
> Currently Calcite produces an un-necessary join to generate correlated values 
> for inner query. More details are at CALCITE-1494.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15905) Inefficient plan for correlated subqueries

2017-02-13 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-15905:
--


> Inefficient plan for correlated subqueries
> --
>
> Key: HIVE-15905
> URL: https://issues.apache.org/jira/browse/HIVE-15905
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> Currently Calcite produces an un-necessary join to generate correlated values 
> for inner query. More details are at CALCITE-1494.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2017-02-13 Thread Greg Senia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864981#comment-15864981
 ] 

Greg Senia commented on HIVE-10511:
---

[~gopalv] thank you for further insight. I wish all Hadoop Vendors I talked 
with felt the same way. I know multiple vendors who feel Hive and HiveServer2 
should be the ONLY access mechanism on top of Hadoop. I guess the approach 
we've been taking at my current and past employer is that tools like Voltage or 
Protegrity or Dataguise would be used to secure column level access using FPE. 
But I can see how some companies would not want to invest down that road. How 
is LLAP and doAS working is that going to work as things like Protegrity 
require jobs to run as the actual endUser. It cannot run as Hive and 
unfortunately I don't think this requirement will be changing any time soon.

> Replacing the implementation of Hive CLI using Beeline
> --
>
> Key: HIVE-10511
> URL: https://issues.apache.org/jira/browse/HIVE-10511
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.10.0
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
>
> Hive CLI is a legacy tool which had two main use cases: 
> 1. a thick client for SQL on hadoop
> 2. a command line tool for HiveServer1.
> HiveServer1 is already deprecated and removed from Hive code base, so  use 
> case #2 is out of the question. For #1, Beeline provides or is supposed to 
> provides equal functionality, yet is implemented differently from Hive CLI.
> As it has been a while that Hive community has been recommending Beeline + 
> HS2 configuration, ideally we should deprecating Hive CLI. Because of wide 
> use of Hive CLI, we instead propose replacing Hive CLI's implementation with 
> Beeline plus embedded HS2 so that Hive community only needs to maintain a 
> single code path. In this way, Hive CLI is just an alias to Beeline at either 
> shell script level or at high code level. The goal is that  no changes or 
> minimum changes are expected from existing user scrip using Hive CLI.
> This is an Umbrella JIRA covering all tasks related to this initiative. Over 
> the last year or two, Beeline has been improved significantly to match what 
> Hive CLI offers. Still, there may still be some gaps or deficiency to be 
> discovered and fixed. In the meantime, we also want to make sure the enough 
> tests are included and performance impact is identified and addressed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864979#comment-15864979
 ] 

Hive QA commented on HIVE-15898:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852459/HIVE-15898.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10222 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=162)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3527/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3527/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3527/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852459 - PreCommit-HIVE-Build

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15889:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~sershe]. Committed to master.

> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.2.0
>
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-13 Thread Aswathy Chellammal Sreekumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-15904:

Description: 
Following query failing with Null Pointer Exception from 
org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

Attaching create table statements for table_1 and table_18

Query:
SELECT
COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY (t2.int_col_10 
+ t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50), 
FLOOR(t1.double_col_16) DESC), 524) AS int_col,
(t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
FLOOR(t1.double_col_16) AS float_col,
COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
FOLLOWING), 704) AS int_col_2
FROM table_1 t1
INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
((t2.tinyint_col_20) = (t1.tinyint_col_3))
WHERE (t2.smallint_col_19) IN (SELECT
COALESCE(-92, -994) AS int_col
FROM table_1 tt1
INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));

Error Stack:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: NullPointerException null
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
 
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:193)
 
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
 
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) 
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:507)
 
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495)
 
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:308)
 
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
 
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:599)
 
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_112]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:402)
 
at 
org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:226)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 
at 
org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:358)
 
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:90)
 
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:134) 
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11126)
 
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:288)
 
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:257)
 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:447) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329) 
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189) 
at

[jira] [Updated] (HIVE-15799) LLAP: rename VertorDeserializeOrcWriter

2017-02-13 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15799:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master


> LLAP: rename VertorDeserializeOrcWriter
> ---
>
> Key: HIVE-15799
> URL: https://issues.apache.org/jira/browse/HIVE-15799
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15799.patch
>
>
> As convenient as it is to grep for, based on continuous RB comments I am not 
> sure the world is yet ready for vertorized execution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-13 Thread Aswathy Chellammal Sreekumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-15904:

Attachment: table_1.q
table_18.q

> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
> Attachments: table_18.q, table_1.q
>
>
> Following query failing with Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> Attaching create table statements for table_1 and table_18
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-13 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15878:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15878.patch
>
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15904) select query throwing Null Pointer Exception from org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan

2017-02-13 Thread Aswathy Chellammal Sreekumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar reassigned HIVE-15904:
---


> select query throwing Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> --
>
> Key: HIVE-15904
> URL: https://issues.apache.org/jira/browse/HIVE-15904
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
>
> Following query failing with Null Pointer Exception from 
> org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan
> Attaching create table statements for table_1 and table_18
> Query:
> SELECT
> COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY 
> (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + 
> t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
> (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
> FLOOR(t1.double_col_16) AS float_col,
> COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + 
> t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, 
> FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 
> FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
> ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
> ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
> COALESCE(-92, -994) AS int_col
> FROM table_1 tt1
> INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
> WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15896) LLAP: improved failures when security is set up incorrectly

2017-02-13 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15896:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP: improved failures when security is set up incorrectly
> ---
>
> Key: HIVE-15896
> URL: https://issues.apache.org/jira/browse/HIVE-15896
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15896.patch
>
>
> Right now LLAP may fail in ZK ACL check when the ACLs are invalid. We can try 
> to fail earlier, and also improve the message.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Kalyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864938#comment-15864938
 ] 

Kalyan commented on HIVE-15691:
---

Thanks for your support, [~roshan_naik], [~ekoifman]

To work with flume, regex support we are adding `StrictRegexWriter`.

If this patch only for latest version 2.x, i need only 2 constructors

StrictRegexWriter(String regex, HiveEndPoint endPoint)
StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

may be any committing on old version need other 2 constructors.

As per your comments .. i will update the patch..



> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Resolved] (HIVE-15888) constant propagation optimizer failed when query has the same alias with subquery

2017-02-13 Thread Walter Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu resolved HIVE-15888.
--
Resolution: Duplicate

> constant propagation optimizer failed when query has the same alias with 
> subquery
> -
>
> Key: HIVE-15888
> URL: https://issues.apache.org/jira/browse/HIVE-15888
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Walter Wu
>
> Example  :
> select * 
> from dpdim_employee_org_d c 
> join 
> (
> select a.* from dpmid_md_organization a
> left outer join dpmid_md_organization b 
> on a.organizationid = b.superiororganizationid and b.hisdate = '2016-10-05'
> where a.hisdate = '2016-09-05'
> and b.organizationid is null 
> ) b 
> on c.org_id = b.organizationid 
> and c.hp_cal_dt = '2016-09-05' limit 10;
> Description:
> when ppd optimize is enabled this query has empty result . If we unenabled 
> constant propagation optimize or we replace the subquery alias 'b' with 'b1' 
> , this query will work correctly.
> I explain this query and find that after ppd optimize Filter Operator 
> predicate conf changed from 'predicate: superiororganizationid is not null 
> (type: boolean)' to 'predicate: false (type: boolean)'.
> The subquery has a filter predicate conf 'b.organizationid is 
> null','b.organizationid' should equal to 'b:b.organizationid' . The outer 
> query has a filter predicate conf 'b.organizationid is not null', 
> 'b.organizationid' should equal to 'b:a.organizationid'. While rowSchema get 
> Column Info on tabAlias:'b' and alias:'organizationid'. constant propagation 
> optimize combine 'b.organizationid is not null' and  'b.organizationid is 
> null' to 'constant false' . 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15888) constant propagation optimizer failed when query has the same alias with subquery

2017-02-13 Thread Walter Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864935#comment-15864935
 ] 

Walter Wu commented on HIVE-15888:
--

I find it has been fixed in https://issues.apache.org/jira/browse/HIVE-13602

> constant propagation optimizer failed when query has the same alias with 
> subquery
> -
>
> Key: HIVE-15888
> URL: https://issues.apache.org/jira/browse/HIVE-15888
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Walter Wu
>
> Example  :
> select * 
> from dpdim_employee_org_d c 
> join 
> (
> select a.* from dpmid_md_organization a
> left outer join dpmid_md_organization b 
> on a.organizationid = b.superiororganizationid and b.hisdate = '2016-10-05'
> where a.hisdate = '2016-09-05'
> and b.organizationid is null 
> ) b 
> on c.org_id = b.organizationid 
> and c.hp_cal_dt = '2016-09-05' limit 10;
> Description:
> when ppd optimize is enabled this query has empty result . If we unenabled 
> constant propagation optimize or we replace the subquery alias 'b' with 'b1' 
> , this query will work correctly.
> I explain this query and find that after ppd optimize Filter Operator 
> predicate conf changed from 'predicate: superiororganizationid is not null 
> (type: boolean)' to 'predicate: false (type: boolean)'.
> The subquery has a filter predicate conf 'b.organizationid is 
> null','b.organizationid' should equal to 'b:b.organizationid' . The outer 
> query has a filter predicate conf 'b.organizationid is not null', 
> 'b.organizationid' should equal to 'b:a.organizationid'. While rowSchema get 
> Column Info on tabAlias:'b' and alias:'organizationid'. constant propagation 
> optimize combine 'b.organizationid is not null' and  'b.organizationid is 
> null' to 'constant false' . 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864930#comment-15864930
 ] 

Hive QA commented on HIVE-15160:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852448/HIVE-15160.06.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 211 failed/errored test(s), 10239 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=219)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_qualified_types] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer13] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cteViews] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_join2] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_udf] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[gen_udf_example_add10] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_grouping]
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_vc] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[limit_pushdown2] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby3] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_non_dictionary_encoding_vectorization]
 (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_vectorization]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup2] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_udf_case] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[regex_col] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_case_column_pruning] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_json_tuple] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_parse_url_tuple] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_types] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_date_1] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_1] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_if_expr] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce2] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_string_concat] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf3] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_mapjoin1] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_13] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_14] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_15] 
(batchId=59)

[jira] [Assigned] (HIVE-15888) constant propagation optimizer failed when query has the same alias with subquery

2017-02-13 Thread Walter Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu reassigned HIVE-15888:


Assignee: (was: Walter Wu)

> constant propagation optimizer failed when query has the same alias with 
> subquery
> -
>
> Key: HIVE-15888
> URL: https://issues.apache.org/jira/browse/HIVE-15888
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Walter Wu
>
> Example  :
> select * 
> from dpdim_employee_org_d c 
> join 
> (
> select a.* from dpmid_md_organization a
> left outer join dpmid_md_organization b 
> on a.organizationid = b.superiororganizationid and b.hisdate = '2016-10-05'
> where a.hisdate = '2016-09-05'
> and b.organizationid is null 
> ) b 
> on c.org_id = b.organizationid 
> and c.hp_cal_dt = '2016-09-05' limit 10;
> Description:
> when ppd optimize is enabled this query has empty result . If we unenabled 
> constant propagation optimize or we replace the subquery alias 'b' with 'b1' 
> , this query will work correctly.
> I explain this query and find that after ppd optimize Filter Operator 
> predicate conf changed from 'predicate: superiororganizationid is not null 
> (type: boolean)' to 'predicate: false (type: boolean)'.
> The subquery has a filter predicate conf 'b.organizationid is 
> null','b.organizationid' should equal to 'b:b.organizationid' . The outer 
> query has a filter predicate conf 'b.organizationid is not null', 
> 'b.organizationid' should equal to 'b:a.organizationid'. While rowSchema get 
> Column Info on tabAlias:'b' and alias:'organizationid'. constant propagation 
> optimize combine 'b.organizationid is not null' and  'b.organizationid is 
> null' to 'constant false' . 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15888) constant propagation optimizer failed when query has the same alias with subquery

2017-02-13 Thread Walter Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Wu reassigned HIVE-15888:


Assignee: Walter Wu

> constant propagation optimizer failed when query has the same alias with 
> subquery
> -
>
> Key: HIVE-15888
> URL: https://issues.apache.org/jira/browse/HIVE-15888
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Walter Wu
>Assignee: Walter Wu
>
> Example  :
> select * 
> from dpdim_employee_org_d c 
> join 
> (
> select a.* from dpmid_md_organization a
> left outer join dpmid_md_organization b 
> on a.organizationid = b.superiororganizationid and b.hisdate = '2016-10-05'
> where a.hisdate = '2016-09-05'
> and b.organizationid is null 
> ) b 
> on c.org_id = b.organizationid 
> and c.hp_cal_dt = '2016-09-05' limit 10;
> Description:
> when ppd optimize is enabled this query has empty result . If we unenabled 
> constant propagation optimize or we replace the subquery alias 'b' with 'b1' 
> , this query will work correctly.
> I explain this query and find that after ppd optimize Filter Operator 
> predicate conf changed from 'predicate: superiororganizationid is not null 
> (type: boolean)' to 'predicate: false (type: boolean)'.
> The subquery has a filter predicate conf 'b.organizationid is 
> null','b.organizationid' should equal to 'b:b.organizationid' . The outer 
> query has a filter predicate conf 'b.organizationid is not null', 
> 'b.organizationid' should equal to 'b:a.organizationid'. While rowSchema get 
> Column Info on tabAlias:'b' and alias:'organizationid'. constant propagation 
> optimize combine 'b.organizationid is not null' and  'b.organizationid is 
> null' to 'constant false' . 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864924#comment-15864924
 ] 

Sergey Shelukhin commented on HIVE-15889:
-

+1

> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15903) Compute table stats when user computes column stats

2017-02-13 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-15903:
--


> Compute table stats when user computes column stats
> ---
>
> Key: HIVE-15903
> URL: https://issues.apache.org/jira/browse/HIVE-15903
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15887) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-13 Thread KaiXu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864894#comment-15864894
 ] 

KaiXu commented on HIVE-15887:
--

this issue occurred again yesterday, to be notable, this issue occurred when 
dynamic allocation is as default(disabled).

> could not get APP ID and cause failed to connect to spark driver on 
> yarn-client mode
> 
>
> Key: HIVE-15887
> URL: https://issues.apache.org/jira/browse/HIVE-15887
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: Hive2.2
> Spark2.0.2
> hadoop2.7.1
>Reporter: KaiXu
>
> when I run Hive queries on Spark, got below error in the console, after check 
> the container's log, found it failed to connected to spark driver. I have set 
>  hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been 
> submitted after 3601s', actually during this long-time period it's impossible 
> no available resource, and also did not see any issue related to the network, 
> so the cause is not clear from the message "Possible reasons include network 
> issues, errors in remote driver or the cluster has no available resources, 
> etc.".
> From Hive's log, failed to get APP ID, so this might be the cause why the 
> driver did not start up.
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
>

[jira] [Commented] (HIVE-15893) Followup on HIVE-15671

2017-02-13 Thread Rui Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864892#comment-15864892
 ] 

Rui Li commented on HIVE-15893:
---

My hunch is HIVE-15860 doesn't help here. It's only intended for RSC crashing 
between JobStarted and JobSubmitted. While the problem here is to detect broken 
connection during query execution right?

> Followup on HIVE-15671
> --
>
> Key: HIVE-15893
> URL: https://issues.apache.org/jira/browse/HIVE-15893
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> In HIVE-15671, we fixed a type where server.connect.timeout is used in the 
> place of client.connect.timeout. This might solve some potential problems, 
> but the original problem reported in HIVE-15671 might still exist. (Not sure 
> if HIVE-15860 helps). Here is the proposal suggested by Marcelo:
> {quote}
> bq: server detecting a driver problem after it has connected back to the 
> server.
> Hmm. That is definitely not any of the "connect" timeouts, which probably 
> means it isn't configured and is just using netty's default (which is 
> probably no timeout?). Would probably need something using 
> io.netty.handler.timeout.IdleStateHandler, and also some periodic "ping" so 
> that the connection isn't torn down without reason.
> {quote}
> We will use this JIRA to track the issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15902) Select query involving date throwing Hive 2 Internal error: unsupported conversion from type: date

2017-02-13 Thread Aswathy Chellammal Sreekumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar reassigned HIVE-15902:
---


> Select query involving date throwing Hive 2 Internal error: unsupported 
> conversion from type: date
> --
>
> Key: HIVE-15902
> URL: https://issues.apache.org/jira/browse/HIVE-15902
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0
>Reporter: Aswathy Chellammal Sreekumar
>Assignee: Jason Dere
>
> The following query is throwing Hive 2 Internal error: unsupported conversion 
> from type: date
> Query:
> create table table_one (ts timestamp, dt date) stored as orc;
> insert into table_one values ('2034-08-04 17:42:59','2038-07-01');
> insert into table_one values ('2031-02-07 13:02:38','2072-10-19');
> create table table_two (ts timestamp, dt date) stored as orc;
> insert into table_two values ('2069-04-01 09:05:54','1990-10-12');
> insert into table_two values ('2031-02-07 13:02:38','2072-10-19');
> create table table_three as
> select count(*) from table_one
> group by ts,dt
> having dt in (select dt from table_two);
> Error while running task ( failure ) : 
> attempt_1486991777989_0184_18_02_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:420)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:883)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
>   ... 18 more
> Caused by: java.lang.RuntimeException: Hive 2 Internal error: unsupported 
> conversion from type: date
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getLong(PrimitiveObjectInspectorUtils.java:770)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColumnBetweenDynamicValue.evaluate(FilterLongColumnBetweenDynamicValue.java:82)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:39)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:112)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:883)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:783)
>   ... 19 more



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HIVE-15872) The PERCENTILE_APPROX UDAF does not work with empty set

2017-02-13 Thread Chaozhong Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864885#comment-15864885
 ] 

Chaozhong Yang commented on HIVE-15872:
---

Thanks [~wzheng] for your review!

> The PERCENTILE_APPROX UDAF does not work with empty set
> ---
>
> Key: HIVE-15872
> URL: https://issues.apache.org/jira/browse/HIVE-15872
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Fix For: 2.2.0, 2.1.2
>
> Attachments: HIVE-15872.2.patch, HIVE-15872.3.patch, 
> HIVE-15872.4.patch, HIVE-15872.patch
>
>
> 1. Original SQL:
> {code}
> select
> percentile_approx(
> column0,
> array(0.50, 0.70, 0.90, 0.95, 0.99)
> )
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> {code}
> 2. Exception StackTrace:
> {code}
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) 
> ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
>  at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) 
> ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 
> at java.util.ArrayList.rangeCheck(ArrayList.java:653) at 
> java.util.ArrayList.get(ArrayList.java:429) at 
> org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
>  ... 8 more
> {code}
> 3. review data:
> {code}
> select
> column0
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> {code}
> After run this sql, we found the result is NULL.
> 4. what's the meaning of [0.0, 1.0] in stacktrace?
> In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an 
> ArrayList which name is partialHistogram. Normally, the basic structure of 
> partialHistogram is [npercentiles, percentile0, percentile1..., nbins, 
> bin0.x, bin0.y, bin1.x, bin1.y,...]. However, if we process NULL(empty set) 
> column values, the partialHistoram will only contains [npercentiles(0), 
> nbins(1)]. That's the reason why the stacktrace shows a strange row data: 
> {"key":{},"value":{"_col0":[0.0,1.0]}}
> Before we call histogram#merge (on-line hisgoram algorithm from paper: 
> http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the 
> partialHistogram should remove elements which store percentiles like 
> `partialHistogram.subList(0, nquantiles+1).clear();`. In the case of empty 
> set, GenericUDAFPercentileApproxEvaluator will not remove percentiles. 
> Consequently, NumericHistogram will merge a list which contains only 2 
> elements([0, 1.0]) and throws IndexOutOfBoundsException. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15901) LLAP: incorrect usage of gap cache

2017-02-13 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15901:

Attachment: HIVE-15901.patch

[~gopalv] can you take a look? tiny patch. the reset to null shouldn't affect 
anything, done just in case; the bug is where the caches are called 
sequentially.

> LLAP: incorrect usage of gap cache
> --
>
> Key: HIVE-15901
> URL: https://issues.apache.org/jira/browse/HIVE-15901
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15901.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15887) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-13 Thread KaiXu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864877#comment-15864877
 ] 

KaiXu commented on HIVE-15887:
--

>From nodemanager log, can only see container transitioned from LOCALIZED to 
>RUNNING, then failed with exitCode=10.

2017-02-13 05:04:00,536 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip(->/mnt/disk6/yarn/nm/usercache/root/filecache/94/__spark_libs__6842484649003444330.zip)
 transitioned from DOWNLOADING to LOCALIZED
2017-02-13 05:04:00,641 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip(->/mnt/disk7/yarn/nm/usercache/root/filecache/95/__spark_conf__.zip)
 transitioned from DOWNLOADING to LOCALIZED
2017-02-13 05:04:00,641 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1486905599813_0046_01_01 transitioned from LOCALIZING 
to LOCALIZED
2017-02-13 05:04:00,661 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1486905599813_0046_01_01 transitioned from LOCALIZED 
to RUNNING
2017-02-13 05:04:00,661 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Neither virutal-memory nor physical-memory monitoring is needed. Not running 
the monitor-thread
2017-02-13 05:04:00,717 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
launchContainer: [bash, 
/mnt/disk2/yarn/nm/usercache/root/appcache/application_1486905599813_0046/container_1486905599813_0046_01_01/default_container_executor.sh]
2017-02-13 05:04:03,304 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1486905599813_0047_01 (auth:SIMPLE)

2017-02-13 05:05:42,694 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
from container container_1486905599813_0046_01_01 is : 10
2017-02-13 05:05:42,695 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception 
from container-launch with container ID: container_1486905599813_0046_01_01 
and exit code: 10
ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-02-13 05:05:42,699 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2017-02-13 05:05:42,699 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1486905599813_0046_01_01
2017-02-13 05:05:42,699 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 10

> could not get APP ID and cause failed to connect to spark driver on 
> yarn-client mode
> 
>
> Key: HIVE-15887
> URL: https://issues.apache.org/jira/browse/HIVE-15887
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: Hive2.2
> Spark2.0.2
> hadoop2.7.1
>Reporter: KaiXu
>
> when I run Hive queries on Spark, got below error in the console, after check 
> the container's log, found it failed to connected to spark driver. I have set 
>  hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been 
> submitted after 3601s', actually during this long-time period it's impossible 
> no available resource, and also did not see any issue related to the network, 
> so the cause is not clear from the message "Possible reasons include network 
> issues, errors in remote driver or the cluster has no available resources, 
> etc.".
> From Hive's log, failed to get APP ID, so this might be the cause why the 
> driver did not start up.
> console log:
> Starting Spark Job =

[jira] [Commented] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864875#comment-15864875
 ] 

Hive QA commented on HIVE-15889:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852443/HIVE-15889.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10238 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3525/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3525/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3525/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852443 - PreCommit-HIVE-Build

> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-02-13 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Status: Patch Available  (was: Open)

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-02-13 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Attachment: HIVE-15898.01.patch

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15901) LLAP: incorrect usage of gap cache

2017-02-13 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15901:
---

Assignee: Sergey Shelukhin

> LLAP: incorrect usage of gap cache
> --
>
> Key: HIVE-15901
> URL: https://issues.apache.org/jira/browse/HIVE-15901
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation

2017-02-13 Thread Dapeng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864843#comment-15864843
 ] 

Dapeng Sun commented on HIVE-15682:
---

Hi, Xuefu,

Okay, I will run these queries, but my cluster is using by another colleague, I 
will reply you when I get the cluster.

> Eliminate per-row based dummy iterator creation
> ---
>
> Key: HIVE-15682
> URL: https://issues.apache.org/jira/browse/HIVE-15682
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.2.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row which can be eliminated. 
> This is because {{SparkReduceRecordHandler}} is able to handle single key 
> value pairs. We can refactor this part of code 1. to remove the need for a 
> iterator and 2. to optimize the code path for per (key, value) based (instead 
> of (key, value iterator)) processing. It would be also great if we can 
> measure the performance after the optimizations and compare to performance 
> prior to HIVE-15580.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2017-02-13 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864842#comment-15864842
 ] 

Thejas M Nair commented on HIVE-15473:
--

[~anishek]
It seems like the thrift file wasn't regenerated after the version change -
{code}
34134[hive17:07]$  git grep HIVE_CLI_SERVICE_PROTOCOL_V10
service-rpc/if/TCLIService.thrift:  HIVE_CLI_SERVICE_PROTOCOL_V10
service-rpc/if/TCLIService.thrift:  1: required TProtocolVersion 
client_protocol = TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V10
service-rpc/if/TCLIService.thrift:  2: required TProtocolVersion 
serverProtocolVersion = TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V10
134134[hive17:27]$
134134[hive17:27]$
134134[hive17:27]$  git grep HIVE_CLI_SERVICE_PROTOCOL_V9
jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java:
supportedProtocols.add(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9);
service-rpc/if/TCLIService.thrift:  HIVE_CLI_SERVICE_PROTOCOL_V9
service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp:  
TProtocolVersion::HIVE_CLI_SERVICE_PROTOCOL_V9
service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp:  
"HIVE_CLI_SERVICE_PROTOCOL_V9"
service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h:
HIVE_CLI_SERVICE_PROTOCOL_V9 = 8
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TOpenSessionReq.java:
this.client_protocol = 
org.apache.hive.service.rpc.thrift.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9;
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TOpenSessionReq.java:
this.client_protocol = 
org.apache.hive.service.rpc.thrift.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9;
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TOpenSessionResp.java:
this.serverProtocolVersion = 
org.apache.hive.service.rpc.thrift.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9;
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TOpenSessionResp.java:
this.serverProtocolVersion = 
org.apache.hive.service.rpc.thrift.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9;
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java:
  HIVE_CLI_SERVICE_PROTOCOL_V9(8);
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TProtocolVersion.java:
return HIVE_CLI_SERVICE_PROTOCOL_V9;
service-rpc/src/gen/thrift/gen-php/Types.php:  const 
HIVE_CLI_SERVICE_PROTOCOL_V9 = 8;
service-rpc/src/gen/thrift/gen-php/Types.php:8 => 
'HIVE_CLI_SERVICE_PROTOCOL_V9',
service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py:  
HIVE_CLI_SERVICE_PROTOCOL_V9 = 8
service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py:8: 
"HIVE_CLI_SERVICE_PROTOCOL_V9",
service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py:
"HIVE_CLI_SERVICE_PROTOCOL_V9": 8,
service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb:  
HIVE_CLI_SERVICE_PROTOCOL_V9 = 8
service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb:  VALUE_MAP = {0 => 
"HIVE_CLI_SERVICE_PROTOCOL_V1", 1 => "HIVE_CLI_SERVICE_PROTOCOL_V2", 2 => 
"HIVE_CLI_SERVICE_PROTOCOL_V3", 3 => "HIVE_CLI_SERVICE_PROTOCOL_V4", 4 => 
"HIVE_CLI_SERVICE_PROTOCOL_V5", 5 => "HIVE_CLI_SERVICE_PROTOCOL_V6", 6 => 
"HIVE_CLI_SERVICE_PROTOCOL_V7", 7 => "HIVE_CLI_SERVICE_PROTOCOL_V8", 8 => 
"HIVE_CLI_SERVICE_PROTOCOL_V9"}
service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb:  VALID_VALUES = 
Set.new([HIVE_CLI_SERVICE_PROTOCOL_V1, HIVE_CLI_SERVICE_PROTOCOL_V2, 
HIVE_CLI_SERVICE_PROTOCOL_V3, HIVE_CLI_SERVICE_PROTOCOL_V4, 
HIVE_CLI_SERVICE_PROTOCOL_V5, HIVE_CLI_SERVICE_PROTOCOL_V6, 
HIVE_CLI_SERVICE_PROTOCOL_V7, HIVE_CLI_SERVICE_PROTOCOL_V8, 
HIVE_CLI_SERVICE_PROTOCOL_V9]).freeze
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:
sm.openSession(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9, "user", 
"passw", "127.0.0.1",
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java:

[jira] [Commented] (HIVE-15887) could not get APP ID and cause failed to connect to spark driver on yarn-client mode

2017-02-13 Thread KaiXu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864837#comment-15864837
 ] 

KaiXu commented on HIVE-15887:
--

Hi [~lirui], above container log is the yarn application log, the middle has 
been cut off for they're repeated "Failed to connect to driver at 
192.168.1.1:43656, retrying".

> could not get APP ID and cause failed to connect to spark driver on 
> yarn-client mode
> 
>
> Key: HIVE-15887
> URL: https://issues.apache.org/jira/browse/HIVE-15887
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 2.2.0
> Environment: Hive2.2
> Spark2.0.2
> hadoop2.7.1
>Reporter: KaiXu
>
> when I run Hive queries on Spark, got below error in the console, after check 
> the container's log, found it failed to connected to spark driver. I have set 
>  hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been 
> submitted after 3601s', actually during this long-time period it's impossible 
> no available resource, and also did not see any issue related to the network, 
> so the cause is not clear from the message "Possible reasons include network 
> issues, errors in remote driver or the cluster has no available resources, 
> etc.".
> From Hive's log, failed to get APP ID, so this might be the cause why the 
> driver did not start up.
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_02
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864832#comment-15864832
 ] 

Roshan Naik commented on HIVE-15691:


I see.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864813#comment-15864813
 ] 

Hive QA commented on HIVE-15891:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852440/HIVE-15891.1.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 41 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.ql.TestTxnCommands.testDeleteIn (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeDeleteUpdate (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeType2SCD01 (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeType2SCD02 (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeUpdateDelete (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier2 (batchId=270)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge 
(batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testDynamicPartitionsMerge2 
(batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge2 (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge3 (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2.testMergeWithPredicate (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testDeleteIn 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testDynamicPartitionsMerge
 (batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testDynamicPartitionsMerge2
 (batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge2 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge3 
(batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMergeWithPredicate
 (batchId=268)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDeleteIn
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDynamicPartitionsMerge2
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge2
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge3
 (batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMergeWithPredicate
 (batchId=266)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLocksInSubquery 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMerge3Way01 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMerge3Way02 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMergePartitioned01 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMergePartitioned02 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMergeUnpartitioned01 
(batchId=269)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testMergeUnpartitioned02 
(batchId=269)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3524/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3524/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3524/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 41 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852440 - PreCommit-HIVE-Build

> Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast
> ---
>
> Key: HIVE-15891
> URL:

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864804#comment-15864804
 ] 

Eugene Koifman commented on HIVE-15691:
---

Actually, StreamingConnection is required.  W/o that, in a Kerberos environment 
Connection and Writer end up with a different UserGroupInformation which cause 
issues.  See HIVE-14114 

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-02-13 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864802#comment-15864802
 ] 

Pengcheng Xiong commented on HIVE-15160:


[~vgarg], could u try the patch again? All the queries that you mentioned 
should pass now. Thanks.

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Gunther Hagleitner (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-15889:
-

Assignee: Rajesh Balamohan

> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-13 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-13 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-02-13 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Attachment: HIVE-15160.06.patch

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864761#comment-15864761
 ] 

Roshan Naik edited comment on HIVE-15691 at 2/14/17 12:39 AM:
--

[~ekoifman] & [~kalyanhadoop]

1) I think the class needs only 2 constructors:
- StrictRegexWriter(String regex, HiveEndPoint endPoint)
- StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

The 'connection' param should be eliminated.

2) I cannot say much about the correctness of createSerde() & encode() methods, 
as i have not worked with RegexSerDe. Would be good to know if this has been 
validated via a manual test  where the data streamed via this writer, was 
properly queryable from Hive.


Looks fine other than that.


was (Author: roshan_naik):
[~ekoifman] & [~kalyanhadoop]

1) I think the class needs only 2 constructors:
- StrictRegexWriter(String regex, HiveEndPoint endPoint)
- StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

The 'connection' param should be eliminated.

2) I cannot say much about the correctness of createSerde() & encode() methods, 
as i have not worked with RegexSerDe. Would be good to know if this has been 
validated via a manual test  where the data streamed via this writer, was 
properly queryable from Hive.



> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864761#comment-15864761
 ] 

Roshan Naik commented on HIVE-15691:


[~ekoifman] & [~kalyanhadoop]

1) I think the class needs only 2 constructors:
- StrictRegexWriter(String regex, HiveEndPoint endPoint)
- StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

The 'connection' param should be eliminated.

2) I cannot say much about the correctness of createSerde() & encode() methods, 
as i have not worked with RegexSerDe. Would be good to know if this has been 
validated via a manual test  where the data streamed via this writer, was 
properly queryable from Hive.



> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter available in hive.
> Dependency is there in flume to commit.
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> Patch is available for Flume, Please verify the below link
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15872) The PERCENTILE_APPROX UDAF does not work with empty set

2017-02-13 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15872:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master and branch-2.1.

Thanks [~debugger87] for the contribution!

> The PERCENTILE_APPROX UDAF does not work with empty set
> ---
>
> Key: HIVE-15872
> URL: https://issues.apache.org/jira/browse/HIVE-15872
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Fix For: 2.2.0, 2.1.2
>
> Attachments: HIVE-15872.2.patch, HIVE-15872.3.patch, 
> HIVE-15872.4.patch, HIVE-15872.patch
>
>
> 1. Original SQL:
> {code}
> select
> percentile_approx(
> column0,
> array(0.50, 0.70, 0.90, 0.95, 0.99)
> )
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> {code}
> 2. Exception StackTrace:
> {code}
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) 
> ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
>  at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) 
> ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 
> at java.util.ArrayList.rangeCheck(ArrayList.java:653) at 
> java.util.ArrayList.get(ArrayList.java:429) at 
> org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
>  ... 8 more
> {code}
> 3. review data:
> {code}
> select
> column0
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> {code}
> After run this sql, we found the result is NULL.
> 4. what's the meaning of [0.0, 1.0] in stacktrace?
> In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an 
> ArrayList which name is partialHistogram. Normally, the basic structure of 
> partialHistogram is [npercentiles, percentile0, percentile1..., nbins, 
> bin0.x, bin0.y, bin1.x, bin1.y,...]. However, if we process NULL(empty set) 
> column values, the partialHistoram will only contains [npercentiles(0), 
> nbins(1)]. That's the reason why the stacktrace shows a strange row data: 
> {"key":{},"value":{"_col0":[0.0,1.0]}}
> Before we call histogram#merge (on-line hisgoram algorithm from paper: 
> http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the 
> partialHistogram should remove elements which store percentiles like 
> `partialHistogram.subList(0, nquantiles+1).clear();`. In the case of empty 
> set, GenericUDAFPercentileApproxEvaluator will not remove percentiles. 
> Consequently, NumericHistogram will merge a list which contains only 2 
> elements([0, 1.0]) and throws IndexOutOfBoundsException. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException

2017-02-13 Thread Thomas Poepping (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864742#comment-15864742
 ] 

Thomas Poepping commented on HIVE-15852:


[~ashutoshc] Yeah, I think that solution makes the most sense. Even if there 
are some hidden dependencies in other projects, as you say, there is no 
interface contract and so those assumptions should not even be made. 

I will take a look into how tablesampling can be improved. Hopefully the fix is 
not too wide.

> Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
> -
>
> Key: HIVE-15852
> URL: https://issues.apache.org/jira/browse/HIVE-15852
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>
> Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which 
> doesn't create empty files to represent empty buckets when Hive is on Tez, a 
> couple things are broken.
> First of all, if there are empty buckets (which is possible with large 
> datasets in the partitioned-bucketed case), tablesampling will not work if 
> you're referencing a bucket number higher than the number of files.
> e.g. In some partition 'p', there are three rows. The table 't' is clustered 
> into ten buckets. With maximal hashing, only three bucket files will be 
> created. If we do select * from t tablesample (bucket x out of 10) where 
>  (where x > 3), an ArrayIndexOutOfBoundsException will be 
> thrown because Hive assumes there are only three buckets.
> Second, other applications (such as Pig) may be making assumptions about the 
> number of files equaling the number of buckets.
> Possible fixes:
> * Revert HIVE-13040
> * Change how tablesampling is implemented to accept possibility that number 
> of files != number of buckets
> ** Would require coordination across projects to change assumptions
> Things to consider:
> * what performance gains are there from not creating empty files?
> * if the gains are large, are we willing to lose them? (by reverting 
> HIVE-13040)
> * _how else can we avoid creating unnecessary files, while still maintaining 
> invariants other applications expect?_



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (HIVE-12631) LLAP: support ORC ACID tables

2017-02-13 Thread Teddy Choi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-12631:
-

Assignee: Teddy Choi

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HIVE-15889) LLAP: Some tasks still run after hive cli is shutdown

2017-02-13 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15889:

Attachment: HIVE-15889.3.patch

Thanks [~sershe]. Attaching .3 patch to address review comments. Used 
setAccessible and also caching it statically.

Reflection is used since there is an option to have tez optionally in earlier 
clusters.  Not sure this is still valid given it is the default engine now. But 
didn't want to change that code for backward compatibility. 

> LLAP: Some tasks still run after hive cli is shutdown
> -
>
> Key: HIVE-15889
> URL: https://issues.apache.org/jira/browse/HIVE-15889
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HIVE-15889.1.patch, HIVE-15889.2.patch, 
> HIVE-15889.3.patch
>
>
> E.g: In cross product case, the tight loop in merge join operator ignores any 
> interrupt or abort flag checks, causing the tasks to be in running state even 
> when the client cli is quit.
> intensionally written cross product query to simulate this.
> {noformat}
> hive> select count(1) from lineitem, orders;;
> {noformat}
> Even when the cli is quit, LLAP would continue executing the task for quite 
> sometime. E.g stack trace
> {noformat}
> "TezTaskRunner" #1945 daemon prio=5 os_prio=0 tid=0x7fe9e43a5000 
> nid=0x4c8 runnable [0x7fc8d881b000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:453)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.mergeJoinComputeKeys(CommonMergeJoinOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:410)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:491)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:209)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:351)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:282)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15852) Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException

2017-02-13 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864733#comment-15864733
 ] 

Ashutosh Chauhan commented on HIVE-15852:
-

Gains of not creating empty files are huge even on HDFS, I don't think we want 
to loose those.

We should fix tablesampling. I am not aware of Pig's assumption for bucketing. 
Can you describe bit more what is that assumption? Anyway, even if it has that 
seems like assuming something in implementation, since there is no interface 
contract anywhere that # of buckets = # of files.

> Tablesampling on Tez in low-record case throws ArrayIndexOutOfBoundsException
> -
>
> Key: HIVE-15852
> URL: https://issues.apache.org/jira/browse/HIVE-15852
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>
> Due to HIVE-13040 ( https://issues.apache.org/jira/browse/HIVE-13040 ), which 
> doesn't create empty files to represent empty buckets when Hive is on Tez, a 
> couple things are broken.
> First of all, if there are empty buckets (which is possible with large 
> datasets in the partitioned-bucketed case), tablesampling will not work if 
> you're referencing a bucket number higher than the number of files.
> e.g. In some partition 'p', there are three rows. The table 't' is clustered 
> into ten buckets. With maximal hashing, only three bucket files will be 
> created. If we do select * from t tablesample (bucket x out of 10) where 
>  (where x > 3), an ArrayIndexOutOfBoundsException will be 
> thrown because Hive assumes there are only three buckets.
> Second, other applications (such as Pig) may be making assumptions about the 
> number of files equaling the number of buckets.
> Possible fixes:
> * Revert HIVE-13040
> * Change how tablesampling is implemented to accept possibility that number 
> of files != number of buckets
> ** Would require coordination across projects to change assumptions
> Things to consider:
> * what performance gains are there from not creating empty files?
> * if the gains are large, are we willing to lose them? (by reverting 
> HIVE-13040)
> * _how else can we avoid creating unnecessary files, while still maintaining 
> invariants other applications expect?_



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-13 Thread Thomas Poepping (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864718#comment-15864718
 ] 

Thomas Poepping commented on HIVE-15881:


[~spena] Good point. To take into account everyone's ideas, maybe something 
like {{hive.exec.input.listing.max.threads}}? It's tough, but I'm not sure 
there will be one obvious best answer, I think each one will have pros and cons 
compared to any other. 

Agree with [~yalovyyi]'s suggestion as well. What would we want default to be? 
Looking through {{HiveConf}}, the different {{*.max.threads}} values are:
* {{METASTORESERVERMAXTHREADS("hive.metastore.server.max.threads", 1000,}}
* {{HIVE_SERVER2_WEBUI_MAX_THREADS("hive.server2.webui.max.threads", 50, "The 
max HiveServer2 WebUI threads"),}}
* 
{{LLAP_DAEMON_AM_REPORTER_MAX_THREADS("hive.llap.daemon.am-reporter.max.threads",
 4,}}

So.. all over the place. How would we decide on a suitable default? Or maybe, 
the default is "as many as possible" and it can be lowered by users themselves?

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look it is specific for Hive.
> Also, the above variable is not on HiveConf nor used anywhere else. I just 
> found a reference on the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, 
> and use a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen on Hive 3.x



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15895) Use HDFS for stats collection temp dir on blob storage

2017-02-13 Thread Thomas Poepping (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864703#comment-15864703
 ] 

Thomas Poepping commented on HIVE-15895:


Thanks [~spena], but I specifically was talking about another use of 
{{getExtTmpPathRelTo()}} in {{SemanticAnalyzer}}, 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L2207

Happy to have that be addressed elsewhere though, as it could be considered out 
of scope for this JIRA. 

+1 from me as well

> Use HDFS for stats collection temp dir on blob storage
> --
>
> Key: HIVE-15895
> URL: https://issues.apache.org/jira/browse/HIVE-15895
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15895.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-15894) Add logical semijoin config in sqlstd safe list

2017-02-13 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864695#comment-15864695
 ] 

Thejas M Nair commented on HIVE-15894:
--

+1 pending tests


> Add logical semijoin config in sqlstd safe list 
> 
>
> Key: HIVE-15894
> URL: https://issues.apache.org/jira/browse/HIVE-15894
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15894.2.patch, HIVE-15894.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

1 2 3 >

1 - 100 of 216 matches

Mail list logo